id: QgoNsbMw4wN8EBuCHqBvB date: 2024-11-25T21:05:45.438Z summary: DeepSeek Interview messages: - id: g3aYI43JVopTlg4mpDWRM date: 2024-11-25T19:26:48.327Z type: system text: I am a careful translator that responds with excellent english translations imageUrls: [] - id: KFyd7JT3-5TcicK31rLP4 date: 2024-11-25T19:27:02.773Z type: human text: >- 揭秘DeepSeek:一个更极致的中国技术理想主义故事 Original 暗涌 暗涌Waves 2024年07月17日 12:01 Image 文 | 于丽丽 编辑 | 刘旌 Image 中国的7家大模型创业公司中,DeepSeek(深度求索)最不声不响,但它又总能以出其不意的方式被人记住。 一年前,这种出其不意源自它背后的量化私募巨头幻方,是大厂外唯一一家储备万张A100芯片的公司,一年后,则来自它才是引发中国大模型价格战的源头。 在被AI连续轰炸的5月,DeepSeek一跃成名。起因是他们发布的一款名为DeepSeek V2的开源模型,提供了一种史无前例的性价比:推理成本被降到每百万token仅 1块钱,约等于Llama3 70B的七分之一,GPT-4 Turbo的七十分之一。 DeepSeek被迅速冠以“AI界拼多多”之称的同时,字节、腾讯、百度、阿里等大厂也按耐不住,纷纷降价。中国大模型价格战由此一触即发。 弥漫的硝烟其实掩盖了一个事实:与很多大厂烧钱补贴不同,DeepSeek是有利润的。 这背后,是DeepSeek对模型架构进行了全方位创新。它提出的一种崭新的MLA(一种新的多头潜在注意力机制)架构,把显存占用降到了过去最常用的MHA架构的5%-13%,同时,它独创的DeepSeekMoESparse结构,也把计算量降到极致,所有这些最终促成了成本的下降。 在硅谷,DeepSeek被称作“来自东方的神秘力量”。SemiAnalysis首席分析师认为,DeepSeek V2论文“可能是今年最好的一篇”。OpenAI前员工Andrew Carr认为论文“充满惊人智慧”,并将其训练设置应用于自己的模型。而OpenAI前政策主管、Anthropic联合创始人Jack Clark认为,DeepSeek“雇佣了一批高深莫测的奇才”,还认为中国制造的大模型,“将和无人机、电动汽车一样,成为不容忽视的力量。” 在基本由硅谷牵动故事进展的AI浪潮里,这是罕有的情形。多位行业人士告诉我们,这种强烈的反响源自架构层面的创新,是国产大模型公司乃至全球开源基座大模型都很罕见的尝试。一位AI研究者表示,Attention架构提出多年来,几乎未被成功改过,更遑论大规模验证。“这甚至是一个做决策时就会被掐断的念头,因为大部分人都缺乏信心。” 而另一方面,国产大模型之前很少涉足架构层面的创新,也是因为很少有人主动去击破那样一种成见:美国更擅长从0-1的技术创新,而中国更擅长从1-10的应用创新。何况这种行为非常不划算——新一代模型,过几个月自然有人做出来,中国公司只要跟随、做好应用即可。对模型结构进行创新,意味着没有路径可依,要经历很多失败,时间、经济成本都耗费巨大。 DeepSeek显然是逆行者。在一片认为大模型技术必然趋同,follow是更聪明捷径的喧哗声中,DeepSeek看重“弯路”中积累的价值,并认为中国的大模型创业者除应用创新外,也可以加入到全球技术创新的洪流中。 DeepSeek的很多抉择都与众不同。截至目前,7家中国大模型创业公司中,它是唯一一家放弃“既要又要”路线,至今专注在研究和技术,未做toC应用的公司,也是唯一一家未全面考虑商业化,坚定选择开源路线甚至都没融过资的公司。这些使得它经常被遗忘在牌桌之外,但在另一端,它又经常在社区被用户“自来水”式传播。 DeepSeek究竟是如何炼成的?我们为此访谈了甚少露面的DeepSeek创始人梁文锋。 这位从幻方时代,就在幕后潜心研究技术的80后创始人,在DeepSeek时代,依旧延续着他的低调作风,和所有研究员一样,每天“看论文,写代码,参与小组讨论”。 和很多量化基金创始人都有过海外对冲基金履历,多出身物理、数学等专业不同的是,梁文锋一直是本土背景,早年就读的也是浙江大学电子工程系人工智能方向。 
多位行业人士和DeepSeek研究员告诉我们,梁文锋是当下中国AI界非常罕见的“兼具强大的infra工程能力和模型研究能力,又能调动资源”、“既可以从高处做精准判断,又可以在细节上强过一线研究员”的人,他拥有“令人恐怖的学习能力”,同时又“完全不像一个老板,而更像一个极客”。 这是一次尤为难得的访谈。访谈里,这位技术理想主义者,提供了目前中国科技界特别稀缺的一种声音:他是少有的把“是非观”置于“利害观”之前,并提醒我们看到时代惯性,把“原创式创新”提上日程的人。 一年前,DeepSeek刚下场时,我们初次访谈了梁文锋 :《疯狂的幻方:一家隐形AI巨头的大模型之路》 。如果说当时那句「务必要疯狂地怀抱雄心,且还要疯狂地真诚」还是一句美丽的口号,一年过去,它已经在成为一种行动。 以下为对话部分 Image 价格战第一枪是怎么打响的? 「暗涌」:DeepSeek V2模型发布后,迅速引发一场血雨腥风的大模型价格战,有人说你们是行业的一条鲶鱼。 梁文锋:我们不是有意成为一条鲶鱼,只是不小心成了一条鲶鱼。 「暗涌」:这个结果让你们意外吗? 梁文锋:非常意外。没想到价格让大家这么敏感。我们只是按照自己的步调来做事,然后核算成本定价。我们的原则是不贴钱,也不赚取暴利。这个价格也是在成本之上稍微有点利润。 「暗涌」:5天后智谱AI就跟进了,之后是字节、阿里、百度、腾讯等大厂。 梁文锋:智谱AI降的是一个入门级产品,和我们同级别的模型仍然收费很贵。字节是真正第一个跟进的。旗舰模型降到和我们一样的价格,然后触发了其它大厂纷纷降价。因为大厂的模型成本比我们高很多,所以我们没想到会有人亏钱做这件事,最后就变成了互联网时代的烧钱补贴的逻辑。 「暗涌」:外部看来,降价很像在抢用户,互联网时代的价格战通常如此。 梁文锋:抢用户并不是我们的主要目的。我们降价一方面是因为我们在探索下一代模型的结构中,成本先降下来了,另一方面也觉得无论API,还是AI,都应该是普惠的、人人可以用得起的东西。 「暗涌」:在这之前,大部分中国公司都会直接copy这一代的 Llama结构去做应用,为什么你们会从模型结构切入? 梁文锋:如果目标是做应用,那沿用 Llama结构,短平快上产品也是合理选择。但我们目的地是AGI,这意味着我们需要研究新的模型结构,在有限资源下,实现更强的模型能力。这是scale up到更大模型所需要做的基础研究之一。除了模型结构,我们还做了大量其他的研究,包括怎么构造数据,如何让模型更像人类等,这都体现在我们发布的模型里。另外,Llama的结构,在训练效率和推理成本上,和国外先进水平估计也已有两代差距。 「暗涌」:这种代差主要来自哪里? 梁文锋:首先训练效率有差距。我们估计,国内最好的水平和国外最好的相比,模型结构和训练动力学上可能有一倍的差距,光这一点我们要消耗两倍的算力才能达到同样效果。另外数据效率上可能也有一倍差距,也就是我们要消耗两倍的训练数据和算力,才能达到同样的效果。合起来就要多消耗4倍算力。我们要做的,正是不停地去缩小这些差距。 「暗涌」:大部分中国公司都选择既要模型又要应用,为什么DeepSeek目前选择只做研究探索? 梁文锋:因为我们觉得现在最重要的是参与到全球创新的浪潮里去。过去很多年,中国公司习惯了别人做技术创新,我们拿过来做应用变现,但这并非是一种理所当然。这一波浪潮里,我们的出发点,就不是趁机赚一笔,而是走到技术的前沿,去推动整个生态发展。 「暗涌」:互联网和移动互联网时代留给大部分人的惯性认知是,美国擅长搞技术创新,中国更擅长做应用。 梁文锋:我们认为随着经济发展,中国也要逐步成为贡献者,而不是一直搭便车。过去三十多年IT浪潮里,我们基本没有参与到真正的技术创新里。我们已经习惯摩尔定律从天而降,躺在家里18个月就会出来更好的硬件和软件。Scaling Law也在被如此对待。 但其实,这是西方主导的技术社区一代代孜孜不倦创造出来的,只因为之前我们没有参与这个过程,以至于忽视了它的存在。 Image 真正的差距不是一年或两年,而是原创和模仿之差 「暗涌」:为什么DeepSeek V2会让硅谷的很多人惊讶? 
梁文锋:在美国每天发生的大量创新里,这是非常普通的一个。他们之所以惊讶,是因为这是一个中国公司,在以创新贡献者的身份,加入到他们游戏里去。毕竟大部分中国公司习惯follow,而不是创新。 「暗涌」:但这种选择放在中国语境里,也过于奢侈。大模型是一个重投入游戏,不是所有公司都有资本只去研究创新,而不是先考虑商业化。 梁文锋:创新的成本肯定不低,过去那种拿来主义的惯性也和过去的国情有关。但现在,你看无论中国的经济体量,还是字节、腾讯这些大厂的利润,放在全球都不低。我们创新缺的肯定不是资本,而是缺乏信心以及不知道怎么组织高密度的人才实现有效的创新。 「暗涌」:为什么中国公司——包括不缺钱的大厂,这么容易把快速商业化当第一要义? 梁文锋:过去三十年,我们都只强调赚钱,对创新是忽视的。创新不完全是商业驱动的,还需要好奇心和创造欲。我们只是被过去那种惯性束缚了,但它也是阶段性的。 「暗涌」:但你们究竟是一个商业组织,而非一个公益科研机构,选择创新,又通过开源分享出去,那要在哪里形成护城河?像5月这次MLA架构的创新,也会很快被其他家copy吧? 梁文锋:在颠覆性的技术面前,闭源形成的护城河是短暂的。即使OpenAI闭源,也无法阻止被别人赶超。所以我们把价值沉淀在团队上,我们的同事在这个过程中得到成长,积累很多know-how,形成可以创新的组织和文化,就是我们的护城河。 开源,发论文,其实并没有失去什么。对于技术人员来说,被follow是很有成就感的事。其实,开源更像一个文化行为,而非商业行为。给予其实是一种额外的荣誉。一个公司这么做也会有文化的吸引力。 「暗涌」:你怎么看类似朱啸虎的这种市场信仰派观点? 梁文锋:朱啸虎是自洽的,但他的打法更适合快速赚钱的公司,而你看美国最赚钱的公司,都是厚积薄发的高科技公司。 「暗涌」:但做大模型,单纯的技术领先也很难形成绝对优势,你们赌的那个更大的东西是什么? 梁文锋:我们看到的是中国AI不可能永远处在跟随的位置。我们经常说中国AI和美国有一两年差距,但真实的gap是原创和模仿之差。如果这个不改变,中国永远只能是追随者,所以有些探索也是逃不掉的。 英伟达的领先,不只是一个公司的努力,而是整个西方技术社区和产业共同努力的结果。他们能看到下一代的技术趋势,手里有路线图。中国AI的发展,同样需要这样的生态。很多国产芯片发展不起来,也是因为缺乏配套的技术社区,只有第二手消息,所以中国必然需要有人站到技术的前沿。 Image 更多的投入并不一定产生更多的创新 「暗涌」:现在的DeepSeek有一种OpenAI早期的理想主义气质,也是开源的。后边你们会选择闭源吗?OpenAI和Mistral都有过从开源到闭源的过程。 梁文锋:我们不会闭源。我们认为先有一个强大的技术生态更重要。 「暗涌」:你们有融资计划吗?看有媒体报道,幻方对DeepSeek有独立拆分上市的计划,硅谷的AI创业公司,最终也都难免要和大厂绑定。 梁文锋:短期内没有融资计划,我们面临的问题从来不是钱,而是高端芯片被禁运。 「暗涌」:很多人认为,做AGI和做量化是完全不同的两件事,量化可以闷声去做,但AGI可能更需要高举高打,需要结盟,这样可以让你的投入变大。 梁文锋:更多的投入并不一定产生更多的创新。否则大厂可以把所有的创新包揽了。 「暗涌」:你们现在不做应用,是因为你们没有运营的基因吗? 梁文锋:我们认为当前阶段是技术创新的爆发期,而不是应用的爆发期。长远来说,我们希望形成一种生态,就是业界直接使用我们的技术和产出,我们只负责基础模型和前沿的创新,然后其它公司在DeepSeek 的基础上构建toB、toC的业务。如果能形成完整的产业上下游,我们就没必要自己做应用。当然,如果需要,我们做应用也没障碍,但研究和技术创新永远是我们第一优先级。 「暗涌」:但选择API的话,为什么选择DeepSeek,而不是大厂? 梁文锋:未来的世界很可能是专业化分工的,基础大模型需要持续创新,大厂有它的能力边界,并不一定适合。 「暗涌」:但技术真的可以拉开差距吗?你也说过并不存在绝对的技术秘密。 梁文锋:技术没有秘密,但重置需要时间和成本。英伟达的显卡,理论上没有任何技术秘密,很容易复制,但重新组织团队以及追赶下一代技术都需要时间,所以实际的护城河还是很宽。 「暗涌」:你们降价后,字节率先跟进,说明他们还是感受到某种威胁。你怎么看创业公司与大厂竞争的新解法? 梁文锋:说实话我们不太care这件事,只是顺便做了这件事。提供云服务不是我们的主要目标。我们的目标还是去实现AGI。 目前没有看到什么新解法,但大厂也没有明显占优。大厂有现成的用户,但它的现金流业务也是它的包袱,也会让它成为随时被颠覆的对象。 「暗涌」:你怎么看DeepSeek之外的6家大模型创业公司的终局? 
梁文锋:可能活下来2到3家。现在都还处在烧钱阶段,所以那些自我定位清晰、更能精细化运营的,更有机会活下来。其它公司可能会脱胎换骨。有价值的东西不会烟消云散,但会换一种方式。 「暗涌」:幻方时代,面对竞争的姿态就被评价为“我行我素”,很少在意横向比较。关于竞争,你思考的原点是什么? 梁文锋:我经常思考的是,一个东西能不能让社会的运行效率变高,以及你能否在它的产业分工链条上找到擅长的位置。只要终局是让社会效率更高,就是成立的。中间很多都是阶段性的,过度关注必然眼花缭乱。 Image 一群做“高深莫测”事的年轻人 「暗涌」:OpenAI前政策主管、Anthropic联合创始人Jack Clark认为DeepSeek雇佣了“一批高深莫测的奇才”,做出DeepSeek v2的是怎样一群人? 梁文锋:并没有什么高深莫测的奇才,都是一些Top高校的应届毕业生、没毕业的博四、博五实习生,还有一些毕业才几年的年轻人。 「暗涌」:很多大模型公司都执着地去海外挖人,很多人觉得这个领域前50名的顶尖人才可能都不在中国的公司,你们的人都来自哪里? 梁文锋:V2模型没有海外回来的人,都是本土的。前50名顶尖人才可能不在中国,但也许我们能自己打造这样的人。 「暗涌」:这次MLA创新是如何发生的?听说idea最早来自一个年轻研究员的个人兴趣? 梁文锋:在总结出Attention架构的一些主流变迁规律后,他突发奇想去设计一个替代方案。不过从想法到落地,中间是一个漫长的过程。我们为此组了一个team,花了几个月时间才跑通。 「暗涌」:这种发散性灵感的诞生和你们完全创新型组织的架构很有关系。幻方时代,你们就很少自上而下地指派目标或任务。但AGI这种充满不确定性的前沿探索,是否多了管理动作? 梁文锋:DeepSeek也全是自下而上。而且我们一般不前置分工,而是自然分工。每个人有自己独特的成长经历,都是自带想法的,不需要push他。探索过程中,他遇到问题,自己就会拉人讨论。不过当一个idea显示出潜力,我们也会自上而下地去调配资源。 「暗涌」:听说DeepSeek对于卡和人的调集非常灵活。 梁文锋:我们每个人对于卡和人的调动是不设上限的。如果有想法,每个人随时可以调用训练集群的卡无需审批。同时因为不存在层级和跨部门,也可以灵活调用所有人,只要对方也有兴趣。 「暗涌」:一种松散的管理方式也取决于你们筛选到了一批强热爱驱动的人。听说你们很擅长从细节招人, 可以让一些非传统评价指标里优秀的人被选出来。 梁文锋:我们选人的标准一直都是热爱和好奇心,所以很多人会有一些奇特的经历,很有意思。很多人对做研究的渴望,远超对钱的在意。 「暗涌」: transformer诞生在谷歌的AI Lab,ChatGPT诞生在OpenAI,你觉得大公司的AILab 和一个创业公司对于创新产生的价值有什么不同? 梁文锋:不管是Google实验室,还是OpenAI,甚至中国大厂的AI Lab,都很有价值的。最后是OpenAI做出来,也有历史的偶然性。 「暗涌」:创新很大程度也是一种偶然吗?我看你们办公区中间那排会议室左右两侧都设置了可以随意推开的门。你们同事说,这就是给偶然留出空隙。transfomer诞生中就发生过那种偶然经过的人听到后加入,最终把它变成一个通用框架的故事。 梁文锋:我觉得创新首先是一个信念问题。为什么硅谷那么有创新精神?首先是敢。Chatgpt出来时,整个国内对做前沿创新都缺乏信心,从投资人到大厂,都觉得差距太大了,还是做应用吧。但创新首先需要自信。这种信心通常在年轻人身上更明显。 「暗涌」:但你们不参与融资,很少对外发声,社会声量上肯定不如那些融资活跃的公司,怎么确保DeepSeek就是做大模型的人的首选? 梁文锋:因为我们在做最难的事。对顶级人才吸引最大的,肯定是去解决世界上最难的问题。其实,顶尖人才在中国是被低估的。因为整个社会层面的硬核创新太少了,使得他们没有机会被识别出来。我们在做最难的事,对他们就是有吸引力的。 「暗涌」:前一段OpenAI的发布并没有等来GPT5,很多人觉得这是技术曲线明显在放缓,也很多人开始质疑Scaling Law,你们怎么看? 梁文锋:我们偏乐观,整个行业看起来都符合预期。OpenAI也不是神,不可能一直冲在前面。 「暗涌」:你觉得AGI还要多久实现,发布DeepSeek V2前,你们发布过代码生成和数学的模型,也从dense模型切换到了MOE,所以你们的AGI路线图有哪些坐标? 
梁文锋:可能是2年、5年或者10年,总之会在我们有生之年实现。至于路线图,即使在我们公司内部,也没有统一意见。但我们确实押注了三个方向。一是数学和代码,二是多模态,三是自然语言本身。数学和代码是AGI天然的试验场,有点像围棋,是一个封闭的、可验证的系统,有可能通过自我学习就能实现很高的智能。另一方面,可能多模态、参与到人类的真实世界里学习,对AGI也是必要的。我们对一切可能性都保持开放。 「暗涌」:你觉得大模型终局是什么样态? 梁文锋:会有专门公司提供基础模型和基础服务,会有很长链条的专业分工。更多人在之上去满足整个社会多样化的需求。 Image 所有的套路都是上一代的产物 「暗涌」:过去这一年,中国的大模型创业还是有很多变化的,比如去年开头还很活跃的王慧文中场退出了,后来加入的公司也开始呈现出差异化。 梁文锋:王慧文自己承担了所有的损失,让其他人全身而退。他做了一个对自己最不利,但对大家都好的选择,所以他做人是很厚道的,这点我很佩服。 「暗涌」:现在你的精力最多放在哪里? 梁文锋:主要的精力在研究下一代的大模型。还有很多未解决的问题。 「暗涌」:其他几家大模型创业公司都是坚持既要又要,毕竟技术不会带来永久领先,抓住时间窗口把技术优势落到产品也很重要,DeepSeek敢于专注在模型研究上是因为模型能力还不够吗? 梁文锋:所有的套路都是上一代的产物,未来不一定成立。拿互联网的商业逻辑去讨论未来AI的盈利模式,就像马化腾创业时,你去讨论通用电气和可口可乐一样。很可能是一种刻舟求剑。 「暗涌」:过去幻方就有很强的技术和创新基因,成长也比较顺利,这是你偏乐观的原因吗? 梁文锋:幻方某种程度上增强了我们对技术驱动型创新的信心,但也不都是坦途。我们经历了一个漫长的积累过程。外部看到的是幻方2015年后的部分,但其实我们做了16年。 「暗涌」:回到关于原创式创新的话题。现在经济开始进入下行,资本也进入冷周期,所以它对原创式创新是否会带来更多抑制? 梁文锋:我倒觉得未必。中国产业结构的调整,会更依赖硬核技术的创新。当很多人发现过去赚快钱很可能来自时代运气,就会更愿意俯身去做真正的创新。 「暗涌」:所以你对这件事也是乐观的? 梁文锋:我是八十年代在广东一个五线城市长大的。我的父亲是小学老师,九十年代,广东赚钱机会很多,当时有不少家长到我家里来,基本就是家长觉得读书没用。但现在回去看,观念都变了。因为钱不好赚了,连开出租车的机会可能都没了。一代人的时间就变了。 以后硬核创新会越来越多。现在可能还不容易被理解,是因为整个社会群体需要被事实教育。当这个社会让硬核创新的人功成名就,群体性想法就会改变。我们只是还需要一堆事实和一个过程。 图片来源 |IC Photo 排版|姚楠 imageUrls: [] user: username: tarasglek name: Taras Glek avatarUrl: https://avatars.githubusercontent.com/u/857083?v=4 - id: _bYIuoMInSs9JZKv8g2yb date: 2024-11-25T19:27:02.972Z type: ai text: >- ### Unveiling DeepSeek: A More Extreme Story of Chinese Technological Idealism **Original Article by Yu Lili** **Edited by Liu Jing** Among China's seven major large model startups, DeepSeek is the most low-key, yet it always manages to leave a lasting impression in unexpected ways. A year ago, this impression stemmed from its backing by the quantitative hedge fund giant Huanfang, the only company outside the tech giants to stockpile tens of thousands of A100 chips. A year later, it came from being the originator of China's large model price war. In May, when AI was bombarding the industry, DeepSeek shot to fame. 
The reason was their release of an open-source model called DeepSeek V2, offering unprecedented cost-effectiveness: inference cost was cut to just 1 yuan per million tokens, about one-seventh the price of Llama3 70B and one-seventieth that of GPT-4 Turbo. While DeepSeek was quickly dubbed the "Pinduoduo of the AI world," ByteDance, Tencent, Baidu, Alibaba, and other tech giants couldn't sit still and began cutting prices. China's large model price war was thus ignited. The smoke of that war actually obscured a fact: unlike the many tech giants burning money on subsidies, DeepSeek is profitable. Behind this is DeepSeek's comprehensive innovation in model architecture. It proposed a novel MLA (Multi-head Latent Attention) architecture that cuts GPU memory usage to 5%-13% of the widely used MHA architecture's, while its proprietary DeepSeekMoESparse structure drives computational load to a minimum; together, these ultimately brought costs down. In Silicon Valley, DeepSeek is known as the "mysterious force from the East." SemiAnalysis's chief analyst believes the DeepSeek V2 paper "may be the best one this year." Former OpenAI employee Andrew Carr found the paper "full of amazing wisdom" and applied its training settings to his own model. Jack Clark, former policy director at OpenAI and co-founder of Anthropic, believes DeepSeek "hired a group of enigmatic geniuses" and that Chinese-made large models "will become an undeniable force, just like drones and electric vehicles." In an AI wave whose plot is largely driven by Silicon Valley, this is a rare occurrence. Multiple industry insiders told us that this strong reaction stems from innovation at the architectural level, an attempt rarely seen among domestic large model companies, or even among global open-source base models. An AI researcher noted that in the years since the Attention architecture was proposed, it has hardly ever been successfully modified, let alone validated at scale.
"This is even a thought that would be cut off at the decision-making stage, because most people lack the confidence." On the other hand, domestic large models have rarely ventured into architectural innovation because few were willing to challenge the prevailing notion that the U.S. excels at 0-to-1 technological innovation while China excels at 1-to-10 application innovation. Moreover, such a move makes little economic sense: a new generation of models will emerge within a few months anyway, so Chinese companies need only follow along and excel at applications. Innovating on model structure means having no path to follow, enduring many failures, and paying heavy costs in time and money. DeepSeek is clearly swimming against the current. Amidst the clamor that large model technology will inevitably converge and that following is the smarter shortcut, DeepSeek values what is accumulated along the "detours" and believes that Chinese large model entrepreneurs can join the global stream of technological innovation, not just application innovation. Many of DeepSeek's decisions are unconventional. Among China's seven large model startups, it is the only one to abandon the "both-and" approach, focusing to date solely on research and technology without building toC applications. It is also the only one that has not fully weighed commercialization, has steadfastly chosen the open-source route, and has never even raised funds. This often leaves it forgotten outside the card table, yet at the other end it frequently spreads through the community by users' word of mouth. How exactly was DeepSeek forged? We interviewed DeepSeek's rarely seen founder, Liang Wenfeng. This 80s-born founder, who has been quietly researching technology behind the scenes since the Huanfang era, continues his low-key style in the DeepSeek era, spending his days "reading papers, writing code, and participating in group discussions" like every other researcher.
Unlike many quant fund founders, who often have overseas hedge fund experience and backgrounds in physics or mathematics, Liang Wenfeng has always been locally based, having studied artificial intelligence at Zhejiang University's Department of Electronic Engineering. Multiple industry insiders and DeepSeek researchers told us that Liang Wenfeng is a rare figure in China's AI world: someone who "combines strong infrastructure engineering and model research capabilities with the ability to mobilize resources," who "can make precise judgments from a high level while outperforming frontline researchers on the details," who possesses a "terrifying ability to learn," and who is "completely unlike a boss, and much more like a geek." This was an especially rare interview. In it, this techno-idealist offered a voice currently scarce in China's tech world: he is one of the few who places "right and wrong" before "gains and losses," who reminds us to see the inertia of our times and to put "original innovation" on the agenda. A year ago, when DeepSeek first entered the scene, we interviewed Liang Wenfeng for the first time: "The Madness of Huanfang: The Large Model Path of a Hidden AI Giant." If the phrase "you must be madly ambitious, and madly sincere" was then just a beautiful slogan, a year later it is turning into action. ### Excerpts from the Dialogue **How the First Shot of the Price War Was Fired** **Waves:** After the release of the DeepSeek V2 model, a fierce large model price war quickly broke out. Some say you are the industry's catfish. **Liang Wenfeng:** We didn't set out to be a catfish; we just accidentally became one. **Waves:** Did this outcome surprise you? **Liang Wenfeng:** Very much. We didn't expect everyone to be so sensitive about price. We just do things at our own pace and then price based on costs. Our principle is to neither sell at a loss nor chase exorbitant profits. The current price leaves a modest margin above cost.
**Waves:** Zhipu AI followed five days later, then ByteDance, Alibaba, Baidu, Tencent, and the other giants. **Liang Wenfeng:** What Zhipu AI cut was an entry-level product; its models at our level are still priced high. ByteDance was the first to truly follow: it cut its flagship model to the same price as ours, which then triggered the other giants to cut prices. Because the giants' model costs are much higher than ours, we never expected anyone to do this at a loss, but it ended up becoming the internet-era logic of subsidized cash burning. **Waves:** From the outside, the price cut looks like a grab for users, as internet-era price wars usually are. **Liang Wenfeng:** Grabbing users is not our main goal. We cut prices partly because, in exploring the structure of the next-generation model, our costs came down first, and partly because we believe both APIs and AI should be inclusive, something everyone can afford. **Waves:** Before this, most Chinese companies would simply copy this generation's Llama structure and build applications. Why did you start from the model structure? **Liang Wenfeng:** If the goal is applications, then adopting the Llama structure and shipping products quickly is a reasonable choice. But our destination is AGI, which means we need to research new model structures to achieve stronger model capability with limited resources. This is part of the foundational research needed to scale up to larger models. Beyond model structure, we've done plenty of other research, including how to construct data and how to make models more human-like, all of which is reflected in the models we've released. Besides, Llama's structure is probably already two generations behind the international state of the art in training efficiency and inference cost. **Waves:** Where does this generation gap mainly come from? **Liang Wenfeng:** First, there's a gap in training efficiency.
We estimate that the best domestic models trail the best international ones by roughly a factor of two in model structure and training dynamics, which alone means we must burn twice the computing power to achieve the same result. There is likely another twofold gap in data efficiency: we consume twice the training data, and the compute to process it, for the same result. Multiplied together, that is four times the computing power. What we're doing is constantly narrowing these gaps. **Waves:** Most Chinese companies choose to pursue both models and applications. Why does DeepSeek currently focus only on research and exploration? **Liang Wenfeng:** Because we believe the most important thing now is to take part in the global wave of innovation. For many years, Chinese companies were used to letting others do the technological innovation while we took it over to monetize through applications, but this should not be taken for granted. In this wave, our starting point is not to make a quick profit but to reach the forefront of technology and push the whole ecosystem forward. **Waves:** The inertia of the internet and mobile internet era leaves most people with the perception that the U.S. excels at technological innovation while China is better at applications. **Liang Wenfeng:** We believe that as its economy develops, China should gradually become a contributor instead of free-riding forever. Over the past three decades of the IT wave, we barely participated in real technological innovation. We grew accustomed to Moore's Law falling from the sky, as if better hardware and software would simply arrive every 18 months while we lay back and waited. The Scaling Law is being treated the same way. But in fact, these things were created by generation after generation of tireless effort in the Western-led technical community; it is only because we never took part in that process that we came to ignore its existence. **The Real Gap is Not One or Two Years, but Originality vs. Imitation** **Waves:** Why did DeepSeek V2 surprise many in Silicon Valley?
**Liang Wenfeng:** Among the numerous innovations happening daily in the U.S., this is a fairly ordinary one. They were surprised because it was a Chinese company joining their game as a contributor of innovation. After all, most Chinese companies are used to following, not innovating. **Waves:** But in the Chinese context this choice is also an extravagant one. Large models are a capital-intensive game; not every company has the means to focus solely on research and innovation without first considering commercialization. **Liang Wenfeng:** The cost of innovation is certainly not low, and the old habit of taking ready-made technology had much to do with China's circumstances in the past. But today, whether you look at China's economic size or the profits of giants like ByteDance and Tencent, neither is low by global standards. What we lack for innovation is not capital but confidence, and the knowledge of how to organize high-density talent for effective innovation. **Waves:** Why do Chinese companies, including cash-rich giants, so readily treat rapid commercialization as the first priority? **Liang Wenfeng:** For the past three decades, we emphasized only making money and neglected innovation. Innovation isn't driven entirely by commerce; it also requires curiosity and the desire to create. We are simply bound by old inertia, but it is only a phase. **Waves:** But you are ultimately a commercial organization, not a public research institute. If you choose to innovate and then share it through open source, where does your moat come from? Won't innovations like May's MLA architecture be copied quickly by others? **Liang Wenfeng:** In the face of disruptive technology, a moat built on closed source is short-lived. Even OpenAI's closed-source approach cannot prevent others from catching up. So we anchor the value in our team: our colleagues grow through this process, accumulate a great deal of know-how, and form an organization and culture capable of innovation. That is our moat. Open-sourcing and publishing papers actually cost us nothing.
For technical people, being followed is deeply satisfying. In fact, open source is more a cultural act than a commercial one; to give is an extra honor. A company that does so also gains cultural appeal. **Waves:** What do you make of market-faith views like Zhu Xiaohu's? **Liang Wenfeng:** Zhu Xiaohu is self-consistent, but his playbook suits companies that make money fast, whereas the most profitable companies in the U.S. are high-tech companies that built up slowly before breaking out. **Waves:** But in large models, a technical lead alone is hard to turn into an absolute advantage. What is the bigger thing you're betting on? **Liang Wenfeng:** What we see is that Chinese AI cannot stay in the follower's position forever. We often say Chinese AI is one or two years behind the U.S., but the real gap is between originality and imitation. If that doesn't change, China will always be a follower, so some exploration is unavoidable. Nvidia's lead is not the effort of one company alone but the joint effort of the entire Western technical community and industry. They can see the next-generation technology trends and hold the roadmap. China's AI development needs such an ecosystem too. Many domestic chips have failed to develop because they lack a supporting technical community and get only second-hand information, so China inevitably needs people standing at the technological frontier. **More Investment Doesn't Necessarily Lead to More Innovation** **Waves:** DeepSeek now has an idealistic air reminiscent of early OpenAI, and it is open-source. Will you go closed-source later? Both OpenAI and Mistral have moved from open to closed source. **Liang Wenfeng:** We won't go closed-source. We believe that building a strong technical ecosystem first matters more. **Waves:** Do you have fundraising plans? Media reports say Huanfang plans to spin DeepSeek off for an independent listing, and Silicon Valley AI startups ultimately all end up tied to the giants.
**Liang Wenfeng:** There are no short-term fundraising plans. The problem we face has never been money but the embargo on high-end chips. **Waves:** Many believe that doing AGI and doing quant are two completely different things. Quant can be done quietly, but AGI may require a high-profile approach and alliances to scale up your investment. **Liang Wenfeng:** More investment doesn't necessarily lead to more innovation. Otherwise, the giants would have monopolized all innovation already. **Waves:** You don't do applications now because you lack the operational genes? **Liang Wenfeng:** We believe the current phase is an explosion of technological innovation, not an explosion of applications. In the long run, we hope to form an ecosystem in which the industry directly uses our technology and output: we handle the base models and frontier innovation, and other companies build toB and toC businesses on DeepSeek's foundation. If a complete industrial chain forms upstream and downstream, there's no need for us to build applications ourselves. Of course, nothing stops us from building applications if needed, but research and technological innovation will always be our top priority. **Waves:** But for those choosing an API, why pick DeepSeek over the giants? **Liang Wenfeng:** The future world will likely run on specialized division of labor. Base large models need continuous innovation, and the giants have limits to their capabilities; they aren't necessarily the best fit. **Waves:** But can technology really create a gap? You've also said there's no absolute technical secret. **Liang Wenfeng:** Technology holds no secrets, but recreating it takes time and money. Nvidia's GPUs contain, in theory, no technical secrets and are easy to copy, yet rebuilding a team and catching up with the next generation of technology both take time, so the actual moat is still wide. **Waves:** After your price cut, ByteDance was the first to follow, which suggests they felt some threat. How do you view new ways for startups to compete with giants?
**Liang Wenfeng:** Honestly, we don't much care about this; it was just something we did along the way. Providing cloud services is not our main goal. Our goal is still to achieve AGI. We haven't seen any new playbook yet, but the giants hold no clear advantage either. They have existing users, but their cash-flow businesses are also their baggage, which makes them ripe for disruption at any moment. **Waves:** What do you think the endgame will be for the other six large model startups? **Liang Wenfeng:** Maybe 2 to 3 will survive. They are all still in the money-burning stage, so those with a clear self-positioning and more refined operations have a better chance of surviving. The others may be transformed beyond recognition. What is valuable won't vanish into thin air, but it will take a different form. **Waves:** During the Huanfang era, your posture toward competition was described as "doing your own thing," rarely minding horizontal comparisons. What is the starting point of your thinking about competition? **Liang Wenfeng:** What I often think about is whether something can make society run more efficiently, and whether you can find a position you're good at within its industrial division of labor. As long as the end result makes society more efficient, it is valid. Much of what happens in between is transitional; fixating on it only dazzles and distracts. **A Group of Young People Doing "Enigmatic" Things** **Waves:** OpenAI's former policy director and Anthropic co-founder Jack Clark believes DeepSeek hired "a group of enigmatic geniuses." What kind of people built DeepSeek V2? **Liang Wenfeng:** There are no enigmatic geniuses, just fresh graduates from top universities, fourth- or fifth-year Ph.D. interns, and young people only a few years out of school. **Waves:** Many large model companies are obsessed with recruiting overseas, and many believe the top 50 talents in this field may not even be at Chinese companies. Where do your people come from? **Liang Wenfeng:** There are no overseas returnees on the V2 model team; everyone is homegrown.
The top 50 talents may not be in China, but maybe we can cultivate such people ourselves. **Waves:** How did the MLA innovation come about? It's said the idea originated from a young researcher's personal interest? **Liang Wenfeng:** After summarizing the mainstream evolutionary patterns of the Attention architecture, he suddenly thought of designing an alternative. However, turning the idea into reality was a long process. We formed a team and spent several months to make it work. **Waves:** This kind of divergent inspiration is related to your completely innovative organizational structure. During imageUrls: [] model: deepseek-chat versions: - id: wVKNAnw7Wm3Jp9QC6tj7P date: 2024-11-25T19:31:05.405Z model: deepseek-chat text: >- ### Unveiling DeepSeek: A More Extreme Story of Chinese Technological Idealism **Original Article by Yu Lili** **Edited by Liu Jing** Among China's seven major large model startups, DeepSeek is the most low-key, yet it always manages to leave a lasting impression in unexpected ways. A year ago, this impression stemmed from its backing by the quantitative hedge fund giant Huanfang, the only company outside the tech giants to stockpile tens of thousands of A100 chips. A year later, it came from being the originator of China's large model price war. In May, when AI was bombarding the industry, DeepSeek shot to fame. The reason was their release of an open-source model called DeepSeek V2, offering an unprecedented cost-effectiveness: the inference cost was reduced to just 1 yuan per million tokens, about one-seventh of Llama3 70B and one-seventieth of GPT-4 Turbo. While DeepSeek was quickly dubbed the "Pinduoduo of the AI world," ByteDance, Tencent, Baidu, Alibaba, and other tech giants couldn't sit still and began to cut prices. China's large model price war was thus ignited. The smokescreen of this price war actually obscured a fact: unlike many tech giants burning money for subsidies, DeepSeek is profitable. 
Behind this is DeepSeek's comprehensive innovation in model architecture. They proposed a new MLA (Multi-head Latent Attention) architecture that reduces memory usage to 5%-13% of the commonly used MHA architecture. Additionally, their proprietary DeepSeekMoESparse structure minimizes computational load, ultimately driving down costs. In Silicon Valley, DeepSeek is known as the "mysterious force from the East." The chief analyst of SemiAnalysis believes that the DeepSeek V2 paper "could be the best of the year." Former OpenAI employee Andrew Carr finds the paper "filled with astonishing wisdom" and has applied its training settings to his own models. Jack Clark, former policy director at OpenAI and co-founder of Anthropic, believes that DeepSeek "hired a group of enigmatic geniuses" and that Chinese-made large models "will become an undeniable force, just like drones and electric vehicles." In the AI wave that is largely driven by Silicon Valley, this is a rare occurrence. Multiple industry insiders told us that this strong reaction stems from innovations at the architectural level, a rare attempt for both domestic large model companies and global open-source base large models. An AI researcher noted that the Attention architecture, proposed years ago, has hardly been successfully modified, let alone extensively validated. "This is even a thought that would be cut off during decision-making because most people lack confidence." On the other hand, domestic large models rarely ventured into architectural innovations because few were willing to challenge the prevailing notion that the U.S. excels in 0-1 technological innovation, while China excels in 1-10 application innovation. Moreover, such behavior is highly uneconomical—a new generation of models will naturally emerge in a few months, and Chinese companies only need to follow and excel in applications. 
Innovating on model structures means no existing path to follow, with many failures and significant time and economic costs. DeepSeek is clearly a maverick. Amidst the clamor that large model technology will inevitably converge and that following is the smarter shortcut, DeepSeek values the accumulated value in "detours" and believes that Chinese large model entrepreneurs can also join the global technological innovation stream beyond application innovation. Many of DeepSeek's decisions are unconventional. Among China's seven large model startups, it is the only one to abandon the "both-and" approach, focusing solely on research and technology without venturing into toC applications. It is also the only one not fully considering commercialization, steadfastly choosing an open-source route, and has never raised funds. This often leaves it forgotten at the table, yet on the other hand, it frequently gains "word-of-mouth" promotion from users in the community. How did DeepSeek come to be? We interviewed DeepSeek's rarely seen founder, Liang Wenfeng. This 80s-born founder, who has been quietly researching technology behind the scenes since the Huanfang era, continues his low-key style in the DeepSeek era, spending his days "reading papers, writing code, and participating in group discussions" like all researchers. Unlike many hedge fund founders with overseas experience and backgrounds in physics or mathematics, Liang Wenfeng has always been locally based, having studied artificial intelligence at Zhejiang University's Department of Electronic Engineering. 
Multiple industry insiders and DeepSeek researchers told us that Liang Wenfeng is an exceedingly rare figure in Chinese AI today: someone who "combines strong infra engineering ability with model research ability, and can also mobilize resources," who "can make precise judgments from a high level yet outdo frontline researchers on the details." He has a "terrifying capacity to learn," and at the same time is "nothing like a boss, and much more like a geek."

This was an especially rare interview. In it, this technological idealist offered a voice that is particularly scarce in China's tech world today: he is one of the few who puts a sense of right and wrong before the calculus of gain and loss, and who reminds us to see the inertia of the times and put "original innovation" on the agenda.

A year ago, when DeepSeek first entered the arena, we interviewed Liang Wenfeng for the first time: "The Madness of Huanfang: A Hidden AI Giant's Road to Large Models." If the line "be madly ambitious, and madly sincere" was then just a beautiful slogan, a year on, it is becoming a course of action.

### Excerpts from the Dialogue

**How the First Shot of the Price War Was Fired**

**Waves:** After the DeepSeek V2 model was released, it quickly set off a cutthroat large-model price war. Some say you are a catfish stirring up the industry.

**Liang Wenfeng:** We didn't set out to be a catfish; we just accidentally became one.

**Waves:** Did this outcome surprise you?

**Liang Wenfeng:** Very much. We didn't expect everyone to be so sensitive about price. We were simply doing things at our own pace, then calculated our costs and set a price. Our principle is neither to sell at a loss nor to take exorbitant profits. The current price sits slightly above cost, with a modest margin.

**Waves:** Zhipu AI followed five days later, then giants such as ByteDance, Alibaba, Baidu, and Tencent.

**Liang Wenfeng:** Zhipu AI cut the price of an entry-level product; their models at our level remained very expensive. ByteDance was the first to truly follow.
They dropped their flagship model to the same price as ours, which then triggered the other giants to cut prices. Because the giants' model costs are much higher than ours, we never expected anyone to do this at a loss; it ended up turning into the internet-era logic of burning money on subsidies.

**Waves:** From the outside, the price cut looks like a grab for users, as price wars in the internet era usually were.

**Liang Wenfeng:** Grabbing users is not our main goal. We cut prices partly because, in exploring the structure of the next-generation model, our costs came down first, and partly because we believe both APIs and AI should be inclusive, something everyone can afford.

**Waves:** Before this, most Chinese companies would simply copy this generation's Llama structure and build applications on it. Why did you start from the model structure?

**Liang Wenfeng:** If the goal is applications, then adopting the Llama structure and shipping a product quickly is a reasonable choice. But our destination is AGI, which means we need to research new model structures to achieve stronger capability within limited resources. This is one of the pieces of basic research required to scale up to larger models. Beyond model structure, we have done a great deal of other research, including how to construct data and how to make models more human-like, all of which is reflected in the models we have released. Besides, Llama's structure is probably already two generations behind the international state of the art in training efficiency and inference cost.

**Waves:** Where does this generation gap mainly come from?

**Liang Wenfeng:** First, there is a gap in training efficiency. We estimate that in model structure and training dynamics, the best domestic level may lag the best international level by a factor of two, meaning we must consume twice the compute for the same result. Data efficiency may also lag by a factor of two, meaning we need twice the training data and compute for the same result. Combined, that is four times the compute.
What we are doing is continually narrowing these gaps.

**Waves:** Most Chinese companies choose to do both models and applications. Why does DeepSeek currently choose research and exploration only?

**Liang Wenfeng:** Because we believe the most important thing right now is to take part in the global wave of innovation. For many years, Chinese companies grew used to others doing the technological innovation while we took it and monetized it in applications, but that should not be taken for granted. In this wave, our starting point is not to seize the chance to make a quick profit, but to reach the frontier of technology and push the whole ecosystem forward.

**Waves:** The internet and mobile-internet eras left most people with the ingrained perception that the U.S. excels at technological innovation while China is better at applications.

**Liang Wenfeng:** We believe that as the economy develops, China should gradually become a contributor rather than a perpetual free rider. Through more than thirty years of the IT wave, we hardly participated in real technological innovation. We grew accustomed to Moore's Law falling from the sky, to lying at home while better hardware and software arrived every eighteen months. The Scaling Law is being treated the same way.

But in fact, all of this was created, generation after generation, by a tirelessly working, Western-led technical community. It is only because we were absent from that process that we overlooked its existence.

**The Real Gap Is Not One or Two Years, but Originality vs. Imitation**

**Waves:** Why did DeepSeek V2 surprise so many people in Silicon Valley?

**Liang Wenfeng:** Among the countless innovations happening every day in the U.S., this one is quite ordinary. What surprised them is that a Chinese company joined their game as a contributor of innovation. After all, most Chinese companies are used to following, not innovating.

**Waves:** But in the Chinese context, this choice seems almost too luxurious.
Large models are a heavy-investment game; not every company has the capital to focus solely on research and innovation without first considering commercialization.

**Liang Wenfeng:** The cost of innovation is certainly not low, and our past habit of taking what already exists was also tied to China's circumstances at the time. But today, whether you look at the scale of China's economy or the profits of giants like ByteDance and Tencent, none of it is low by global standards. What we lack in innovation is definitely not capital; it is confidence, and the knowledge of how to organize high-density talent for effective innovation.

**Waves:** Why do Chinese companies, including well-funded giants, so readily make rapid commercialization their first priority?

**Liang Wenfeng:** For the past thirty years we emphasized only making money and neglected innovation. Innovation is not driven entirely by commerce; it also requires curiosity and a desire to create. We are simply bound by past inertia, but that too is only a phase.

**Waves:** But you are a commercial organization, not a public research institution. If you choose to innovate and then share it through open source, where does your moat come from? Won't innovations like the MLA architecture from May soon be copied by others?

**Liang Wenfeng:** In the face of disruptive technology, any moat formed by closed source is temporary. Even OpenAI's closed-source approach cannot prevent it from being overtaken. So we deposit the value in our team: our colleagues grow through this process, accumulate a great deal of know-how, and form an organization and culture capable of innovation. That is our moat.

Open-sourcing and publishing papers actually cost us nothing. For technical people, having your work followed is a very rewarding thing. In fact, open source is more of a cultural act than a commercial one; giving is an extra honor, and a company that does so gains cultural appeal as well.

**Waves:** What do you make of market-faith views like those of Zhu Xiaohu?
**Liang Wenfeng:** Zhu Xiaohu is self-consistent, but his approach suits companies that make money quickly. Look at the most profitable companies in the U.S.: they are all high-tech companies that accumulated strength over time.

**Waves:** But with large models, technical superiority alone can hardly form an absolute advantage. What is the bigger thing you are betting on?

**Liang Wenfeng:** What we see is that Chinese AI cannot stay in the follower's position forever. We often say there is a one- or two-year gap between Chinese AI and the U.S., but the real gap is the one between originality and imitation. If that does not change, China will always be only a follower, so some exploration cannot be avoided.

Nvidia's lead is not just the effort of one company but the joint effort of the entire Western technical community and industry. They can see the next generation of technological trends and hold the roadmap in hand. China's AI development needs such an ecosystem too. Many domestic chips failed to develop because they lacked a supporting technical community and had only second-hand information. That is why China inevitably needs people standing at the frontier of technology.

**More Investment Does Not Necessarily Produce More Innovation**

**Waves:** DeepSeek now has the idealistic air of early OpenAI, and it is likewise open source. Will you go closed-source later? Both OpenAI and Mistral moved from open source to closed source.

**Liang Wenfeng:** We will not go closed-source. We believe that having a strong technical ecosystem first matters more.

**Waves:** Do you have fundraising plans? Media reports say Huanfang plans to spin off DeepSeek for an independent listing; Silicon Valley AI startups, too, ultimately tend to align with tech giants.

**Liang Wenfeng:** There are no fundraising plans in the short term. The problem we face has never been money; it is the embargo on high-end chips.

**Waves:** Many people think doing AGI and doing quant are two entirely different things.
Quant can be done quietly, but AGI may require a higher profile and alliances to enlarge your investment.

**Liang Wenfeng:** More investment does not necessarily produce more innovation. Otherwise the giants would have monopolized every innovation already.

**Waves:** Is it that you don't do applications now because you lack the operational genes?

**Liang Wenfeng:** We believe the current stage is an explosion of technological innovation, not an explosion of applications. In the long run, we hope to form an ecosystem in which the industry uses our technology and output directly. We take responsibility for the base model and frontier innovation, and other companies build toB and toC businesses on DeepSeek's foundation. If a complete industrial chain forms upstream and downstream, there is no need for us to build applications ourselves. Of course, if necessary we can, but research and technological innovation will always come first.

**Waves:** But why should someone choose DeepSeek's API over a giant's?

**Liang Wenfeng:** The future world is likely to be one of specialized division of labor. Base large models need continuous innovation, and the giants have their own boundaries; they are not necessarily the best suited to it.

**Waves:** But can technology really open up a gap? You have also said there are no absolute technical secrets.

**Liang Wenfeng:** There are no secrets in technology, but catching up again takes time and cost. Nvidia's graphics cards are, in theory, easy to replicate, but rebuilding a team and catching up to the next generation of technology both take time, so the real moat is still wide.

**Waves:** After your price cut, ByteDance was the first to follow, which suggests they felt some threat. What is your view on new ways for startups to compete with giants?

**Liang Wenfeng:** Honestly, we don't care much about this; it just happened along the way. Providing cloud services is not our main goal; our goal is still to achieve AGI. We haven't seen any new playbook yet, but the giants have no clear advantage either.
The giants have existing users, but their cash-flow businesses are also their burdens, which makes them vulnerable to disruption.

**Waves:** What do you think the endgame will be for the other six large-model startups?

**Liang Wenfeng:** Perhaps two or three will survive. Right now they are all still burning money, so those with clear self-positioning and more refined operations will have a better chance. Other companies may transform. What is valuable will not disappear; it will just take a different form.

**Waves:** In the Huanfang era, your attitude toward competition was described as "doing my own thing," rarely making side-by-side comparisons. Where does your thinking about competition start?

**Liang Wenfeng:** What I often think about is whether something raises the efficiency of society, and whether you can find a fitting position in its industrial chain. As long as the end result is higher social efficiency, it is valid. Many things in between are just stages, and paying too much attention to them only causes confusion.

**A Group of Young People Doing "Enigmatic" Things**

**Waves:** Jack Clark, OpenAI's former policy director and co-founder of Anthropic, believes DeepSeek has hired "a group of enigmatic geniuses." What kind of people built DeepSeek V2?

**Liang Wenfeng:** There are no enigmatic geniuses, just fresh graduates from top universities, Ph.D. candidates in their fourth or fifth year doing internships, and young people only a few years out of school.

**Waves:** Many large-model companies obsess over recruiting from overseas, and many believe the top 50 talents in this field may not even be at Chinese companies. Where do your people come from?

**Liang Wenfeng:** There are no overseas returnees on the V2 model team; everyone is homegrown. The top 50 talents may not be in China, but perhaps we can cultivate such people ourselves.

**Waves:** How did the MLA innovation come about? We hear the idea began with a young researcher's personal interest.
**Liang Wenfeng:** After summarizing the mainstream evolutionary patterns of the Attention architecture, it suddenly occurred to him to design an alternative. But going from idea to reality was a long process: we assembled a team for it and spent several months making it work.

**Waves:** This kind of divergent inspiration is related to your completely innovative organizational structure. During