In a world captivated by AI, the race for faster, more efficient compute has become the new frontier. At the heart of this race is Andrew Feldman, co-founder and CEO of Cerebras, a company challenging the giants of the semiconductor industry with a radical new approach. While NVIDIA and others scale out with clusters of GPUs, Cerebras scales up, building the world’s largest single AI chip—the Wafer-Scale Engine. This groundbreaking architecture allows Cerebras to deliver orders of magnitude more speed for AI inference and training, tackling problems that were previously intractable.
In this conversation, Andrew joins Nataraj to discuss the journey of building a deep tech company from the ground up. He shares the initial thesis behind starting a chip company in 2015, the immense engineering challenges they overcame, the shift towards sovereign AI and open-source models, and how Cerebras is redefining performance benchmarks for the entire industry.
→ Enjoy this conversation with Andrew Feldman on Spotify or YouTube.
→ Subscribe to our newsletter and never miss an update.
Nataraj: You started Cerebras in 2015. There are a lot of neo-clouds coming up now, but back then, trying to create a chip company against established players like Nvidia, Intel, and AMD was not an obvious thing to do. What was the thesis back then?
Andrew: I think there are two parts. As a chip and system guy, that’s all we know. So it was obvious to us. We weren’t going to build a web app; that’s not who we are. I think entrepreneurship, like many things in life, pays dividends if you stay true to who you are. The founding team and the people we know are infrastructure builders. That’s what we love and what we’ve done our whole careers. For me and the founding team, it’s building chips and systems and the software that runs them, such that other people’s ideas can run on our machines and take flight. This is my fifth startup, and all the previous startups were building systems. All my co-founders were with me at the last startup I founded. So it was obvious to us that we were going to build a chip in a system.
I think what wasn’t obvious was AI. In 2015, NVIDIA was a $20 billion company, not a multi-trillion-dollar company. The world looked very different. We had a meeting with Sam Altman and Ilya Sutskever, and the things they were saying sounded crazy. “Oh, we’re going to have to worry about safety, and AI agents are gonna take over.” You’re looking at them thinking this is crazy talk. But what Ilya said was true; it happened. What we saw was a new type of compute, and we thought AI would usher in a new compute workload in the same way that cell phones did, or switches and routers in the late 90s. When a new workload arrives, a new computer architecture emerges and new great companies are born.
When there’s an existing dominant design, as there has been with x86, there has been some shift in market share between Intel and AMD, but there have been no meaningful new entrants in two and a half or three decades. Whereas at an inflection point like this, at the rise of a new workload, there is tremendous opportunity. When the cell phone workload emerged, who was better positioned to take advantage of it than AMD and Intel? Both failed completely, zero share. ARM emerged, Apple emerged as a major chip player, and so did Samsung, and they’d never been in the chip-making business before. So when you see a dislocation like that, there’s opportunity, and that’s what we predicted. Now, we clearly didn’t think it would be this big, or we would have raised at a higher valuation. We had no idea that 10 years later, people would be spending $400 billion on CapEx.
In 2016, AI was finding a cat in a picture and making sure it was not a chair. That’s where AI was. But we saw a trajectory that would be big. We didn’t see that it would be this big, but we came to believe that we could build a new type of computer, beginning with the chip and the processor and the system, that would be really good at this workload—not a little bit better or a little bit cheaper, but orders of magnitude faster. And that’s what we did.
Nataraj: What does an iteration cycle look like when you’re trying to achieve that? If you’re building a software product, it’s pretty straightforward. But how do you do that for a hardware product?
Andrew: We chose to build the largest chip in the history of the computer industry. A typical chip is the size of a postage stamp, your thumbnail. This is 56 times larger than the largest chip that had ever been built before. We set out to do fundamental design, creativity, engineering, innovation, and invention. We spent about three years and, when all was said and done, about half a billion dollars to make the first one. Nobody in history had made one this size. By being bigger, we could keep more data on-chip. We could move it less often and less far. We could use less power because moving data is really expensive in power and time. So we could be much, much faster.
The dividend of doing this was huge, but it took us years and a great deal of internal fortitude because it wasn’t right the first time or the second time or the eighth time. In fact, we had about a 15-month period where we were spending eight million a month, and we couldn’t make one. You’re going to a board meeting every six or eight weeks saying, “Nope, still can’t make it.” And then in July or August of 2019, we made one. The founders just stood in a tiny little lab and we watched a computer run, which is about as exciting as watching paint dry. It’s just a big metal box with some lights flashing. We looked at each other and we were stunned. We’d solved a problem that nobody in the computer industry had ever solved before. A few months later we had our first customer, and we have been on a tear since then.
Nataraj: Who was your first customer?
Andrew: Our first government customer was Argonne National Labs, one of the Department of Energy labs in the US. Our first commercial customer was GlaxoSmithKline, the large pharmaceutical company.
Nataraj: You built the largest AI chip, and it has four trillion transistors. What does that mean when you compare it with a normal chip? Can you pack more SRAM onto it? How much bandwidth do you get?
Andrew: That’s exactly right. There are two types of memory. There’s slow memory that can hold a lot, called DRAM—HBM is a flavor of DRAM—and that’s what GPUs use. They’re called graphics processing units for a reason; they were designed for graphics. Graphics was a problem where you’d move data once, do a lot of work on it, and then bring the results back into memory. There’s a different type of memory called SRAM. Historically, SRAM was extremely fast but had relatively low capacity. By going to a big chip, we could stuff it to the gills with SRAM, overcoming that capacity limitation by simply using a lot of it. The result is we have both capacity and speed. That’s why we’re 15, 20, 25, 30 times faster—and in some problems thousands of times faster—than B200 GPUs.
Nataraj: So if you have more memory, when you’re doing pre-training, you don’t have to break down the model as often as you would on a GPU. Is that the advantage?
Andrew: That’s right. Let’s look at inference because it’s a neater example. In generative inference, to generate a single word or a token, you have to move all the weights from memory to compute to do a giant matrix multiply. For a 70 billion parameter model, which is not very big, you’re going to move about 100 full-length movies’ worth of data to generate one word. If your memory is off-chip, you’ve got a thin little pipe to the GPU. You’ve got this slow memory with a thin pipe, and you’ve got to move a hundred movies’ worth of data across it to generate one word. That pipe is what we measure in memory bandwidth. By putting the SRAM right next to the compute core on the same silicon, we move more than 2,600 times more data more quickly. As a result, the inference results come out faster. It’s just that simple. Memory bandwidth is a known Achilles’ heel of GPU architectures, and it was one of the things we saw in our design that we could do vastly better.
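A quick sanity check of that arithmetic, as a back-of-envelope sketch with assumptions of my own (16-bit weights, roughly 1.4 GB for a compressed movie, and roughly 8 TB/s of HBM bandwidth for a B200-class part) rather than figures from the conversation:

```python
# Rough arithmetic behind the "hundred movies per token" claim.
params = 70e9                                # 70-billion-parameter model
bytes_per_param = 2                          # 16-bit weights (assumption)
weights_gb = params * bytes_per_param / 1e9  # ~140 GB of weights read per generated token
movie_gb = 1.4                               # rough size of one compressed movie (assumption)
print(f"Moved per token: {weights_gb:.0f} GB, about {weights_gb / movie_gb:.0f} movies")

# At ~8 TB/s of HBM bandwidth (a B200-class figure), that transfer alone sets a
# latency floor per token, ignoring batching, which amortizes the weight traffic.
hbm_gb_per_s = 8000
print(f"Bandwidth-limited floor: {weights_gb / hbm_gb_per_s * 1000:.0f} ms per token")
```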
Nataraj: Right now it seems it’s all about compute and pre-training, but everyone is now scaling inference. Two or three years down the line, do you think we’ll spend more on inference than on training as an ecosystem?
Andrew: Inference and training are different. Until about early 2024, AI was mostly a novelty. It was cool, but it wasn’t doing real work. During that time, everybody focused on training because training is how we make AI, but inference is how we use AI. When AI was a novelty, nobody was using it. What’s happened since mid-2024 is the use of AI has exploded, and that’s the inference explosion people talk about. Not only have more people been using it, but they use it more often and to do more complicated things. Each of those increases the compute needed, which is a product of three rapidly growing dimensions. That’s exponential growth, and that’s why inference has just exploded.
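A toy illustration of that compounding, with hypothetical growth factors chosen only to show the multiplicative effect:

```python
# Illustrative only: made-up growth factors, not Cerebras or market data.
users = 3.0             # more people using AI
uses_per_user = 3.0     # each person using it more often
compute_per_use = 3.0   # each use doing more complicated things (longer, agentic tasks)

growth = users * uses_per_user * compute_per_use
print(f"Total inference demand grows {growth:.0f}x")  # three modest 3x factors compound to 27x
```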
Nataraj: What is it like for a new chip company to compete in this space with AMD and Nvidia?
Andrew: Hard. Look, Jensen Huang and Lisa Su—who bought my last company—are two of the three great CEOs over the last 10 or 15 years. If you throw in Hock Tan at Broadcom, those three leaders have outperformed just about everybody else in the world. They’re dazzling. But their size also creates opportunity for us. They can’t move as quickly as we can. They can’t take the type of engineering risks we take. They can’t hire the caliber of people we can who don’t want hierarchy and structure. So there’s tremendous opportunity for the bold entrepreneur. You both have to take the giant in the field seriously and know that they can be beaten.
Nataraj: I think what you said about sticking to your strengths resonates. But as a strategy, is it better to take on a very big, high-stakes, hard problem than something that sounds a little easier?
Andrew: It depends on your passion and where you are in your career. Chip projects are enormously expensive and historically have not been a good place for young, first-time CEOs. There are a lot of returns to experience in the chip business. Other parts of the entrepreneurial ecosystem have been extraordinarily good to young CEOs, particularly where they and their friends look like their customers. There, they have unique expertise that experience can’t replicate. When they’re coding for their friends or building tools they want to use, that entrepreneur has an advantage. Obviously, the entire wave of social networking companies was like that, and many AI companies are like that now.
But in the chip business, you have to design the logic, have relationships with a fab, use EDA tools that cost millions a year, and do back-end and physical design. There are very few great hardware teams—maybe eight or 10 in the world—and we have one of them. They’ve been with me for 20-plus years, which made it easier for us to raise money for a big idea.
Nataraj: What do you think about the narrative that compute or inference will become a commodity?
Andrew: It’s ironic: Nvidia’s gross margins are 73%. Everybody says it could be a commodity, yet they have the highest gross margins of any hardware company in history. They look like a software company. The less you know about something, the more the details and complexity fall away. From the moon, the Earth is just a blue and white orb. Get up close, and there are religious, political, and technological battles. People who don’t really know a domain often say it will be commoditized. AI compute is not looking like it’s going to be commoditized. Andrew Ng said the other day that he’s never met anybody in AI who feels they have enough compute. That’s not a market driving toward a commodity.
Nataraj: Can you talk about your product strategy? You have the physical chip, data centers, and your own cloud, plus on-premise deployments.
Andrew: That’s pretty close. We don’t sell the chip; we sell a computer—the chip in a system. It’s parallel to, say, the NVL72. We sell a whole solution that comes in a rack, fully delivered with everything you need. We will deploy that on your premises, or you can buy cycles on it from our cloud or our customers’ clouds. You can buy it through the Amazon Marketplace, the Microsoft Marketplace, from OpenRouter, Hugging Face, Vercel, or lots of other places. We have both cloud and on-premise offerings. You can bolt on to us via an industry-standard, OpenAI-like API. Finally, for large on-premise customers, we offer forward-deployed engineering, where our engineers collaborate with yours to design models, clean data, or work on a data pipeline to accelerate solution delivery.
Nataraj: You decided not to sell the chips directly. Why not, considering Nvidia’s high margins?
Andrew: I’m a believer in the system business. It’s very hard to get paid for software when you’re selling chips. Historically, there are very few examples of a successful entry strategy when you sell the chip on a PCIe card. Then you’re dependent on Dell or Supermicro for your I/O and power. At large volumes, everybody is buying either DGXs, NVL72s, or Cerebras boxes. AMD was so far behind in building systems that they had to buy ZT Systems for billions because they didn’t have the expertise. In this market, building and delivering systems is how customers, and even the cloud providers, want to consume compute.
Nataraj: You mentioned being available on marketplaces. When customers go to the Azure marketplace, are they hitting your data center?
Andrew: It depends. When you use Condor Galaxy from G42, you’re generally hitting their equipment. When you go to the Microsoft Marketplace or AWS Marketplace, you are hitting our equipment in our data centers. We’ve made it easy for you to purchase there, but the token traffic is directed to us.
Nataraj: One of the biggest challenges for a new chip company is CUDA. How do you handle that?
Andrew: When you do inference, you don’t use CUDA. The rise of inference weakens CUDA. That’s a really important observation. Developers who want to bring a cool chat application into their app don’t need to know any CUDA; they just need an API to bolt on to a chatbot. While CUDA is a moat in the training business—and we’ve developed compilers that take in PyTorch to get around that—in the inference business, CUDA is irrelevant.
Nataraj: If someone is already running inference on an NVIDIA GPU cluster, how easy is it to switch over?
Andrew: Ten keystrokes. If you’re using OpenAI on Azure, your API says something like `get_OpenAI_something`. You just change it from OpenAI to Cerebras and pick your model. That’s it. It’s literally 10 keystrokes.
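For concreteness, a minimal sketch of that switch, assuming an OpenAI-compatible endpoint; the base URL, environment variable, and model name below are illustrative rather than taken from the conversation, so check the Cerebras docs for current values:

```python
# Pointing the standard OpenAI Python client at a Cerebras endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # was: the default OpenAI endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # was: OPENAI_API_KEY
)

response = client.chat.completions.create(
    model="llama-3.3-70b",                    # pick a model served on Cerebras
    messages=[{"role": "user", "content": "Hello from the new backend"}],
)
print(response.choices[0].message.content)
```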
Nataraj: You have a big strategic partnership with G42. Can you explain what G42 is?
Andrew: You should think about it as a sovereign institution and a national champion for the UAE. They are the AI national champion, and the country’s leadership has decided to make AI a priority. They are building data centers, investing in AI technologies globally, and building large partnerships. We’ve been working together for several years, and it’s been extraordinary. We have built out for them some of the largest AI data centers in the world, located in the US. We recently received licenses to deliver equipment in the UAE. It’s a huge partnership. Together, we’ve trained models in Arabic, done genomic research, and we serve customers throughout the world.
Nataraj: I heard one of your engineers mention that you discovered how much faster inference was while solving for training. Can you talk about that?
Andrew: Between 2000 and 2024, there wasn’t an inference business out there. Until ChatGPT, nobody was doing large-scale inference in production, so all of us were doing training. We saw that our architecture had enormous advantages not just for training, but for inference. I was probably too slow in recognizing that and adding resources to build out our inference program. I wish I had done that six or eight months earlier, but right now that business is on an absolute tear. There is no substitute for working with customers and building things for learning and for product strategy. It’s very hard to do it in a conference room.
Nataraj: You filed an S-1 to go public and then decided not to. What happened there?
Andrew: We didn’t decide not to go public. We decided we needed to revise the material. The data in our S-1 had gotten stale. We took it down and told everybody we would put it back up after we cleaned it up and updated it. It was from the summer of 2024, and it was no longer a good picture of our business. The numbers were too small. Imagine how much changes in 15 months in the AI business. We’re doing more business and have more customers. We took it down, and we’ll put it back up once we’ve cleaned and improved it.
Nataraj: Why can’t Nvidia build a wafer-scale chip like you did?
Andrew: First, anybody can try anything. Second, you have to ask, why didn’t they? We have a huge number of patents, and many of the most obvious approaches are foreclosed. But companies work around patents all the time. The truth is, it would take them five or seven years and two or three billion dollars, and we would still be 10 years ahead. For five years we’ve been delivering, and for 10 years we’ve been building and inventing. We knew exactly why the B200 would be late: the coefficient of thermal expansion mismatch between the chips and the interposer, the problem that made them 18 months late and one we had solved back in 2018, seven years earlier.
Nataraj: Do you have a thesis on AGI?
Andrew: There’s a huge amount of productivity gain in business to be done that doesn’t need anything close to AGI. For those who’ve ever worked in a large company, the amount of time spent cutting and pasting data from Salesforce to Workday is not superhuman work. Auditing is subhuman intelligence. Machines could do a better job. I can list dozens of functions that would make us vastly more efficient without needing gold-medal math capabilities. Many of the most frustrating things in day-to-day life are friction, not a need for superhuman intelligence. So there is tremendous opportunity long before we get to AGI. My experience in technology is that the last 10% takes 50% of the time. Look at self-driving. We’ve been 90% of the way there for a decade. We’re sort of there in a few cities, but not generally. The last 5% is really hard.
Nataraj: What are some underrated things in AI right now?
Andrew: I think the underrated thing is doing simple things. Everyone’s talking about doing hard things, but we should focus on the simple things that cause friction and frustration—payroll, tracking headcount. HR tools are horrible. AI can do a way better job. Also, at the other end of the spectrum, I think inference at the edge, in tiny little sensors, is very interesting: the sensor by the brake of your car, on the barrel of a gun for the military, or on a machine at a manufacturing plant. This isn’t big AI; this is tiny AI that does a little bit of inference to make sure the data sent back is useful, not garbage. Sensors in the sub-milliwatt and milliwatt category with a little bit of AI are something people aren’t talking about, and I think they will be very interesting.
Nataraj: Where should young founders look for ideas?
Andrew: In my experience, people are best at the things they really enjoy. I would begin in domains that you really enjoy. If you’re passionate and you write code to solve these problems in your free time, it’s not work. You’re just pursuing your passions. I would stay in that area. And take seriously what your friends believe they need. Some of the best ideas came from young entrepreneurs seeing holes in the tools and apps available to them. How come I can’t tell if she’s got a boyfriend? Well, there’s Facebook. I’ve been married 30 years; that’s not a question I ask. I could never come up with that idea. But at 19 or 20 in college, it’s really important. Exploring around the things you know well is the best advice.
Nataraj: What is the most contrarian belief that you have these days?
Andrew: I think we’ll have peace in the Middle East sooner than people think. The returns to being moderate have been demonstrated by the UAE. In 2005, the GDP of Dubai was the same as Gaza. In 2010, nobody had heard of Dubai. They chose a moderate path and have built an extraordinary nation. You’re seeing that movement in Qatar and Saudi Arabia. In those regions, people are too busy to hate right now; they’re busy working and building cool things. I got an email from a Hindu manager reminding me, a Jewish guy, to wish a Muslim team member Happy Eid. That happens in Silicon Valley because we’re all working together to build stuff. We don’t care. The only question is, can they build something cool? If we trade, do business together, and build things together, then there’s nothing to hate.
This deep dive with Andrew Feldman reveals the immense challenges and contrarian thinking required to build a category-defining hardware company. His insights on tackling monumental engineering problems, navigating the competitive AI landscape, and the future of compute offer a masterclass for any founder in deep tech.
→ If you enjoyed this conversation with Andrew Feldman, listen to the full episode here on Spotify or YouTube.
→ Subscribe to our newsletter and never miss an update.
