Dylan Patel on the GPU Shortage, Nvidia and the Deep Learning Supply Chain
Dylan Patel is Chief Analyst at SemiAnalysis, a boutique semiconductor research and consulting firm specializing in the semiconductor supply chain, from chemical inputs to fabs to design IP and strategy. The SemiAnalysis Substack has ~50,000 subscribers and is the second biggest tech Substack in the world. In this interview we discuss the current GPU shortage, why hardware is a multi-month process, the deep learning hardware supply chain and Nvidia’s strategy.
- Current Bottlenecks In Deep Learning Hardware
- Will Hardware Be The Bottleneck As Compute Demand Increases
- Nvidia’s Strategy
- Final Thoughts On Semiconductors From Dylan
Michaël: I’m here with Dylan Patel. A lot of people at ICML have recommended I talk to you about your blog, SemiAnalysis. When I met this guy, I had no idea what his name was, but then I connected everything together, and now I’m like, ‘This is the guy’. So yeah, welcome to the show.
Dylan: Oh yeah, well, thank you.
Dylan’s Background, How SemiAnalysis Started
Michaël: What’s your background in hardware? How did you get into this?
Dylan: I started consulting on the side of my job, hush hush, in 2017 and then I went full time in 2020 and I got bored in 2021 and I started a newsletter on Substack and then that newsletter’s done really well.
Michaël: SemiAnalysis?
Dylan: Yeah, SemiAnalysis. And so we actually post mostly about hardware, and a lot of times AI hardware as well. But, from time to time, it turns out systems people know a lot about the architecture of what’s running on their system.
Dylan: That’s sort of the backend, and trying to figure out what infrastructure is needed, how much networking do I need versus compute versus memory bandwidth, what’s the order of operations and how are things scaling? How does that influence hardware design?
Dylan: That’s important for my business itself. Not the newsletter, but the business. It’s understanding these sorts of things and the cost of running these models.
Michaël: What’s your business?
Dylan: It’s consulting for semi-architecture.
Dylan: That’s the business. I’d been doing that before I started the newsletter, and then the newsletter’s fun. It’s grown a lot and has a lot of people reading it. That’s fun too.
Current Bottlenecks In Deep Learning Hardware
What Caused The GPU Shortage
Michaël: What’s the latest news? I’ve heard there are a lot of shortages in GPUs, and it’s harder and harder to get GPUs in A100 or H100 clusters. Do you have any insights on this? What’s the current supply and demand?
Dylan: Yeah, I mean, since November, people have been wanting GPUs and that’s accelerated as money has flooded in. Meta sort of started investing heavily in GPUs, more so in August last year, but they really, really accelerated then in January, February.
Michaël: Was there anything that caused them to accelerate?
Dylan: Well, of course, there’s the ChatGPT moment, partially, but there’s a lot of stuff around the recommendation networks and the infrastructure they were using there that they needed to update. There are multiple things, but you amalgamate that across every company.
Dylan: That’s when Meta started investing. Google started investing at this time as well. Of course, they don’t buy so many GPUs, they mostly buy TPUs, but that production capacity is actually very similar: it comes from TSMC’s 5nm.
Michaël: Then TSMC was bottlenecked by not having enough workers and the price went up?
Dylan: It’s not really TSMC’s workers.
Hardware Involves Multi-Month Processes, Even Years
Dylan: The thing about hardware that is so different from software is that, from the moment you put in an order, even if TSMC started working on it immediately, you would not get a chip out for three more months.
Dylan: And that’s if the chip was already designed, you had already submitted the design to TSMC, it was taped out, all the bugs were fixed, et cetera. It’s a multi-year process to even just get a chip that you’ve started to design into production. And even once it’s in production, if you want to increase orders, it takes three plus months to just get the chip back from TSMC, let alone put it into a server and then get that server installed at a data center.
Dylan: It’s a multi-month process.
Six Months Between The Order And The Datacenter Installation, Different From Open Source
Dylan: Whereas with software, you think about it being a snap of the finger. Especially with a lot of the open source stuff we see, it’s: yeah, some guy was working all night and he uploaded this, and now another person’s working all night, and actually, oh no, another team of people is working all night, and everyone’s night is different. Then they all worked all night.
Dylan: And so you see things iterate really rapidly. But in hardware, it takes, call it four or five, five or six months between when an order is placed and when you can actually have it installed in your data center, if it got worked on immediately.
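As a rough illustration of the timeline described here, the sketch below adds up back-to-back stages from order to installation. Only the roughly three-month turnaround at TSMC and the five-to-six-month total come from the interview; the order date and the way the remaining time is split between server assembly and data center installation are hypothetical.

```python
from datetime import date, timedelta

# Stage durations in days. Only the ~3-month chip turnaround and the
# ~5-6 month total come from the interview; the split of the remaining
# time between the last two stages is a hypothetical illustration.
STAGES = [
    ("chip back from TSMC", 90),
    ("server assembly (assumed)", 45),
    ("data center installation (assumed)", 30),
]

# Hypothetical order date, chosen only to make the example concrete.
ORDER_DATE = date(2023, 8, 1)


def delivery_schedule(order_date, stages=STAGES):
    """Return (stage name, completion date) pairs for stages run back to back."""
    schedule = []
    current = order_date
    for name, days in stages:
        current = current + timedelta(days=days)
        schedule.append((name, current))
    return schedule


schedule = delivery_schedule(ORDER_DATE)
for name, done in schedule:
    print(f"{name}: {done.isoformat()}")

total_days = (schedule[-1][1] - ORDER_DATE).days
print(f"total lead time: {total_days} days (~{total_days / 30:.1f} months)")
```

The point of the sketch is only that the stages are sequential: a delay in any one of them pushes out the installation date by the same amount.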
Expanding The Production, The Equipment Supply Chain
Dylan: Now, the other aspect is, yeah, there’s some slack capacity. I mean, you think about manufacturing, you make a massive capital investment of billions and billions and tens of billions of dollars. And then you start running the production, but then it’s, ‘Oh, well, I want more chips of this kind than you have production of that kind.’ So it’s, ‘Okay, well, then now I need to expand production.’
Dylan: That’s the thing that’s going on, and we’ve done a lot of work in that, the equipment supply chain. There are about 30 to 40 different equipment suppliers in the supply chain for not just making the chips, but actually just packaging them together. And that’s one aspect that we’ve been following.
Putting Chips Together With The Memory Is TSMC’s Bottleneck
Dylan: And that’s actually one of the biggest bottlenecks, because TSMC actually does have slack capacity for making the chips. They don’t have slack capacity for putting the chips together with the memory, with the HBM memory. That’s actually the biggest bottleneck for them.
Michaël: Where does the memory come from?
Dylan: The memory comes from SK hynix and Samsung, Korean companies. They dominate the memory market, SK hynix especially. And that is another aspect to increase production for. It’s a different part of the supply chain, and there are many parts of the supply chain. Power supplies and cables and optical cables and all these sorts of things need to harmoniously come together.
Dylan: And each of those has a different supply chain with, a lot of times, different equipment and different lead times. It’s, ‘Hey, this takes three months,’ and, ‘Hey, I have excess supply here, and so inventory, so I can sell you some more.’ Everyone has to balance their supply chain and move up.
Will Hardware Be The Bottleneck As Compute Demand Increases
A Humongous Increase In GPU Capacity, Comparing GPU Clusters
Dylan: But the long and short of it is that there’s actually a humongous increase in GPU capacity. There are more GPU FLOPS shipping this year than Nvidia has shipped in their entire history for the data center. That’s how much more they’re shipping this year. Part of that’s off the back of the H100 being, call it, four to five times more FLOPS, but part of that’s also off the back of there being more units too.
Dylan: But it’s just that so many people want more GPUs. You talk to all these Bay Area companies, Inflection and Meta and OpenAI and all these, and everybody’s bragging about how many GPUs they have. And it’s really cute. And it makes sense. Researchers want GPUs to run their experiments. And so it’s funny.
Michaël: Comparing how much money you make is how many GPUs you have.
Five To Seven Companies Will Have GPT-4 Compute In The Next Year
Michaël: Do you think the increased demand in GPUs and the stock of Nvidia going up, et cetera, all this money in the markets will turn to all processes being more efficient, producing more things? Or do you think in the next three years it’s still going to be Nvidia, TSMC and all the same players being bottlenecked by small things?
Dylan: No, I mean production is increasing rapidly. Nvidia is going to ship over a million H100s plus A100s this year. And that’s a lot of GPUs if you think about how many GPUs GPT-4 was trained on. And then there are multiple labs that are going to have enough GPUs to train something. That’s not to say they have the skillset, but to say they have enough GPU horsepower. And presumably they have a lot of the skillset, because it’s a lot of people from Google and OpenAI who have left and created these companies.
Dylan: But I do believe that there are going to be five to seven companies that will have a GPT-4 size model at least, in terms of total FLOPS. It’s going to have a slightly different architecture, maybe fewer parameters, more tokens or vice versa.
Dylan: There are a lot of innovations to do on the software side as well, but there are five to seven companies who will have a model that is that many sort of GPU hours or compute FLOPS, GPU FLOPS, in the next year, of the quality of GPT-4… or rather of the size of GPT-4. And then quality is beholden to a lot of other things.
Will Hardware Be The Bottleneck If Scale Is All We Need?
Michaël: I’m not sure how much you believe we could get human level AI in this decade, but assume we get to this level by just scaling models up, meaning we can just scale the models 1,000 times and we get something close to human level across many different tasks. Do you think hardware will be the bottleneck? Will we have enough money to do those kinds of things, but just won’t have the supply chain to do it?
Dylan: I think there are so many GPUs being shipped this year and next year. I look at the supply chain, and the amount of GPUs that are gonna get shipped next year is fricking crazy. At least that’s what the supply chain is preparing to do. Now, of course, is there demand for it?
Dylan: But I think, yes, we can just scale another 1,000X from GPT-4. That is something humanity can easily do. And does that deliver? I mean, I do joke that transformers are the most capitalist thing that’s ever been invented, because you just throw more money at it and it gets better. Log-linear scaling is perfect. You look at these scaling laws, and it’s, yeah, it actually just keeps getting better. I don’t know, do we run out of data? Well, then we just go multi-modal, and that’s more expensive on compute.
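One common reading of "log-linear" here is a power law, where loss falls on a straight line when plotted against compute on log-log axes. The sketch below assumes that form; the constants `A` and `B` are made-up illustrative values, not fitted numbers from any real model, and the interview does not specify a formula.

```python
# Hypothetical power-law scaling: loss = A * compute ** (-B).
# A and B are made-up illustrative constants, not fitted values.
A = 10.0
B = 0.05


def loss(compute):
    """Loss at a given compute budget under the assumed power law."""
    return A * compute ** (-B)


def loss_ratio(scale_factor):
    """Factor by which loss is multiplied when compute grows by scale_factor."""
    return scale_factor ** (-B)


ratio_10x = loss_ratio(10)
ratio_1000x = loss_ratio(1000)

print(f"10x compute   -> loss multiplied by {ratio_10x:.3f}")
print(f"1000x compute -> loss multiplied by {ratio_1000x:.3f}")
# Under a power law, 1000x is just three successive 10x steps.
print(f"three 10x steps -> {ratio_10x ** 3:.3f}")
```

Under this assumption every 10x of compute buys the same multiplicative improvement, which is the sense in which throwing more money at the model keeps making it better.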
125 thousand H100s For Training In The Next Year Or Two
Dylan: I don’t see why multiple companies, this is OpenAI, Meta and multiple other companies, why isn’t their next model, or their next mega model, maybe in the next year or two, trained on a system that’s 125,000 H100s, which are 3X faster.
Dylan: That’s a 100X, or 10X, or whatever. And maybe they train it for longer. I think you can continue to scale, and hardware will get better, and so will the ability to build a larger and larger supercomputer, because the economic benefit from these massive models is getting better.
Dylan: So I think people won’t be bottlenecked by hardware, more so by the ingenuity of putting them together and creating software that can scale, and all that sort of stuff to work on. I don’t think people are gonna use dense models, that’s stupid, but yeah.
The Nvidia Customer Democracy, Independent Of The Demand
Michaël: It’s more that the demand is not linear. If I try to buy a million GPUs from Nvidia, they will probably not want to sell them, because they want to keep some of their supply for other players. One company cannot just buy all the GPUs from Nvidia.
Dylan: Well, yeah, they don’t allow that, of course.
Michaël: Yeah, they don’t allow that, of course. So you can only buy let’s say 10% or something without the price going up too much.
Dylan: Well, I don’t think they’re raising the prices. They’re just literally saying we don’t have allocation, and if you want more GPUs, then wait till we have built more. They’re gonna build about 400,000 GPUs and sell about 400,000 H100s in Q3. And okay, plausibly Meta is gonna buy 30,000 to 40,000 of those. And OpenAI is gonna buy 30,000 or 40,000 of those through Microsoft, and Microsoft for their cloud will buy another 10,000 to 20,000.
Dylan: I’m sure they want more, but Nvidia is going down their whole list of the big customers, saying, ‘Yeah, well, I wanna give this many to this person, this person, this person, this person, and next quarter you can buy more.’ And I think that’s sort of how they’re playing the game and making sure nobody gets all the GPUs, because they want AI to sort of be democratized, or not democratized, but they wanna have many customers to sell their GPUs to.
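To put those allocation figures in proportion, the sketch below converts the purchase ranges quoted in the interview into shares of the roughly 400,000 H100s mentioned for Q3. The numbers are the speaker's estimates as stated, and the variable and function names are illustrative.

```python
# Approximate Q3 H100 supply, as quoted in the interview.
Q3_H100_SUPPLY = 400_000

# (low, high) purchase estimates quoted in the interview.
ALLOCATIONS = {
    "Meta": (30_000, 40_000),
    "OpenAI (via Microsoft)": (30_000, 40_000),
    "Microsoft cloud": (10_000, 20_000),
}


def share_range(low, high, supply=Q3_H100_SUPPLY):
    """Return the (low, high) fraction of total supply for a purchase range."""
    return low / supply, high / supply


shares = {name: share_range(low, high) for name, (low, high) in ALLOCATIONS.items()}
for name, (low_share, high_share) in shares.items():
    print(f"{name}: {low_share:.1%} to {high_share:.1%}")

total_low = sum(low for low, _ in ALLOCATIONS.values())
total_high = sum(high for _, high in ALLOCATIONS.values())
print(
    f"combined: {total_low:,} to {total_high:,} GPUs "
    f"({total_low / Q3_H100_SUPPLY:.1%} to {total_high / Q3_H100_SUPPLY:.1%})"
)
```

On these estimates, even the largest named buyers each take at most about a tenth of the quarter's supply, and the three together take no more than a quarter of it.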
Why Nvidia Invested In Inflection
Dylan: It’s actually in their success for, I mean, this is why they invested in Inflection. It’s, ‘Oh, I want another AI lab who’s buying GPUs long-term.’
Michaël: Who invested in Inflection?
Dylan: Nvidia did.
Michaël: Yeah.
Dylan: Many other people did as well, but Nvidia did, yeah. And it was, ‘Oh, well, who’s this random startup, do I wanna get them 22,000 GPUs this year? Well, actually, yeah, it’s better to take 5,000 away from Meta and 5,000 away from Microsoft and 5,000 away from Amazon and give them to these guys, because that creates another competitor.’ And so I think they are also playing that game.
Michaël: I think that they also raised something more than a billion dollars or something.
Dylan: Yeah, yeah, yeah, of course. And Nvidia was one of the investors.
Final Thoughts On Semiconductors From Dylan
Michaël: I think I ran out of questions. Do you have any last message to the Twitter or YouTube audience? Something that people, that you repeat to people and you want to have them learn or something.
Dylan: Yeah, I mean, if you’re just interested in learning more about semiconductors, I think they are the most complex thing in the world. AI is awesome and I love AI, but semiconductors are infeasibly complex. There are millions of people working on things and each person is in such a niche of the industry. It’s amazing how much you don’t understand even after having worked in them for a long time. So yeah, I’d say, hey, figure out that working on semiconductors doesn’t necessarily need to be chip design. It could be aspects of production. It could be aspects of design, manufacturing and many things.
Michaël: But yeah, so check out this guy’s blog, SemiAnalysis, it’s great.
Michaël: And hire him for consulting.
Dylan: Yeah. (laughing)