Building a Custom AM5 Motherboard with Extra PCIe Lanes Using Broadcom Expansion

A tech enthusiast builds a custom AM5 motherboard with extra PCIe lanes using Broadcom expansion chips, explaining the process and challenges.

The FORBIDDEN Motherboard!! Hacking More Lanes into AM5: LR-Link Broadcom PCIe Expansion. | Transcript:

All right, today we're going to build our fantasy AM5 motherboard that I've always wanted since the dawn of time that's going to have more PCIe lanes than is physically possible, and yet it's still going to work. To put it another way, for AM5 with 32 lanes of PCIe Gen 5, we need our friends at LR-Link to help us out. But before I get to that, there's some explanation that is needed. Okay, look. It's hard to get components right now. That's true for workstations and everything else, but some people have a business to run. Some of you, a lot of you in the audience, have small and medium-sized businesses that you have to

support. And I think gone are the days that you could really splurge on super high-end hardware. Not only that, AMD has EPYC server CPUs in the AM5 socket. You can also use just regular desktop AM5 CPUs as a server CPU. This is the B650 chipset from ASRock, ASRock Rack. And this will make a perfectly reasonable file server or workgroup server. I mean, you could run 50, maybe 100 people off of this thing, 128, 192 gigs of RAM, depending on what your workload is and what you're doing. It's perfectly reasonable. But the part about it that's not reasonable is PCIe connectivity. But you have to understand PCIe connectivity is just lanes, and lanes are fungible. They can be spent on

all kinds of different things. GPUs, storage, different kinds of storage, networking, you name it. This is a pretty typical very low-end, but still server-class motherboard. We've got one physical x16 slot, and we've got one physical x8 slot, and then an x1 slot. And so there's not really a lot of PCIe lanes here. And you can see from the CPU socket that the CPU socket is not very large, so it's not really a true server-class socket. It's a desktop-class socket, but there are CPUs that are available. We've only got two memory channels, four memory slots. But how you spend the PCIe lanes matters a fair bit. Over here, there are four OCuLink connectors, and this gives you some more PCIe lanes. What do motherboard designers do when you've

only got, you know, 20 or 24 PCIe lanes out of the CPU? How do you spend them? Well, 16 of them usually go to a PCIe slot or two, and like maybe four of them would go to an M.2, and the rest go to the chipset. And the chipset itself takes PCIe lanes and connects to the CPU, but also has PCIe lanes that it manages itself, typically slower PCIe lanes. So, you know, you might have a Gen 5 connection from the CPU to the chipset, and then the chipset might have some Gen 4 PCIe lanes for slower peripherals, or it might have a bunch of, you know, X1 peripherals. That's what we've got going on here. This X1 slot doesn't connect to the CPU, even though it's closer to the CPU, it connects to the chipset. The only PCIe slot here

that connects to the CPU is the X16 slot. So, this is not really a fabulous motherboard. If we look at this desktop motherboard from MSI, we have the primary X16 slot, which really is 16 PCIe lanes, PCIe Gen 5 direct to the CPU. We also have the primary M.2, which is four PCIe Gen 5 lanes directly to the CPU. And we've also got some more M.2 that goes to the chipset. So, there is a high-speed PCIe connection from the CPU to the chipset, but then the chipset has USB and the interface that the network card uses, which is usually also PCIe, although not necessarily. The CPU actually does have a couple of USB resources built right into it, which is sort of a fun interesting thing that AMD did.
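The lane budgeting described above comes down to simple arithmetic. The following is a minimal sketch, assuming a 24-lane CPU split the way the transcript describes (16 lanes to the main slot, 4 to the primary M.2, the rest to the chipset). The function name `split_cpu_lanes` and its defaults are hypothetical and only illustrate the bookkeeping.

```python
def split_cpu_lanes(total_lanes=24, slot_lanes=16, m2_lanes=4):
    """Split a CPU's PCIe lanes between the main slot, an M.2, and the chipset.

    Defaults mirror the rough split described in the transcript; they are
    illustrative assumptions, not a specification of any particular board.
    """
    chipset_lanes = total_lanes - slot_lanes - m2_lanes
    if chipset_lanes < 0:
        raise ValueError("slot and M.2 lanes exceed what the CPU provides")
    return {"slot": slot_lanes, "m2": m2_lanes, "chipset": chipset_lanes}


budget = split_cpu_lanes()
print(budget)
```

With the defaults, whatever is left after the slot and the M.2 is all the chipset uplink gets, which is why everything hanging off the chipset shares a narrow link.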

AMD has the X870E chipset, which is actually two chips that are daisy-chained together. You would think that there'd be, you know, you want that tree to be as short and fat as possible, the tree of peripherals, so that each chipset connects directly to the CPU. No, that's not how it works. Part of the reason for that is PCIe 5 is such a high-speed signal, it's hard to keep the signals clean. It's more convenient to just daisy-chain the connections together from one chipset to another because of the length of the board. The copper wires in a PCB like this are a little problematic. But PCIe, again, like I say, is fungible. So, it's like this is a PCIe slot like you're used to, it's 16 lanes. Depending on

other resources that are available for timing, you could break that out into 16 one-lane PCIe slots. Commonly, on some motherboards, you might see an X16 slot and then another slot down here, and in the manual, it would say it's x16/x0 or x8/x8. And so, you can take eight of your lanes from the primary slot and optionally use them down here. But what if you don't want to give up the PCIe lanes? What if you want 16 lanes up here and 16 lanes down there? It's a math problem. AM5 doesn't physically have that many PCIe lanes, but the clue is in the chipset cuz the chipset's doing something. Aren't there other chipsets?
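The "math problem" can be made concrete with a small check. This is a simplified sketch: it only tests whether a set of requested link widths fits inside a slot's lanes, and it assumes the usual PCIe link widths of 1, 2, 4, 8 and 16. Real boards expose only specific splits in firmware, so `can_bifurcate` (a hypothetical helper) is more permissive than any actual BIOS.

```python
def can_bifurcate(slot_lanes, requested_widths):
    """Return True if the requested link widths fit inside one slot's lanes.

    Bifurcation only partitions existing lanes; it never creates new ones.
    Each width must be a standard PCIe link width (assumed here: 1/2/4/8/16).
    """
    valid_widths = {1, 2, 4, 8, 16}
    if any(w not in valid_widths for w in requested_widths):
        return False
    return sum(requested_widths) <= slot_lanes


print(can_bifurcate(16, [8, 8]))      # two x8 links out of an x16 slot
print(can_bifurcate(16, [1] * 16))    # sixteen x1 links out of an x16 slot
print(can_bifurcate(16, [16, 16]))    # two x16 links would need 32 lanes
```

The last case fails because splitting lanes cannot add up to more than the slot started with, which is the gap a PCIe switch chip is meant to fill.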

Yes. Enter our adapter from LR-Link. Now, this is an LR-Link PCIe 5 X16 card. It's got 16 Gen 5 lanes, but it has 32 Gen 5 lanes out the other side. Now, this is a special controller from LR-Link, but this is primarily designed to drive a bunch of NVMe, so you could have two, four, six, eight four-lane Gen 5 NVMe connected to this, and it will give you, you know, theoretically, like if you were using these really nice, like the Kioxia CM5, these are 15 GB/s, you're going to overrun the amount of bandwidth that you get from the... But the chipset that's in here is Broadcom, and it's reprogrammable, and I can reprogram it to do different fun interesting things.
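To put rough numbers on the bandwidth overrun mentioned above, here is a back-of-the-envelope sketch. It assumes the comparison is against the card's 16-lane upstream link (the spoken sentence trails off, so that is an assumption), uses the PCIe Gen 5 signaling rate of 32 GT/s with 128b/130b encoding, and ignores protocol overhead. The per-drive figure is the 15 GB/s quoted in the transcript.

```python
def lane_bandwidth_gb_per_s(transfer_rate_gt=32.0, encoding=128 / 130):
    """Approximate usable gigabytes per second for one PCIe lane, one direction.

    32 GT/s with 128b/130b encoding is the PCIe Gen 5 signaling rate;
    protocol overhead is ignored, so real throughput is somewhat lower.
    """
    return transfer_rate_gt * encoding / 8  # bits to bytes


upstream_lanes = 16        # card's connection to the motherboard slot
drive_count = 8            # eight four-lane NVMe drives on the downstream side
drive_speed = 15.0         # GB/s per drive, as quoted in the transcript

upstream = upstream_lanes * lane_bandwidth_gb_per_s()
demand = drive_count * drive_speed
print(f"upstream ~{upstream:.0f} GB/s, drive demand {demand:.0f} GB/s")
print(f"oversubscription ~{demand / upstream:.1f}x")
```

Under these assumptions, eight drives running flat out could ask for roughly twice what the upstream link carries, which only matters when everything is busy at the same time.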

This connector is called MCIO, as we've covered in the past. I can take that MCIO connector, and I can go to a board like this. This board has two PCIe slots that are wired for eight lanes each. I could also take this to another board where each slot has 16 lanes, and I could have two 16-lane connections from this card, or I could have two of these carrier boards that each have two slots, each one of these with eight lanes. So, 16 lanes on this board in two slots, plus another one connected to the other set of connectors on the end of this card, and I've got four eight-lane PCIe slots.
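The carrier-board combinations described above all draw on the same pool of downstream lanes. A minimal sketch, assuming four MCIO connectors of eight lanes each (inferred from the "4 MCIO 8I" product name and the 32-lane figure in the transcript) and assuming the bridge is programmed for equal-width ports. `slot_options` is a hypothetical helper.

```python
MCIO_CONNECTORS = 4      # assumed from the "4 MCIO 8I" product name
LANES_PER_CONNECTOR = 8  # each MCIO cable carries eight lanes


def slot_options(slot_lanes):
    """Return (slot count, cables per slot) for a given slot width.

    Assumes all downstream lanes are grouped into equal-width ports.
    Widths above 8 need more than one cable per slot; widths below 8
    would share a cable between devices.
    """
    total_lanes = MCIO_CONNECTORS * LANES_PER_CONNECTOR
    slots = total_lanes // slot_lanes
    cables_per_slot = max(1, slot_lanes // LANES_PER_CONNECTOR)
    return slots, cables_per_slot


for width in (16, 8):
    slots, cables = slot_options(width)
    print(f"x{width}: {slots} slots, {cables} cable(s) per slot")
```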

There are actually motherboards from the AM4 era and from the Intel era when the number of PCIe lanes was even less, where there were chipsets on the motherboards to do this, to, you know, provide even more PCIe lanes. But, they got expensive because servers needed more PCIe lanes first. And PCIe lanes are still a problem in bigger servers. One of the pieces of magic in modern AI servers from Nvidia that we saw at Computex is that the modern ConnectX NIC has a built-in PCIe switching fabric. And the switching fabric is actually PCIe Gen 6. All this is PCIe Gen 5. The CPU's the slow thing. CPUs are not keeping up. There aren't CPUs that are PCIe Gen 6. Well, I mean, there are, they're just not out yet. But, what Jensen did with,

you know, modern ConnectX-9 NICs is he moved the networking and the PCIe fabric onto one carrier. So, we saw Supermicro servers where you had all the connections at the back for the networking, one network port per GPU, and that's all on one carrier card. And then, that carrier card has lots of PCIe lanes that connect back to the CPU. But, the reason that you want that is because the GPUs can talk to each other. That also works great with this. So, if I've got two R9700s in a system that's only AM5, those GPUs can talk to each other twice as fast with this as they can through the CPU, and that is worth it to some people. And it'll also let you run four of these R9700 GPUs, 128 GB of VRAM, on an AM5 system with just one of these

Broadcom chipsets. Now, it's going to run the GPUs with eight lanes instead of 16 lanes, but for AI applications, the GPUs mostly need to talk to each other, not the system. But, if you have one GPU that's receiving a bunch of stuff, it can receive a bunch of data at 16 lanes worth of speed, and then the information could be copied to the other GPUs without ever leaving this PCIe bridge. So, you can still get a ton of performance. There are other companies that are doing fun things with how you spend your lanes. ICY Dock, for example. ICY Dock has this kind of a thing, which is eight lanes.

It's made for PCIe bifurcation mode, but it'll give you two M.2 on this PCIe carrier card with a hot swap on the end. And these PCIe lanes don't necessarily have to be CPU PCIe lanes. They could be chipset PCIe lanes. It doesn't really matter. I could also go from MCIO to this with eight lanes here. This gets a little complicated because then I have to reprogram that bridge and tell it that this is going to be two X4 devices in an X8 slot. So, it's like as if this cable is going to, you know, two M.2 or two U.2. And I have other cables for that here. This is an MCIO header on this side. It's eight PCIe lanes wide, and on this side, I've got two U.2 connectors that are four PCI Express

lanes wide. And depending on the cabling and the re-driver and a bunch of settings that you have on your card, this can maintain PCIe Gen 5 signal integrity. It can be a little tough because PCIe Gen 5 is unforgiving and complex and sensitive to lots of interference. There's foil around all these wires to, hopefully, block RF interference. So if you look at a diagram of our B650 motherboard here, even though it's only got one X16 physical slot, I can go into a PCIe controller like this and then, not elegantly, break it out into a different case and then have a bunch more room for GPUs. And this is what it looks like in practice.
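One way to see whether a cabled link is actually holding its speed is to compare the negotiated link against the device's maximum. The sketch below is not from the video; it assumes a Linux system and reads the sysfs attributes `current_link_speed`, `max_link_speed`, `current_link_width` and `max_link_width` that the kernel exposes per PCI device. The function name and the `sysfs_root` parameter are hypothetical conveniences.

```python
from pathlib import Path


def find_downtrained_links(sysfs_root="/sys/bus/pci/devices"):
    """List PCI devices whose negotiated link is below their maximum.

    Devices without the link attributes (for example, non-PCIe functions)
    are skipped. Returns an empty list if the sysfs directory is absent.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    attributes = ("current_link_speed", "max_link_speed",
                  "current_link_width", "max_link_width")
    findings = []
    for dev in sorted(root.iterdir()):
        try:
            values = {name: (dev / name).read_text().strip()
                      for name in attributes}
        except OSError:
            continue  # attribute missing or unreadable for this device
        if (values["current_link_speed"] != values["max_link_speed"]
                or values["current_link_width"] != values["max_link_width"]):
            findings.append((dev.name, values))
    return findings


if __name__ == "__main__":
    for name, v in find_downtrained_links():
        print(f"{name}: {v['current_link_speed']} x{v['current_link_width']} "
              f"(max {v['max_link_speed']} x{v['max_link_width']})")
```

A lower current speed is not always a cabling problem, since some devices drop link speed at idle to save power, so it is worth checking under load before blaming the cable or the re-driver settings.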

I mean, yeah, it's a little bit of a mess of wires, I'll give you that. But this is a fun way to connect four GPUs even to an AM5 system and it looks a little bit like a mining rig, but understand that it's not a mining rig at all because most mining rigs would connect to a GPU with one PCIe lane. The work to be done would live entirely on one GPU. Here, it's really important that the peripheral be able to communicate with the system at full speed as well as the peripheral being able to communicate with other peripherals. See, there's very little benefit for running multiple GPUs if the GPUs can't talk to each other at full speed. That's what the chipset gives us. It bypasses the CPU. Because well, even at scale, even with $10,000 CPUs,

the CPUs are too slow. PCIe Gen 6 is needed for full bandwidth. So we're doing the same kind of thing just on a smaller scale. Now, to an extent, I think AMD could solve these kinds of problems via the chipset. Remember, their chipset is four PCIe lanes wide. So it's going to be a maximum of 8 GB per second in PCIe Gen 4 mode or 15 to 16 GB per second in PCIe Gen 5 mode. I think it'd be better for the chip design to have eight lanes, an eight-lane chipset. And then in the eight-lane chipset, you could go to two four-lane PCIe Gen 5 M.2, or another PCIe Gen 5 expansion slot, and something

for slower peripherals. Maybe we're going to add a 25 gig NIC, and that would be perfectly fine at PCIe Gen 4. It would be amazing to have an eight-lane PCIe Gen 4 slot on some kind of a board like this. I think in 2026, when we've got these massive parts shortages, it makes sense to have an AM5 motherboard that has some kind of a chipset on it to give you four eight-lane slots, and maybe a couple of four-lane slots. That would be ideal, because then you can pack GPUs on or high-speed networking, or some combination of GPUs and high-speed networking. And the bottleneck there becomes a little bit the CPU, but now we have dual 3D V-Cache. The other big component that's been missing up until now was having an

AM5 CPU fast enough to keep all of this fed. Ah, but we have the 9950X3D2 Dual Edition. So, the bad news. These are really in demand in server contexts still, even though Nvidia has basically cornered the market. And where previously we were using ASMedia controllers or Broadcom controllers on the motherboard, they've, you know, sort of slid into adjacent markets from that. I mean, there's still a huge market for all of that stuff, but it's less important because of what Nvidia is doing. And also, there are other competitors that have sprung up that are offering PCIe chipsets like Phison and, you know, Texas Instruments even.

It's like over a thousand dollars for one of these cards. But if you want to host eight GPUs on a Threadripper board that doesn't physically have eight slots, it can only physically have seven slots, this can get it done. LR-Link has several PCIe bridges or PCIe retimer and redriver cards that are appropriate for this kind of thing. If you are building a server that has a lot of GPUs and you're facing PCIe AER errors, a card from LR-Link and cabling will help alleviate that because it will either completely redrive and retime the signal or it'll go through a PCIe bridge and it's literally signal in, the bridge does PCIe bridge stuff, and then the signal goes back out the other

side because this is addressable on the bridge. It also solves some compatibility issues from different peripherals. I have some proprietary FPGAs in a different work-related scenario, and this solves lots of compatibility issues. There are certain ASUS motherboards where the slots that are closer to the CPU don't have any redrivers or retimers and the slots that are farther away from the CPU have redrivers and retimers, and sometimes you've got to fiddle with the drive strength and the retraining stuff in the BIOS, and you could just get one of these and the problem goes away. And even though it's like $1,200, there's a downtime, price of admission, reliability aspect to it, and it's like,

yeah, it's worth the $1,200 to just buy the card and be done with it. Sometimes arranging the PCBs in a physical fashion becomes a little bit of a challenge, but yeah. Now, what kind of a performance difference might it make? I'm glad you asked. This is our AORUS X870E, and it's got really good spacing for two GPUs. I'm running two GPUs off of the CPU, eight lanes each. That is connected to our R9700s. So, I got a pair of them. 64 gigs of VRAM in this system, and our 9950X3D2 Dual Edition. Also, right now it's only 64 gigs of memory because memory is okay. With Qwen 3.6, this is 107 tokens per second for inferencing. For training and anything like that, with this connection each GPU only has eight lanes and the

CPU is responsible for managing that. In this configuration, 107 tokens per second is pretty nice. But what about 125 tokens per second? It's about a 10% performance uplift for inferencing. And if you're doing training or fine-tuning a model, something like that, it can be upwards of 15% faster. How? The LR-Link connection. Instead of having two GPUs like this, we have it set up like our other machine where there's one card in the primary X16 slot and then our LR-Link card fans that out into two more X16 slots. Through this

card, the two GPUs can talk to each other at the full PCIe Gen 5 x16 bandwidth. And when you're doing training or anything like that, you're going to get a little bit of a speed bump. Now, is that much of a speed bump worth $1,000? It depends on what you're doing. For some people, it definitely would be. For some people, this setup is reasonable, a reasonable way to get you to 64 gigs of VRAM and a state-of-the-art model like Qwen 3.6 35B at Q8 running at 117, 100, well, about 100 tokens per second. And with the deeper context, we're still at over 80 tokens per second in this setup. Of course, with the deeper context, this is also faster. And if you're running even higher-end GPUs like RTX 6000s and

you're doing fine-tuning and training, the performance benefit is even more pronounced because those GPUs have more memory, they're 600 W cards as opposed to these. They become a lot more difficult to juggle in other aspects like power and physical layout and dealing with the heat that much power generates. But, for inferencing and training, you can get upwards of 20% better performance. For just pure inferencing, if you're not doing, you know, a combination task, the performance uplift from just inferencing is not as much. It's also on the order of about 10%. But, 10% difference in, you know, tokens per second for token generation, it's pretty good. Prompt processing, you know, in that 10 to 20% neighborhood, also not bad.

Now, just to be clear, if you're looking at LR-Link's website, they have re-driver, re-timer, and then they also have this. Now, this is meant for... We look at this contraption. It's a PCIe card on one side and PCIe slots on the other. And these are the eight-lane slots. We could also go to 16-lane slots. So, like, two cables, cuz each cable is eight lanes. But, this is the 4 MCIO 8I NVMe expansion card. This normally comes configured for NVMe. And what this looks like is you have two NVMe headers per connector. So, you can connect eight of these U.2 to this controller. That is what this normally comes for. It is very off-label and very me and very I'm reprogramming this thing in order to be able to do the PCIe slot

thing, unless you just want eight four-lane PCIe slots. So, don't do what I did with the graphics cards, cuz it's not really meant to do that. But, if you do want to do something like that, you're better off using your motherboard bifurcation and the other card that LR-Link offers, which does retiming and redriving. This one from their website. It's a better choice and is less expensive, because it doesn't involve the crazy expensive Broadcom chipset that can lift and shift PCIe lanes however you want. Just for demonstration, I'm using the Arc Pro B50. It's convenient, cuz it doesn't require external power, but it also validates that I'm getting power through my little add-on board. So, yay. And so, in this configuration, it's in a

shipping config, which is four groups of lanes per port group. So, you've got 32 lanes divided by four for eight NVMe. And our GPU still works fine cuz again, PCIe is kind of fungible. The only tricky part is configuring how many lanes you want where. You've got to be able to reprogram the PCIe bridge, not for the faint of heart. But you can see that we've got both of our U.2 NVMe in Device Manager, plus our Intel Arc Pro B50. All connected to our plucky little AM5 platform. And to just be clear, to just really drive the point home, the motherboard slot is configured for X16. We are not using motherboard bifurcation. If you're going to use motherboard bifurcation, you can, but

you should use not the Broadcom adapter, but the passive one, because then the CPU controls how that 16 lanes is divvied up. You could split it up into four groups of four lanes. But, we have eight groups of four lanes because it's a PCIe bridge. There's a chip on here that's doing active work figuring out how to sort and route PCIe lanes, as was the case in the days of yore, all the way back on like X299, like the really high-end, like the first X299 motherboards out the gate and some X99 motherboards. Same thing. It's pretty bananas. Okay, so my crazy fantasy motherboard probably doesn't look like a mess of cables. How do we go from something like this to something that can look a little bit more like this? And I think we can

put our eye toward AM6. What does AM6 look like? I'm glad you asked. I've got a diagram. I think it makes a lot of sense to move back to a two chipset solution. Like X870E, but not X870E. I want to use just two PCIe Gen 6 lanes for all the relatively low speed IO at the back of the motherboard. That'll include Thunderbolt 5, USB, 10 gigabit Ethernet. I think we can get an interface for, you know, Realtek's inexpensive 10 gigabit Ethernet, 5 gigabit, you know, two 10 gigabit interfaces, whatever we need, everything that we need at the rear IO connected to the CPU with just two PCI Express 6 lanes. That gives us as much bandwidth as four PCI Express Gen 5 lanes, which is great. That means all the stuff at the rear of

your computer has about 15 GB per second of total aggregate bandwidth to the CPU. Fantastic. I'll take it. To keep costs down, I guess, probably maybe, we could have 16 lanes for the GPU. And we could keep doing the same old where it's eight lanes for two GPUs or 16 lanes for one GPU directly into the CPU. Now, physical full-size ATX motherboards have got room for seven slots. We're going to eliminate two of the slots because I think everybody is going to have at least a double-width GPU. It definitely doesn't make sense to try to shove a bunch of slots on here. This is also not the end-all be-all. So, two of our slots are spoken for. We've got three other possible physical slots. Where do those come from? Those I think come from the

chipset, and those cap out at PCIe Gen 4. Maybe we have one slot that's eight lanes or up to eight lanes, and two other slots that are four lanes. The rest of our lanes from the chipset can go to M.2 at PCIe Gen 5 speeds. It would be nice if we had eight PCIe Gen 6 lanes between the CPU and the chipset. So, that'd be eight plus 16 plus two. Now, we've got a choice here. We can have two more PCI Express Gen 6 lanes that we can do something creative with, or we could have four Gen 6 lanes for dedicated M.2. I'm kind of on the fence here. I could go back and forth between having our M.2s have to hang off of the chipset as opposed to being directly connected to the CPU, but it is really

convenient and nice from a latency perspective having the storage connected directly to the CPU. Two lanes of Gen 6, two lanes of Gen 5 for existing Gen 5 SSDs. Don't know that I like that trade-off. I would really like to squeeze in four lanes direct to the CPU, but that's 4 + 8 + 2 + 16 lanes total on our AM6 CPU. That gives us a total of 30 lanes. That might sound crazy, but there was a Chinese AM4 motherboard that actually had 32 lanes out of the AM4 socket. There was even a dual socket version of that where there were 16 lanes between two AM4 sockets, and then each socket provided 16 lanes to the platform. That was Hygon. That was a long time ago. We're going to let bygones be bygones.
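The wish-list lane count above can be tallied in a few lines. This is a sketch of the speaker's hypothetical AM6 layout, not a real platform: the dictionary keys are illustrative names, and `gen5_equivalent_lanes` assumes per-lane throughput roughly doubles with each PCIe generation, which is a rounded rule of thumb.

```python
# Hypothetical AM6 lane budget from the wish list above (all PCIe Gen 6).
am6_budget = {
    "m2_direct": 4,   # dedicated M.2 straight to the CPU
    "chipset": 8,     # CPU-to-chipset uplink
    "rear_io": 2,     # Thunderbolt, USB, Ethernet at the back
    "gpu": 16,        # one x16 slot, or x8/x8 for two GPUs
}


def gen5_equivalent_lanes(lanes, generation):
    """Express lanes of a given PCIe generation as Gen 5 lanes of equal bandwidth.

    Assumes per-lane throughput doubles with each generation.
    """
    return lanes * 2 ** (generation - 5)


total = sum(am6_budget.values())
rear_io_equiv = gen5_equivalent_lanes(am6_budget["rear_io"], 6)
print(f"total CPU lanes: {total}")
print(f"rear I/O: {rear_io_equiv} Gen 5 lanes' worth of bandwidth")
```

Under that doubling assumption, the two Gen 6 lanes for rear I/O match the four Gen 5 lanes mentioned earlier, and the four groups add up to the 30 lanes quoted.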

So, AM6 having 30 or 32 PCIe lanes would be great. And motherboard makers don't have to go with this layout. Like, we could have an inexpensive four PCIe Gen 6 lane chipset, and then that chipset maybe has fewer or slower M.2 or fewer or slower PCIe lanes. But, I think the top-tier SKU being eight PCIe lanes from the CPU assuages any bandwidth concerns that we would have for the lifetime of that AM6 socket. But, hey, that's just my fantasy wish list. I do have this thing, and if you want to have me build something or test something, we have a community for that in the Level 1 forums. You should definitely come and check that out, and maybe we can put something together in the AM5 side of

things and just sort of do a test pilot to see how that goes. Cuz, remember, PCIe lanes are fungible. And what other channel you going to hear that from? Feel free to spend your PCIe lanes however you want. Storage, U.2, E1, E3. I would love to see E1 and E3 come to the desktop for something a little higher end than M.2, especially when you're shelling out as much money as these X870E motherboards cost in the first place. Like, if you're going to spend that kind of money on your workstation, like, you're going to probably get some more expensive peripherals, right? Especially in this day and age. Although, I think we're in for a reset pretty soon. So, that's about it for this one, my thoughts on my sort of fantasy AM5 motherboard that hopefully will

translate into AM6 reality cuz I don't think anybody's going to build AM5. I'm Wendell, this is Level 1. If you have any questions, or I missed anything, or you want to see me do a test build, you can find me in the Level 1 forums. I'm signing out and I'll see you there. Woo! Go community!