I recently faced what can only be described as a hard drive nightmare with my Proxmox server, an HP DL380 with 12 SAS drives on the front and three bays on the back. The main array was humming along just fine, but the three 16TB drives I added to the back? Not so much. Whether I was adding directories in the GUI or trying to write data to them, the drives would error out, often immediately.
I tried everything. I moved the drives to the P840 RAID card, then to the onboard SATA controllers 1 and 2. No luck. I swapped out SATA cables, used different partitioning tools like Parted, and even tried alternative programs. I rotated the drives, testing one at a time. Two were Seagate and one was Western Digital, and I had no dice with any of them. Over the course of a week or two, I must have rebooted the system over 100 times, trying different configurations.
Eventually, I ruled out the operating system as the culprit since the other drives were fine. It seemed unlikely that all three brand-new drives from two different manufacturers were bad. Changing the SATA cables didn’t help either. That left only one suspect: the backplane.
Now, here’s where it gets interesting. When I first started this build, I had mistakenly bought a Gen10 3LFF drive bay, thinking it would fit my Gen9 server. Spoiler alert: it doesn’t. But in my desperation, I noticed that while the Gen10 backplane was slightly narrower, all the connectors were in the right places. It even had a clamp connector and one screw hole that lined up perfectly, and all the slots on the bottom still worked.
With nothing to lose, I swapped in the Gen10 backplane, carefully seated all the drives, and powered up. For the first time, the activity lights on the rear drives lit up, something I had never seen before. Apparently, those lights weren’t just decorative after all.
I booted up, added the drives as directory storage in Proxmox, and they stuck. To test, I started a backup, and to my amazement, it worked flawlessly.
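If you want a quicker check than a full backup, a small write-and-read-back test can tell you whether a freshly added directory accepts data at all. What follows is only a minimal sketch and not what I actually ran: the function name `verify_write` and the mount path in the comment are made up for illustration, and the read-back may be served from the page cache rather than the disk itself, so a pass here is a smoke test and not proof that the drive is healthy.

```python
import hashlib
import os
import tempfile


def verify_write(directory, size_mb=16):
    """Write random data into `directory`, read it back, and compare SHA-256 digests.

    Returns True if the digests match, False if any I/O step raises OSError.
    """
    chunk = 1024 * 1024  # 1 MiB per block
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix="writetest-")
    except OSError:
        return False  # could not even create a file in the directory

    try:
        written = hashlib.sha256()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                block = os.urandom(chunk)
                written.update(block)
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                read_back.update(block)

        return written.hexdigest() == read_back.hexdigest()
    except OSError:
        return False  # write, sync, or read failed
    finally:
        try:
            os.remove(path)  # clean up the temporary test file
        except OSError:
            pass


# Example call (hypothetical mount point, adjust to your own storage path):
# print(verify_write("/mnt/rear-bay-1"))
```

In my case the drives errored out almost immediately on write, so a check like this would most likely have returned False on the old backplane within seconds.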
This experience taught me a valuable lesson: don’t rule out hardware too quickly. If you’re facing a mysterious issue with your server, test the hardware itself early in your troubleshooting, including unglamorous parts like the backplane, instead of assuming it is fine. It could save you from weeks of frustration.
Hopefully, this helps someone avoid the painful troubleshooting process I endured. Good luck!