Game devs commiserate with the Helldivers 2 team by sharing their own server launch horror stories: 'You can only attempt to prepare for scale'
Helldivers 2 is one of my favourite games of the year so far. I also barely get to play it at the moment, thanks to a payload of server problems that've knocked its Steam user review average down to "Mixed."
It's hardly the first game to suffer from server problems at launch. I'm used to the catastrophic waiting times of MMORPG expansion launches like Final Fantasy 14's Endwalker, which fully had to stop selling copies while it waited for things to calm down. I anticipate Dawntrail will be much the same.
It's a frustrating situation for players, some of whom are more understanding than others. A small contingent of not-so-patient ones have been calling for the Helldivers 2 team to 'just buy more servers' or some variant thereof for a while now, which came to a head the other day when Arrowhead CEO and creative director Johan Pilestedt sarcastically responded to a fan telling him to "stop tweeting and fix" the game.
"Yes! Good idea, I will sit behind the engineers and ask them 'are we there yet?'" wrote Pilestedt. "Or... I could let the engineers work independently, towards our common goal without me as the CEO pestering them at every moment."
Like rugged veterans trading war stories in a smoky bar, other developers responded to Pilestedt's exasperation with their own brutal game launch experiences. One particular quote tweet shone a bright, glaring light on how chaotic a game launch can be—written by Christina Pollock, a writer and game developer who worked on Dauntless, a Monster Hunter-esque game that launched out of its beta phase and into crossplay in 2019.
"The launch of Dauntless (2019) was the most difficult launch of my career," Pollock writes, before unravelling her yarn—the studio had originally planned for 260,000 "[concurrent users] peak". Its open beta period sported 65,000 players, but when the thing launched: "It fell over at 10,000 [concurrent users]. Took 3 weeks of 15 hour days, 7 days a week to get it stable."
Pollock says this was due to a year of tweaks to the code and infrastructure: "we'd 1/6th'd our capacity before we hit issues. We'd load-tested every service up to 260,000, and run bots to 100,000. Didn't matter. Shit broke."
She proceeds to describe a war room with "permanently open calls" to a quartet of giants: Google, PlayStation, Xbox, and Epic Games. "I cannot think of a single piece of infrastructure that didn't have issues," she writes. "And again, every single one of these pieces had performed admirably during load testing. We spent SO much money on cloud services running bots, scripts, and swarms." In other words, sometimes it's possible to do everything right and still lose.
Pollock describes a nightmare scenario where Dauntless' system for spreading load "across player databases" was weighting them unevenly, a problem she says is nigh-on impossible to fix once it's already causing issues.
"We ended up in a situation where load on individual databases meant that 1 minute of live time was taking more than 1 minute to back up. Which meant backups got further and further behind. Eventually we chose to just turn backups off and rely on read replicas for redundancy … so while we had live services falling over, we had a ticking time bomb."
Pollock describes the resulting situation as balancing "the Sword of Damocles with the fires of Rome."
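To see why that backup problem only ever gets worse, here's a rough back-of-the-envelope sketch. The function name and the 1.25 figure are made up for illustration and don't come from Pollock's thread; the only thing taken from her account is the premise that a minute of live data was taking more than a minute to back up.

```python
def backup_lag_minutes(live_minutes, backup_cost_per_live_minute):
    """Estimate how far behind backups are after `live_minutes` of uptime.

    backup_cost_per_live_minute is a hypothetical figure: how many minutes of
    backup work it takes to capture one minute of live data.
    """
    # Live minutes the backup job has managed to cover so far.
    backed_up = live_minutes / backup_cost_per_live_minute
    # If backups run faster than real time, they simply keep up (no lag).
    return max(0.0, live_minutes - backed_up)


if __name__ == "__main__":
    # Assumed cost: 1.25 minutes of backup work per minute of live data.
    for hours in (1, 6, 24):
        lag = backup_lag_minutes(hours * 60, 1.25)
        print(f"after {hours:>2} h live: backups are {lag:.0f} minutes behind")
```

Under that assumed rate, the gap grows in a straight line with uptime and never closes on its own, which is roughly the shape of the trap Pollock describes.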
The rest of the thread is lengthy, detailed, and far too much to summarise here—though I encourage anyone with a passing interest in server infrastructure to give it a read. It also struck a chord with several game designers including Pilestedt himself, who wrote: "Thank you so much for this entire thread. I don't have any words to adequately describe my thoughts around this. So just, thank you."
Devs from across the board chimed in to agree with Pollock's assessment that server issues can be labyrinthine, complex hellscapes where even an innocuous attempt to fix something could result in a toppled house of cards.
"The launch of Sea of Thieves had a similar story. Live service game postures are hard, and you can only attempt to prepare for scale," writes the principal gameplay engineer of Sea of Thieves Chris Marlow. "But until hundreds of thousands of real players are going through the pipes do the cracks show, and we patch them as fast as possible."
Meanwhile, senior community manager at Enshrouded developer Keen Games, François Hardy, writes: "I won't say it too loud, but the best thing that happened to Enshrouded was other games having a successful launch and taking some pressure off the team." (Hardy is doubtlessly referring to Palworld, which wound up with a $500,000 projected monthly server bill as a result of its success.)
Evan Berman, senior community manager at Tencent Games, adds: "Oh, the live service horror stories that could be told as comms lead on Rend. Or Archeage. Or Hawken. Or Tera. Or Hellgate: London."
Here are a few more examples from Scott W. Bradford, lead narrative designer on Skate; Joseph Burgos, a 3D weapons artist working on Valorant; and Jeremy Laumon, a principal tech programmer at Guerrilla (Horizon, Killzone).
"ship it!" the ship first day: https://t.co/xIk6Fkj8Sp pic.twitter.com/G3CbnAC3Tm (February 20, 2024)
Ultimately, it seems like every dev has a horror story to tell around the proverbial campfire. While nobody can deny that Helldivers 2 is in a state from a consumer perspective (still fun as hell to play once you're in), it's not a rare or unexpected story for a game suffering under the weight of unanticipated success. I'm sure a similar panic-siren scene is playing out at Arrowhead Game Studios HQ as I write this, and I certainly don't envy its position.
Pollock ends her thread with a call for calm: "Please be kind to Arrowhead. They're in a hell you can't even begin to imagine. Don't make it worse."