GitLab crawling back online after breaking its brain in two

Database replication SNAFU took down three out of five PostgreSQL servers

In a classic example of the genre, GitLab yesterday dented its performance by accidentally triggering a database failover.

The resulting “split-brain problem” left the code-collection serving its users from a single database server, postgres-02, while it sorts out the remaining three.

The problem first arose at around 1:30am UTC on Thursday, and the resulting rebuilds are continuing.

Alex Hanselka wrote that although the fleet “continued to follow the true primary” after the accidental failover, the event was apparently painful:

“We shut down postgres-01 since it was the rogue primary. In our investigation, both postgres-03 and postgres-04 were trying to follow postgres-01. As such, we are rebuilding replication on postgres-03 as I write this issue and then postgres-04 when it is finished.”

Also hurting performance are a backup (needed because there hadn't been a full pg_basebackup since before the failover) and GitLab's decision to shut down its Sidekiq cluster, because Sidekiq generates large queries.

That was the situation when things first broke: nearly 20 hours later, the ticket hasn't been closed.

For a start, the backup of postgres-03 ran at 75GB per hour and didn't finish until after 23:00 (11pm). Other database tasks are still outstanding, but according to posts from Andrew Newdigate, performance is starting to return to normal.

A timeline of the incident is also available.

At least the backups are working this time. In February 2017, a data replication error was compounded by backup failures, and GitLab admitted at the time: “So in other words, out of 5 backup/replication techniques deployed none are working reliably or set up in the first place”.

The missing data was found on a staging server, and after much soul-searching, marketing veep Tim Anglade told The Register the company understood its role as “a critical place for peoples' projects and businesses”.

Working backups, it has to be said, indicate at least some of the lessons were learned. ®

Biting the hand that feeds IT © 1998–2022