This article is more than 1 year old

Netflix picks up Molly at university, scores harsh character assessment

Prototype UC Berkeley fault-generating code finds flaws streamer's human fixers missed

Video streamer Netflix has deployed a prototype fault-generating platform from the University of California, Berkeley, to find and fix five problems that could otherwise have affected users.

The platform, dubbed MOLLY, is described in a 2015 Berkeley paper Lineage-driven Fault Injection [pdf] as a "novel approach for discovering bugs in fault-tolerant data management systems".

Berkeley academics Peter Alvaro, Joshua Rosen, and Joseph M. Hellerstein say that if fault-tolerance bugs exist (those that could cause application failure), their prototype will find them rapidly, often using far fewer executions than random fault injection would.

"Failure is always an option; in large-scale data management systems, it is practically a certainty," the trio write.

Netflix already knew of key fault injection points thanks to its existing in-house FIT tool, which was coupled to MOLLY for deeper analysis. Company engineering director Ben Schmaus (@schmaus) and internet scale engineer Kolton Andrus (@koltonandrus) say they found five faults in App Boot, the request that loads a list of videos for users, including one that involved a combination of failure points.

"This (App Boot) is also a very complex request, touching dozens of internal services and hundreds of potential failure points," the engineers say.

"Brute force exploration of this space would take 2^100 iterations, whereas our approach was able to explore it in about 200 experiments.

"We found five potential failures, one of which was a combination of failure points."
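To give a feel for why the gap between the two figures is so wide, here is a toy Python sketch of the general idea behind lineage-driven fault injection. It is not MOLLY, FIT, or anything Netflix runs: the service names, the `PATHS` table, and every function below are hypothetical. The assumption is that a request succeeds if at least one "path" of services is fully healthy, and that each successful run reveals which path served it. Instead of trying every on/off combination of failure points, the search injects only fault sets that would break every successful path observed so far.

```python
from itertools import combinations

# Hypothetical toy system (not Netflix's): the request succeeds if every
# service on at least one of these paths is healthy.
PATHS = [
    {"ranker", "cache"},
    {"ranker", "db"},
    {"static_list"},
]

def run_request(failed):
    """Return the path that served the request, or None if it failed."""
    for path in PATHS:
        if path.isdisjoint(failed):
            return frozenset(path)
    return None

def next_hypothesis(lineages, bugs, max_faults):
    """Smallest untried fault set that breaks every success path seen so far."""
    candidates = sorted(set().union(*lineages))
    for size in range(1, max_faults + 1):
        for combo in combinations(candidates, size):
            faults = frozenset(combo)
            if any(bug <= faults for bug in bugs):
                continue  # already known to fail; not a new finding
            if all(faults & lineage for lineage in lineages):
                return faults
    return None

def lineage_driven_search(max_faults=2):
    """Return (failing fault sets, number of experiments run)."""
    lineages = [run_request(frozenset())]  # fault-free run gives first lineage
    bugs, experiments = [], 1
    while (faults := next_hypothesis(lineages, bugs, max_faults)) is not None:
        experiments += 1
        served_by = run_request(faults)
        if served_by is None:
            bugs.append(faults)          # request failed: a real weakness
        else:
            lineages.append(served_by)   # a fallback path was revealed
    return bugs, experiments

bugs, experiments = lineage_driven_search(max_faults=2)
services = set().union(*PATHS)
print("failing combinations:", [sorted(b) for b in bugs])
print("guided experiments:", experiments)
print("brute-force combinations:", 2 ** len(services))
# If one assumes 100 independent on/off failure points, brute force needs:
print("2^100 =", 2 ** 100)
```

On this four-service toy, the guided search runs four experiments against sixteen brute-force combinations and turns up a single failure that needs two services down at once. The gap grows exponentially with the number of failure points, which is the effect the engineers describe.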

The duo say only a small number of experiments were run so as to affect as few users as possible.

Repairing the faults was a manual affair, with the FIT tool used to verify each fault and help determine the best fix.

Netflix may extend the prototype to scan a wider request space to find more user-impacting failures.

"We’re very excited that we were able to build this proof of concept implementation and find real failures using it."

Excited readers can apply to work with Netflix in its open positions for fault finders. ®
