Huawei Cloud reveals the dynamic traffic allocation system it uses to cut bandwidth bills

Created during COVID to handle video boom and sliced bandwidth costs by 30 percent

Huawei has released details of how it manages its own cloud with a dynamic traffic allocation system optimized by machine learning and developed in response to surging demand for its services during the COVID-19 pandemic.

“The system was engineered using an array of operations research methodologies – such as continuous optimization, integer programming, graph theory, scheduling, and network-flow problem solving, along with state-of-the-art machine learning algorithms,” proclaimed the Chinese tech giant in a paper recently published in Informs Journal on Applied Analytics.

The world started streaming in 2020

The authors of the study hail from both Huawei’s Cloud Algorithm Innovation Laboratory and The University of Hong Kong, and identified “explosive adoption of live platforms in online meetings and teaching, particularly during the COVID-19 pandemic” as the driving force behind the need for the system.

China experienced a 47 percent annual increase in live streaming users as of June 2021, according to the authors. Companies rushed to migrate digital assets, services, databases, and applications into the cloud to keep up and, in turn, cloud service providers (CSPs) faced suddenly escalating demand – along with higher quality expectations.

Huawei Cloud began providing business-to-business (B2B) live streaming services in 2020, just as the worldwide boom, and all of its challenges, hit.

The infrastructure and billing policies framed the solution

The bulk of the costs CSPs incur from live streaming are related to network bandwidth, for which they must pay carriers who distribute content to edge nodes from which end-users retrieve material.

Huawei decided to minimize bandwidth costs accrued connecting to edge nodes.

Breaking the problem down

“We dissected the usage of bandwidth resources across various stages of live streaming services, ultimately decomposing the entire process into several distinct modules,” wrote the study's authors.

The Huawei lab and the university together developed a bandwidth management system made up of five interlinked, mutually reinforcing modules. They titled the system “GSCO.”

A traffic forecaster module in GSCO first uses machine learning techniques to estimate future requests based on historical data. It relies on methods like BHT-ARIMA, a statistical modeling technique that can take into account sudden changes, such as those that occur when breaking news sends a rush of viewers to a stream.
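To give a feel for what "estimating future requests from historical data" means, here is a minimal sketch of a first-order autoregressive fit, the "AR" ingredient in ARIMA. It is not BHT-ARIMA and not Huawei's forecaster; the function names and the sample traffic series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]. Returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    var_x = sum((a - mean_x) ** 2 for a in x)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps):
    """Roll the fitted model forward `steps` periods past the end of the series."""
    c, phi = fit_ar1(series)
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Hypothetical request counts per interval, decaying after a spike.
history = [100.0, 60.0, 40.0, 30.0, 25.0, 22.5, 21.25]
print(forecast(history, steps=3))
```

A production forecaster would need to handle seasonality, multiple correlated series, and the sudden jumps the paper calls out, which is where the heavier machinery comes in.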

A network planner module then generates connections between edge regions and edge nodes, incorporating terms of service level agreements like minimum quality of service thresholds. Turns out keeping streaming quality within agreement is challenging enough that the system has to reassess feasibility of network connections every two minutes.

Three modules handle scheduling. Two are “offline solvers” and the other is an “online solver.” It’s a difficult task as online traffic allocation must be generated within milliseconds, so GSCO creates monthly allocation strategies and appraises bandwidth cost at a monthly level, then adjusts the allocation at a daily level.

The monthly offline solver is formulated as a minimum-cost network flow problem (MCNFP), linearly approximated and tackled with several combined algorithms – such as the generalized primal-dual algorithm, the balanced augmented Lagrangian method, the extended alternating direction method of multipliers, and the network simplex method.

The MCNFP can be solved for real datasets within 10 seconds in Python on a PC with an Intel Core i7-8700 CPU and 32GB RAM, the authors brag. A neighborhood search algorithm (NSA) tunes results for better quality without increasing bandwidth cost.
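For readers unfamiliar with minimum-cost network flow, the toy below routes traffic from edge regions to edge nodes so the total per-unit bandwidth charge is as low as possible. It uses the textbook successive-shortest-paths method, not the algorithms named in the paper, and every region, node, capacity, and price is made up for illustration. Linear per-unit pricing is also an assumption here.

```python
from collections import defaultdict


def min_cost_flow(edges, source, sink, demand):
    """Route `demand` units from source to sink at minimum total cost.

    edges: list of (u, v, capacity, cost_per_unit) tuples.
    Returns (total_cost, flow), where flow maps (u, v) to units sent.
    """
    # Residual graph. Each arc is [head, residual_capacity, cost, reverse_index].
    graph = defaultdict(list)
    forward = []  # (tail, arc index, original capacity) for each input edge
    for u, v, cap, cost in edges:
        forward.append((u, len(graph[u]), cap))
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    nodes = list(graph)
    sent, total_cost = 0, 0
    while sent < demand:
        # Bellman-Ford: cheapest source-to-sink path in the residual graph.
        dist = {source: 0}
        parent = {}  # node -> (previous node, index of arc used)
        for _ in range(len(nodes) - 1):
            changed = False
            for u in nodes:
                if u not in dist:
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist.get(v, float("inf")):
                        dist[v] = dist[u] + cost
                        parent[v] = (u, i)
                        changed = True
            if not changed:
                break
        if sink not in dist:
            raise ValueError("demand exceeds network capacity")

        # Find the bottleneck capacity along the path.
        push = demand - sent
        v = sink
        while v != source:
            u, i = parent[v]
            push = min(push, graph[u][i][1])
            v = u
        # Push flow along the path and update residual capacities.
        v = sink
        while v != source:
            u, i = parent[v]
            arc = graph[u][i]
            arc[1] -= push
            graph[v][arc[3]][1] += push
            total_cost += push * arc[2]
            v = u
        sent += push

    flow = {(u, graph[u][i][0]): cap - graph[u][i][1] for u, i, cap in forward}
    return total_cost, flow


# Hypothetical network: two regions with demand 6 and 4, two edge nodes
# with capacities 5 and 8 and per-unit carrier prices 2 and 3.
edges = [
    ("src", "region_A", 6, 0),
    ("src", "region_B", 4, 0),
    ("region_A", "node_1", 10, 0),
    ("region_A", "node_2", 10, 0),
    ("region_B", "node_2", 10, 0),
    ("node_1", "sink", 5, 2),
    ("node_2", "sink", 8, 3),
]
total_cost, flow = min_cost_flow(edges, "src", "sink", demand=10)
print(total_cost)  # 25
print(flow[("node_1", "sink")], flow[("node_2", "sink")])  # 5 5
```

The cheaper node fills to its capacity first and the remainder spills to the pricier one, which is the basic trade-off a monthly plan has to make at far larger scale.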

The daily solver operates every day except on the last day of the month, when the monthly solver takes precedence. It also uses the NSA to further optimize allocation.

Next, the online solver kicks in, leveraging the data from the offline solvers to create an allocation table full of probabilities that access requests from each edge region will be routed to each respective edge node by the traffic allocation actuator.

The actuator uses that table to assign traffic, which it can do within those necessary milliseconds.

“The GSCO system only needs to execute a search algorithm to obtain the corresponding probabilities from the allocation table, and then a random number generation method to finalize the allocation decision,” explained the boffins.
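The lookup-then-random-draw step the authors describe can be sketched in a few lines. This is an illustration of the general idea only: the table contents, region and node names, and function names are hypothetical, and the paper's actual data structures are not described here.

```python
import bisect
import itertools
import random

# Hypothetical allocation table: for each edge region, the probability
# that a request is routed to each edge node.
allocation_table = {
    "region_A": {"node_1": 0.7, "node_2": 0.3},
    "region_B": {"node_2": 0.5, "node_3": 0.5},
}


def build_lookup(table):
    """Precompute cumulative probabilities so each decision is one binary search."""
    lookup = {}
    for region, probs in table.items():
        nodes = list(probs)
        cumulative = list(itertools.accumulate(probs.values()))
        lookup[region] = (nodes, cumulative)
    return lookup


def allocate(lookup, region, rng=random):
    """Pick an edge node for one request from `region`."""
    nodes, cumulative = lookup[region]
    draw = rng.random() * cumulative[-1]  # uniform draw over the total weight
    index = bisect.bisect_right(cumulative, draw)
    return nodes[min(index, len(nodes) - 1)]  # guard against float rounding


lookup = build_lookup(allocation_table)
print(allocate(lookup, "region_A"))
```

Because the expensive optimization happens offline, the per-request work is a dictionary lookup, a random number, and a binary search, which is why millisecond decisions are plausible.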

Deployment challenges worth the result

The modules were deployed progressively over the course of two years. Huawei ran into some snags along the way – including the relatable problem of getting senior management to understand GSCO’s value, then convincing operations and maintenance teams to adopt it. By the end of Q1 2022, all five existing modules of GSCO were deployed.

Huawei has claimed that between Q1 2020 and Q3 2022 GSCO has reduced network bandwidth expenses by around 30 percent and led to savings exceeding $49.6 million.

The Chinese cloud and hardware company self-reported that GSCO amplified peak bandwidth by a factor of more than 10, from 1.5 terabits per second (Tbps) to 16 Tbps.

Huawei's Cloud unit taking a priority

As mentioned, that number is self-reported so take from it what you will. However, Huawei’s Cloud unit was recently revealed to be the company’s current growth vehicle as other areas of business face the challenges of export controls.

But while Huawei Cloud has a steady presence in China, its international presence is greatly overshadowed by other industry players, like Microsoft Azure and AWS. An optimist might see this as a growth opportunity.

Conveniently, Huawei has also hit a high in its research and development spending at 164.7 billion yuan (US$22.7 billion). Throwing a bit of that cash into optimizing the tech in its growth segment is certainly an understandable strategy. ®
