Tencent Cloud's home-grown traffic-tamer halves WAN latency
MegaTE can arrange things so each endpoint gets just the network it needs
Sigcomm 2024 Chinese web giant Tencent has revealed “MegaTE”, a traffic engineering (TE) system it uses on its own cloud and which it claims outperforms rivals by tailoring network configurations to the needs of individual flows generated by VMs or containers.
A paper [PDF] detailing MegaTE was presented today at the Association for Computing Machinery’s SIGCOMM conference in Sydney, Australia, by Tencent senior researcher Congcong Miao. He explained that in conventional TE systems, virtual machines and containers interact with routers that collect network states at the granularity of aggregated flows and send them to the bandwidth broker on the TE control plane. Optimized traffic flow allocations are produced, but they’re based on aggregated flows – not the needs of each VM.
Today’s TE tools – Miao named scFLOW and TEAL (Traffic Engineering Accelerated by Learning) – aren’t designed to cater to the needs of each flow.
That’s not efficient in a hyperscale cloud, he asserted, because applications need networks to be aware of and implement their particular needs – either due to the nature of the workload or the service levels promised by a cloud operator.
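To make the distinction concrete, here is a toy Python sketch contrasting the two approaches. It is not MegaTE's algorithm: the path names, latencies, capacities, flow names, and the greedy placement rule are all invented for illustration. It only shows why an allocation computed on the aggregate can leave a latency-sensitive flow on a slow path.

```python
# Hypothetical paths between one pair of sites (all numbers are made up).
PATHS = {
    "short": {"latency_ms": 10, "capacity": 60},
    "long": {"latency_ms": 40, "capacity": 100},
}

# Hypothetical flows from two virtual instances with different needs.
FLOWS = [
    {"id": "vm-video-call", "demand": 20, "latency_sensitive": True},
    {"id": "vm-backup", "demand": 80, "latency_sensitive": False},
]


def aggregated_allocation(flows, paths):
    """Treat all flows as one aggregate: every flow gets the same
    capacity-proportional split across the available paths."""
    total_capacity = sum(p["capacity"] for p in paths.values())
    return {
        f["id"]: {
            name: f["demand"] * p["capacity"] / total_capacity
            for name, p in paths.items()
        }
        for f in flows
    }


def per_flow_allocation(flows, paths):
    """Greedy per-flow placement (an assumption, not the paper's method):
    latency-sensitive flows go first and prefer the lowest-latency path,
    bulk flows prefer the highest-latency path."""
    remaining = {name: p["capacity"] for name, p in paths.items()}
    by_latency = sorted(paths, key=lambda name: paths[name]["latency_ms"])
    result = {}
    for f in sorted(flows, key=lambda f: not f["latency_sensitive"]):
        order = by_latency if f["latency_sensitive"] else by_latency[::-1]
        left, share = f["demand"], {}
        for name in order:
            take = min(left, remaining[name])
            if take > 0:
                share[name] = take
                remaining[name] -= take
                left -= take
        result[f["id"]] = share
    return result


aggregated = aggregated_allocation(FLOWS, PATHS)
per_flow = per_flow_allocation(FLOWS, PATHS)
```

In this toy case the aggregated split sends most of the video call's traffic over the slower path, while the per-flow placement keeps all of it on the faster one and pushes the backup onto the long route.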
MegaTE, by contrast, can “satisfy the needs of each fine-grained traffic flow at the virtual instance level.”
To do so, the tool relies on a database of known TE configurations, and a Redis instance to store them and allow constant rapid transactions as VMs seek good info about how to direct the traffic they create.
Miao said the tool does not assume it is possible to arbitrarily decide what a traffic flow between two endpoints will need. Instead, endpoints run an agent that takes advantage of the extended Berkeley Packet Filter (eBPF) – tech that allows code to run in sandboxes within the Linux kernel. That agent collects traffic to create flow data that is eventually shared with a controller that calculates an optimal network path. That calculation is sent back to the agent, which stores it in an “eBPF map” so the application’s needs can be advertised and acted upon.
MegaTE is also aware of network topologies.
Packets leaving a cloudy endpoint include routing information, meaning that their passage across a network has already been planned and should be smoother.
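The loop described above (agent gathers flow data, controller computes a path, agent caches it and stamps outgoing packets) can be sketched in plain Python. This is a simulation under loud assumptions, not Tencent's code: there is no real eBPF here, the dictionary `path_map` merely stands in for an eBPF map, the topology and latencies are invented, and shortest-latency routing is a placeholder for whatever optimization the real controller performs.

```python
import heapq

# Hypothetical WAN topology: site -> {neighbor: link latency in ms}.
TOPOLOGY = {
    "A": {"B": 10, "C": 3},
    "B": {"D": 10},
    "C": {"B": 2, "D": 4},
    "D": {},
}


def lowest_latency_path(topology, src, dst):
    """Dijkstra over link latency; returns (total_latency_ms, [sites])."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, latency in topology[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + latency, nxt, path + [nxt]))
    raise ValueError(f"no path from {src} to {dst}")


class Controller:
    """Stand-in for the TE controller: maps each reported flow to a path."""

    def __init__(self, topology):
        self.topology = topology

    def compute_paths(self, flow_stats):
        return {
            flow: lowest_latency_path(self.topology, flow[0], flow[1])[1]
            for flow in flow_stats
        }


class EndpointAgent:
    """Stand-in for the per-endpoint agent; flows are keyed by the
    hypothetical tuple (src_site, dst_site, instance_id)."""

    def __init__(self, controller):
        self.controller = controller
        self.flow_bytes = {}  # flow -> bytes observed so far
        self.path_map = {}    # plays the role of the eBPF map

    def sync(self):
        # Report collected flow data and cache the paths that come back.
        self.path_map.update(self.controller.compute_paths(self.flow_bytes))

    def send(self, flow, payload):
        # Record the traffic, then attach the cached route (if any).
        self.flow_bytes[flow] = self.flow_bytes.get(flow, 0) + len(payload)
        return {"route": self.path_map.get(flow), "payload": payload}


agent = EndpointAgent(Controller(TOPOLOGY))
flow = ("A", "D", "vm-42")
first = agent.send(flow, b"hello")   # no route known yet
agent.sync()                         # controller computes and returns a path
second = agent.send(flow, b"world")  # packet now carries its planned route
```

The first packet leaves without a route because the controller has not yet seen the flow; once the agent syncs, later packets carry the planned hop list with them.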
The result, Miao told the conference, is that high-priority applications can meet service level agreements. Tencent uses MegaTE in production in its cloud, where it has apparently reduced packet latency across the WAN by 51 percent. Miao said the tool achieved that while handling over 20,000 flows at a time – many more than is possible with either scFLOW or TEAL.
MegaTE has run at Tencent Cloud since 2022. The company’s cloud spans seven regions and 31 availability zones across China and Hong Kong, plus another eleven regions and 22 availability zones in other countries. Improved WAN latency won’t hurt tenants one bit.
Miao didn’t say if Tencent uses it for its own apps, too. If it does, the results may again be significant: Tencent’s WeChat and Weixin services have over 1.5 billion users, and its video and music streaming services boast 100-million-plus subscribers apiece. The company also runs China’s second-biggest app store, and is a colossal games publisher and operator.
The company likely operates millions of servers and a substantial WAN to connect them all, making MegaTE’s impact enormous. ®