AI developers can now rent Google’s Cloud TPU chips in the US, Asia, and Europe by the hour.
The Cloud TPU, also referred to as TPU2, is the second generation of Google’s TPU hardware aimed at optimizing machine learning models written in TensorFlow. The first TPU was more of an internal experiment and could only handle inference. The second is made for both training and inference, and is the first TPU available to cloud customers.
From this week, developers in the United States, Asia, and Europe can rent them by the hour. The price can drop by 70 per cent for those willing to sign up for Google’s preemptible service, where users run the risk of being kicked off if demand rises.
Cloud TPU2 pricing across three different regions for the normal and preemptible services
It works in a similar way to Amazon Web Services’ GPU reserved instances (RI) and its on-demand (OD) spot pricing service. AWS’s P3 instances use Nvidia’s latest Tesla V100 chips.
AWS P3 pricing for Tesla V100 chips for its reserved and on-demand services ... Source: The Next Platform
There are advantages to using AWS since it’s available in more regions, including: US East (N. Virginia), US West (Oregon), US East (Ohio), Europe West (Ireland), Asia Pacific (Tokyo), Asia Pacific (Beijing), Asia Pacific (Seoul) and GovCloud (US). It’s also more flexible and works for systems written in frameworks other than TensorFlow.
Comparing chips can be difficult, as companies don’t really like to disclose explicit benchmarks. That makes projects like DAWNBench interesting and useful: the competition encourages developers to submit models optimized for training time, the cost of training and inference, and inference latency.
Google engineers entered and won some of the categories for cheapest and fastest ImageNet training with a ResNet-50 model. Its AmoebaNet model also featured in the leaderboards for image classification.
Here’s how much it costs to train some of Google’s open-source models:
Cost to train some popular open-source models on Cloud TPUs using the normal and preemptible services.
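For readers who want to run the sums themselves, here is a rough sketch of the arithmetic behind hourly billing: cost is hours multiplied by the hourly rate, less any preemptible discount. The function name, the hourly rate, and the eight-hour run are hypothetical placeholders rather than Google’s figures; only the 70 per cent discount comes from the article, and it is the maximum quoted.

```python
def training_cost(hours, hourly_rate, preemptible_discount=0.0):
    """Return the cost of a training run billed by the hour.

    preemptible_discount is a fraction between 0 and 1, e.g. 0.70 for 70 per cent off.
    """
    if hours < 0 or hourly_rate < 0:
        raise ValueError("hours and hourly_rate must be non-negative")
    if not 0.0 <= preemptible_discount < 1.0:
        raise ValueError("preemptible_discount must be in [0, 1)")
    return hours * hourly_rate * (1.0 - preemptible_discount)


# Hypothetical inputs for illustration only; not Google's published prices.
HYPOTHETICAL_RATE_USD = 10.00   # assumed price per chip-hour
HYPOTHETICAL_HOURS = 8          # assumed length of the training run

normal = training_cost(HYPOTHETICAL_HOURS, HYPOTHETICAL_RATE_USD)
preemptible = training_cost(HYPOTHETICAL_HOURS, HYPOTHETICAL_RATE_USD, 0.70)

print(f"normal: ${normal:.2f}, preemptible: ${preemptible:.2f}")
# prints "normal: $80.00, preemptible: $24.00"
```

The sketch assumes a preempted job finishes in the same number of billed hours. In practice, a run that gets kicked off and restarted could take longer, which would eat into the saving.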
It would be interesting to see how these costs compare with other cloud services, but it’s a little tricky since the same models can be optimized across different frameworks. Startups like RiseML have had a crack at it, and it turns out the difference in speed between the Tesla V100 and Cloud TPUs is pretty small.
There is also the MLPerf project to look forward to, which measures training and inference times for popular machine learning models across a variety of chips and frameworks. Engineers from Google, Intel, Baidu, AMD, hardware startups like SambaNova and Wave Computing, and universities are submitting their models to the competition until the July 31 deadline. ®