Data-wranglers Trifacta go public with Google Cloud collaboration

Pair takes aim at $20bn analytics market

California-based self-service data biz Trifacta has today launched its Google-friendly data preparation service into public beta.

The startup's focus is data wrangling – cleaning up raw, complex data into structured formats for analysis – and the collaborative platform is based on its Photon Compute Framework.

That framework, according to the biz, offers users a "richly interactive" experience for large in-memory datasets with "immediate feedback" to give them better visibility into raw data.

The latest collaboration, called Google Cloud Dataprep, is part of Trifacta's bid to bag a chunk of the potential cloud market – the company estimates cloud analytics will grow to more than $20bn by 2020.

"Data is increasingly in the cloud - often in multiple clouds in addition to on-premise," said Trifacta CEO Adam Wilson. "This is why the work we have undertaken with Google is so important."

According to Sean Ma, product manager at Trifacta, Google has reported an increase in customers wanting to analyse diverse datasets in its cloud, and has found that much of their time is spent on data preparation.

Trifacta, which has raised more than $76m to date, seems to have set its sights on big businesses. Its own customers, of which there are around 7,000, include Pepsi, Royal Bank of Scotland and LinkedIn. This makes Google an ideal partner to push the idea that it can work with data at scale.

Google, meanwhile, said the tie-in is aimed at analysts who want to work with data directly "without needing to rely on data engineers".

Wilson said that the company's customers are increasingly business users. "This shift enables IT to focus on data governance and scaling best practices. This is why self-service tools are now essential to improving information agility."

The data wrangling solution, he said, can handle "massive amounts of data from a variety of sources". It will automatically detect schema, type, distribution, and missing or mismatched values, and use machine learning to recommend corrections.
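
To make that kind of profiling concrete, here is a minimal sketch in Python of what detecting type, distribution, and missing or mismatched values in a single column can look like. It is not Trifacta's or Google's code: the function names, the list of missing-value markers and the majority-vote type rule are all assumptions made for illustration, every value is assumed to be a string, and the machine-learning step that recommends corrections is left out entirely.

```python
from collections import Counter

# Assumed markers for "missing"; a real tool would make this configurable.
MISSING = {"", "na", "n/a", "null", "none"}


def infer_kind(value):
    """Classify one raw string as 'int', 'float' or 'str'."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "str"


def profile_column(values):
    """Summarise a column of raw strings (hypothetical helper).

    Returns the dominant type, the number of missing entries, the
    values that do not match the dominant type, and a value histogram.
    """
    # Drop entries that look missing, ignoring case and padding.
    present = [v.strip() for v in values if v.strip().lower() not in MISSING]
    # The dominant type is simply the most common per-value type.
    kinds = Counter(infer_kind(v) for v in present)
    dominant = kinds.most_common(1)[0][0] if kinds else None
    mismatched = [v for v in present if infer_kind(v) != dominant]
    return {
        "type": dominant,
        "missing": len(values) - len(present),
        "mismatched": mismatched,
        "distribution": dict(Counter(present)),
    }
```

Calling `profile_column(["12", "7", "", "n/a", "seven", "12"])` reports the column as `int` with two missing entries and `"seven"` flagged as a mismatch. A majority vote is the simplest possible rule; it would mislabel a column where bad values outnumber good ones.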

The idea is to allow users to easily and visually explore, clean and prepare structured and unstructured data so it can be used for analysis or training machine-learning models.

The platform natively integrates with Google Cloud Dataflow, which the companies said would allow serverless, auto-scaling execution of data preparation. They are selling this as a time-saver for customers, who no longer have to choose where their jobs run.

Other integrations include Google Cloud Storage and Google BigQuery, which the companies say will allow users to browse, preview and import data from those services through a single interface.

"This has huge potential for teams that rely on Google-generated data," Trifacta said. For instance, marketing teams using DoubleClick ads data can now use the Cloud Dataprep platform to prepare that data and write the result back into BigQuery for analysis.

Google Cloud Dataprep has been in private beta for about six months, and has now entered public beta. It will then become generally available once any issues are ironed out, but the companies haven't specified a date for when that will be.

Trifacta said it does not publish its pricing, but a spokesperson pointed us to a free version of its Wrangler product. ®

Biting the hand that feeds IT © 1998–2022