California-based self-service data biz Trifacta has today launched its Google-friendly data preparation service into public beta.
The startup's focus is data wrangling – cleaning up raw, complex data into structured formats for analysis – and the collaborative platform is based on its Photon Compute Framework.
That framework, according to the biz, offers users a "richly interactive" experience for large in-memory datasets with "immediate feedback" to give them better visibility into raw data.
The latest collaboration, called Google Cloud Dataprep, is part of Trifacta's bid to bag a chunk of the potential cloud market – the company estimates cloud analytics will grow to more than $20bn by 2020.
"Data is increasingly in the cloud - often in multiple clouds in addition to on-premise," said Trifacta CEO Adam Wilson. "This is why the work we have undertaken with Google is so important."
According to Sean Ma, product manager at Trifacta, Google has reported an increase in customers wanting to analyse diverse datasets in its cloud, and found that much of their time was being spent on data preparation.
Trifacta, which has raised more than $76m to date, seems to have set its sights on big businesses. Its own customers, of which there are around 7,000, include Pepsi, Royal Bank of Scotland and LinkedIn. This makes Google an ideal partner to push the idea that Trifacta can work with data at scale.
Google, meanwhile, said the tie-in is aimed at analysts who want to work with data directly "without needing to rely on data engineers".
Wilson said that the company's customers are increasingly business users. "This shift enables IT to focus on data governance and scaling best practices. This is why self-service tools are now essential to improving information agility."
The data wrangling solution, he said, can handle "massive amounts of data from a variety of sources". It will automatically detect schema, type, distribution, and missing or mismatched values, and use machine learning to recommend corrections.
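To give a rough sense of what that kind of automatic profiling involves, here is a minimal Python sketch of our own. It is not Trifacta's or Google's code: the function names, the list of missing-value markers and the sample column are all assumptions made for illustration, and it uses simple rules rather than machine learning.

```python
from collections import Counter

# Assumed markers for "missing" cells; a real tool would use a richer list.
MISSING = {"", "na", "n/a", "null", "none"}


def infer_type(value):
    """Classify one raw string as 'int', 'float' or 'str'."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            continue
    return "str"


def profile_column(values):
    """Report the dominant type plus counts of missing and mismatched cells."""
    # Drop cells that look empty, then type each remaining cell.
    present = [v.strip() for v in values if v.strip().lower() not in MISSING]
    counts = Counter(infer_type(v) for v in present)
    dominant = counts.most_common(1)[0][0] if counts else "str"
    return {
        "type": dominant,
        "missing": len(values) - len(present),
        # Cells whose inferred type differs from the dominant one.
        "mismatched": sum(n for t, n in counts.items() if t != dominant),
        "type_counts": dict(counts),
    }


# Hypothetical raw column with one blank, one placeholder and one stray word.
ages = ["34", "27", "", "forty", "51", "N/A"]
report = profile_column(ages)
print(report)
```

On the sample column, the sketch treats the values as integers, flags two cells as missing and one ("forty") as mismatched. Suggesting a correction for that cell is the part the companies say is handled by machine learning, which this example does not attempt.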
The idea is to allow users to easily and visually explore, clean and prepare structured and unstructured data so it can be used for analysis or training machine-learning models.
The platform natively integrates with Google Cloud Dataflow, which the companies said would allow serverless, auto-scaling execution of data preparation jobs. They are selling this as a time-saver for customers who, for instance, don't have to choose where their jobs run.
Other integrations include Google Cloud Storage and Google BigQuery, which the companies say will allow users to browse, preview and import data from those services through a single interface.
"This has huge potential for teams that rely on Google-generated data," Trifacta said. For instance, marketing teams using DoubleClick ads data can now use the Cloud Dataprep platform to prepare that data and write the result back into BigQuery for analysis.
Google Cloud Dataprep has been in private beta for about six months, and has now entered public beta. It will become generally available once any issues are ironed out, but the companies haven't specified a date for that.
Trifacta said it does not publish its pricing, but a spokesperson pointed us to a free version of its Wrangler product. ®