If there's a hole in your S3 bucket, data thieves will be sprayed by Macie
Data loss prevention bot patrols Amazon's cloud storage solution
Analysis
Data loss prevention is about to get a whole lot smarter.
Macie is an Amazon Web Services bot that safeguards the sensitive contents of S3 buckets. Amazon quietly bought the company behind it, Harvest.ai, in January this year for a rumoured $19m.
Neither Amazon nor Harvest.ai confirmed that anyone bought anything. Fortunately, we don't need an artificial intelligence to work it out. Investor Trinity Ventures has given Harvest an "Acquired" status on its website. Moonshot Capital co-founder and managing partner Craig Cummings says on his LinkedIn page that Harvest.ai has been "acquired by Amazon".
Amazon naturally wants its cloud to be watertight. Insecure S3 buckets have been behind embarrassing data leaks, such as one from the hotel-booking service Groupize.
Think of Macie as a data loss prevention agent, or DLPbot, that uses machine learning to learn each user's pattern of access to data in S3 buckets. Buckets have permission levels, and the data in a bucket can be ranked for sensitivity or risk according to whether it contains items such as credit card numbers or other sensitive personal information.
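How might data be "ranked for sensitivity"? Amazon hasn't published Macie's classifier, so the following is only a minimal sketch of the general idea, assuming a simple rule-based scheme: text containing a plausible payment card number (13 to 19 digits that pass the Luhn checksum) is rated high risk, text containing email addresses is rated medium, and everything else low. The three-level scale and the function names are hypothetical.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Rough email pattern, good enough for an illustration.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def count_card_numbers(text: str) -> int:
    """Count digit runs that look like card numbers and pass the Luhn check."""
    count = 0
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            count += 1
    return count


def sensitivity(text: str) -> str:
    """Hypothetical three-level ranking: 'high', 'medium' or 'low'."""
    if count_card_numbers(text):
        return "high"
    if EMAIL.search(text):
        return "medium"
    return "low"
```

The Luhn check matters because plenty of long digit strings (order numbers, timestamps) are not card numbers; requiring a valid checksum cuts down on false positives without any machine learning at all.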
The software monitors users' behaviour and builds a profile of it. If that behaviour changes, and the change is directed at high-risk data, Macie can alert admin staff to a potential breach.
For example, if a hacker successfully impersonates a valid user and then goes searching for data in unexpected places, from an unknown IP address, or both, Macie can flag the unusual activity. The product could also spot a legitimate employee going rogue, for instance by stockpiling copied data ready to steal it.
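The profile-and-deviate approach can be sketched in a few lines. This is not Macie's model, which Amazon describes only as machine learning; it is a deliberately simple baseline, assuming each access event carries a user name, a source IP, an S3 object key and a sensitivity label. The monitor remembers which IPs and top-level key prefixes each user normally touches, and only raises an alert when a deviation lands on high-risk data. The class and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AccessEvent:
    user: str
    source_ip: str
    key: str          # S3 object key, e.g. "finance/2017/cards.csv"
    sensitivity: str  # "low", "medium" or "high"


def top_prefix(key: str) -> str:
    """Treat the first path segment of a key as its 'location'."""
    return key.split("/", 1)[0]


@dataclass
class UserProfile:
    ips: Set[str] = field(default_factory=set)
    prefixes: Set[str] = field(default_factory=set)


class BehaviourMonitor:
    def __init__(self) -> None:
        self.profiles = defaultdict(UserProfile)

    def learn(self, event: AccessEvent) -> None:
        """Add a known-good event to the user's baseline profile."""
        profile = self.profiles[event.user]
        profile.ips.add(event.source_ip)
        profile.prefixes.add(top_prefix(event.key))

    def check(self, event: AccessEvent) -> List[str]:
        """Return reasons for an alert, or an empty list if none is raised."""
        profile = self.profiles[event.user]
        reasons = []
        if event.source_ip not in profile.ips:
            reasons.append("unknown IP")
        if top_prefix(event.key) not in profile.prefixes:
            reasons.append("unusual location")
        # Only escalate deviations that touch high-risk data.
        if reasons and event.sensitivity == "high":
            return reasons
        return []
```

A real system would weigh many more signals, such as time of day and the volume of data read, which is how a rogue insider quietly stockpiling files might show up; this sketch covers only the impersonation case.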
Harvest.ai says: "Macie integrates with your cloud and on-premise systems, examining patterns of logins, remote network access, access to data and documents to discover attacks and compile a comprehensive case for further review." It says that, through its user interface, admin staff are provided with "unprecedented detail, with the ability to start with a narrative explanation of alerts discovered by machine learning analytics, all the way down to the raw events that led to the alert."
This information arrives in real time, so admin staff can revoke a user's access before a leak takes place.
User-initiated data searches can be analysed too. Macie's natural language processing uses word and paragraph vectors to build weighted links between the terms in a sentence, and it draws on machine-learnt associations between search terms, so that "TV", "Hulu" and "Netflix" are linked with "Boxee", "Vudu" and "Amazon Unbox".
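Word vectors link terms by placing related words close together in a vector space, with closeness usually measured by cosine similarity. The sketch below uses tiny three-dimensional vectors invented purely for illustration; real embeddings are learnt from large text corpora and typically have hundreds of dimensions. The vector values, the 0.9 threshold and the function names are all assumptions.

```python
import math

# Toy embeddings, invented for illustration only (not learnt from data).
VECTORS = {
    "netflix": [0.9, 0.8, 0.1],
    "hulu": [0.85, 0.75, 0.2],
    "boxee": [0.8, 0.85, 0.1],
    "vudu": [0.8, 0.9, 0.15],
    "payroll": [0.1, 0.2, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_terms(term, threshold=0.9):
    """Return other vocabulary terms whose similarity to `term` meets the threshold."""
    v = VECTORS[term]
    return sorted(
        other for other, w in VECTORS.items()
        if other != term and cosine(v, w) >= threshold
    )


print(related_terms("netflix"))  # ['boxee', 'hulu', 'vudu']
print(related_terms("payroll"))  # []
```

Once terms are linked this way, a query for one word can be treated as touching everything near it, which is what lets an innocuous-looking search be tied back to something sensitive.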
Say a business has sensitive, code-named product developments, and an apparently innocuous search query from a new IP address looks for the name of a team member on one of those projects. In theory, Macie could recognise that team member as a potential gateway to secret project information and warn staff of a pending attack.
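The reasoning in that scenario, where a person links to a project, the project is sensitive, and the query comes from an unfamiliar address, can be expressed as a simple lookup. This is a hypothetical sketch, not how Macie works internally: it assumes the business maintains a directory mapping code-named projects to team members' user names, plus a list of which projects count as sensitive. The project names, user names and function signature are all invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical directory: code-named project -> team members' user names.
PROJECT_MEMBERS: Dict[str, Set[str]] = {
    "falcon": {"jsmith", "akhan"},
    "heron": {"mlopez"},
}
# Hypothetical list of projects treated as secret.
SENSITIVE_PROJECTS: Set[str] = {"falcon"}


def assess_query(query: str, source_ip: str, known_ips: Set[str]) -> List[str]:
    """Flag queries naming a sensitive project's member when sent from an unfamiliar IP."""
    terms = {t.strip(".,?!").lower() for t in query.split()}
    alerts = []
    for project in sorted(SENSITIVE_PROJECTS):
        matched = terms & PROJECT_MEMBERS[project]
        if matched and source_ip not in known_ips:
            names = ", ".join(sorted(matched))
            alerts.append(f"query for {names} (member of {project}) from new IP {source_ip}")
    return alerts
```

Neither signal alone is alarming: staff look each other up all the time, and people connect from new addresses. It is the combination that makes the query worth a second look.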
This type of DLPbot could be used to monitor access to any type of data store: files, email, video assets, spreadsheets, objects, whatever. A DLPbot can be trained to be aware of both the data and its users. It could be specific to one business or, in the case of a cloud provider, generic across all of its customer accounts.
We would expect the use of such DLPbots to spread rapidly, as they operate automatically, get better with use and start generating a history of detected and repelled attacks. ®