NASA to launch 247 petabytes of data into AWS – but forgot about eye-watering cloudy egress costs before lift-off
Audit finds error could mean less data flows to boffins unless agency ponies up for downloads
NASA needs 215 more petabytes of storage by the year 2025, and expects Amazon Web Services to provide the bulk of that capacity. However, the space agency's budget didn't include download charges, an omission that has left the project and NASA's cloud strategy in peril.
The data in question will come from NASA’s Earth Science Data and Information System (ESDIS) program, which collects information from the many missions that observe our planet. NASA makes those readings available through the Earth Observing System Data and Information System (EOSDIS).
To store all the data and run EOSDIS, NASA operates a dozen Distributed Active Archive Centers (DAACs). But NASA is tired of managing all that infrastructure, so in 2019, it picked AWS to host it all, and started migrating its records to the Amazon cloud as part of a project dubbed Earthdata Cloud. The first cut-over from on-premises storage to the cloud was planned for Q1 2020, with more to follow. The agency expects to transfer data off-premises for years to come.
NASA has also predicted that 15 imminent missions, such as the NASA-ISRO Synthetic Aperture Radar (NISAR) and the Surface Water and Ocean Topography (SWOT) satellites, will together deliver more than 100 terabytes a day of data. SWOT and NISAR are the first missions planned to use Earthdata Cloud.
The space agency therefore projects that by 2025 it will have 247 petabytes to handle, rather more than the 32PB it currently wrangles.
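Those storage figures are easy to sanity-check. The short Python sketch below uses only numbers quoted in this story and assumes decimal units (1PB = 1,000TB). The helper names are illustrative and are not anything NASA uses.

```python
# Figures quoted in the story; decimal units assumed (1PB = 1,000TB).
TB_PER_PB = 1_000

current_pb = 32      # what NASA wrangles today
projected_pb = 247   # NASA's projection for 2025
daily_tb = 100       # combined output of the 15 upcoming missions ("more than" this)

def growth_pb(current: float, projected: float) -> float:
    """Extra capacity needed to go from current to projected holdings."""
    return projected - current

def annual_pb(tb_per_day: float, days: int = 365) -> float:
    """Volume accumulated over a year at a constant daily ingest rate."""
    return tb_per_day * days / TB_PER_PB

print(growth_pb(current_pb, projected_pb))
print(annual_pb(daily_tb))
```

The first line reproduces the 215PB gap. The second shows that 100TB a day works out to roughly 36.5PB a year, and because the missions are expected to deliver more than that, the figure is a floor rather than an estimate.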
NASA thinks Earthdata Cloud is a great idea. In documentation for the migration project, it said:
Researchers and commercial users of NASA Earth Science data will have increased opportunity to access and process large quantities of data quickly, allowing new types of research and analysis. Data that was previously geographically dispersed will now be accessible via the cloud, saving time and resources.
And it will – if NASA can afford to operate it.
And that’s a live question because a March audit report from NASA's Inspector General found that EOSDIS hadn’t properly modeled what data download charges will do to its cloudy plan.
“Specifically, the agency faces the possibility of substantial cost increases for data egress from the cloud,” the Inspector General’s Office wrote, explaining that today NASA doesn’t incur extra costs when users access data from its DAACs. “However, when end users download data from Earthdata Cloud, the agency, not the user, will be charged every time data is egressed.”
That means ESDIS will be wearing the cloud egress costs. “Ultimately, ESDIS will be responsible for both cloud costs, including egress charges, and the costs to operate the 12 DAACs.”
And to make matters worse, NASA “has not yet determined which data sets will transition to Earthdata Cloud nor has it developed cost models based on operational experience and metrics for usage and egress.
“As a result, current cost projections may be lower than what will actually be necessary to cover future expenses and cloud adoption may become more expensive and difficult to manage.”
There’s more. The watchdog concluded: “Collectively, this presents potential risks that scientific data may become less available to end users if NASA imposes limitations on the amount of data egress for cost control reasons.”
And to put a cherry on top, the report found the project's organizers didn't consult widely enough, didn't follow NIST data integrity standards, and didn't look for savings properly during internal reviews, in part because half of the review team worked on the project itself.
The result is three recommendations from the auditors:
- Once NISAR and SWOT are operational and providing sufficient data, complete an independent analysis to determine the long-term financial sustainability of supporting the cloud migration and operation while also maintaining the current DAAC footprint.
- Incorporate in appropriate agency guidance language specifying coordination with ESDIS and OCIO early in a mission’s life cycle during data management plan development.
- Ensure all applicable information types are considered during DAAC categorization, that appropriate premises are used when determining impact levels, and that the appropriate categorization procedures are standardized.
The audit suggests egress charges could push cloud spending up by around $30m a year by 2025, on top of NASA’s $65m-per-year deal with AWS.
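Why egress is the sore point: in the cloud, the data owner pays for every gigabyte that leaves, so the bill tracks how much researchers download, whereas the audit notes that downloads from today's DAACs carry no extra charge. A minimal Python sketch of that difference follows. The per-gigabyte rate is a made-up placeholder, not an AWS or NASA figure, and the function names are hypothetical.

```python
# Illustrative only: contrasts who pays for downloads under the two models the
# audit describes. RATE_USD_PER_GB is a HYPOTHETICAL placeholder, not a real price.
RATE_USD_PER_GB = 0.05
GB_PER_PB = 1_000_000  # decimal units assumed

def cloud_egress_bill(gb_downloaded: float, usd_per_gb: float = RATE_USD_PER_GB) -> float:
    """Cloud model: the data owner (NASA) is charged for every gigabyte egressed."""
    return gb_downloaded * usd_per_gb

def daac_download_bill(gb_downloaded: float) -> float:
    """On-premises DAAC model: per the audit, downloads add no extra per-gigabyte cost."""
    return 0.0

for pb in (1, 10, 100):
    gb = pb * GB_PER_PB
    print(f"{pb:>3} PB downloaded: DAAC ${daac_download_bill(gb):,.0f}, "
          f"cloud ${cloud_egress_bill(gb):,.0f}")
```

Whatever the real rate, the shape is the same: the cloud bill grows in step with downloads, which is why the auditors worry NASA might limit egress to keep costs under control.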
You don't need to be a rocket scientist to understand data egress costs, which left The Register wondering how an agency capable of sending stuff into orbit or making marvelously long-lived Mars rovers could also make such a dumb mistake.
It turns out NASA makes plenty: your humble vulture found this story after looking into Tuesday’s audit of the agency's development work on its mobile launchers – the colossal vehicles designed to assemble, transport, and launch SLS and Orion rockets and capsules.
That audit found the project “has greatly exceeded its cost and schedule targets in developing ML-1. As of January 2020, modification of ML-1 to accommodate the SLS has cost $693 million — $308 million more than the agency's March 2014 budget estimate — and is running more than 3 years behind schedule.” ®