[HN Gopher] NASA to launch 247 petabytes of data into AWS, but f...
       ___________________________________________________________________
        
       NASA to launch 247 petabytes of data into AWS, but forgot about
       egress costs
        
       Author : nobita
       Score  : 224 points
       Date   : 2020-03-19 10:15 UTC (12 hours ago)
        
 (HTM) web link (www.theregister.co.uk)
 (TXT) w3m dump (www.theregister.co.uk)
        
       | turdnagel wrote:
       | Requester pays!
        
       | julienchastang wrote:
       | This article is misleading. The entire point is to not move data
       | out of the cloud. Instead bring your computing (analysis,
       | visualization) to the data and pay for compute cycles on AWS. If
       | your workflows are short/bursty, you will come out ahead.
       | Moreover, you will be able to do big data-style computations that
       | you cannot do in a local computing environment. This is bad
       | journalism, IMO.
        
       | gigatexal wrote:
       | might be cheaper to spin up virtual workstations on AWS and use
       | the data there
        
         | julienchastang wrote:
         | Exactly. Move your computation to the data instead of the other
         | way around. At that point, there are many ways to keep costs
         | down such as using spot instances and tearing down VMs when
         | your analysis is over.
        
           | gigatexal wrote:
           | And you get to rent the latest hardware rather than use
           | likely old machines ... I mean, you still use the existing
           | machines as dumb terminals, but still
        
       | toomuchtodo wrote:
       | Cue the cloud apologists that "it's better to use the cloud than
       | to build and manage your own infra".
       | 
       | This is why you build and run your own storage, similar to
       | Backblaze (who is almost entirely bootstrapped except for one
       | reasonable round of investment).
        
         | ben509 wrote:
         | To cloud or not to cloud is the same as any outsourcing
         | decision.
         | 
         | For many operations, you may get to a point where it makes
         | sense to build your own cloud.
         | 
         | If you're a seller, you might also get to a point where you
         | want to sell goods directly.
         | 
         | It partly depends on your core expertise, meaning, is this part
         | of how your business creates value? If NASA doesn't want to be
         | a datacenter provider, they should continue to outsource it.
         | 
         | It also depends on whether their business model aligns with
         | yours. AWS's egress rules specifically work when you are
         | getting revenue from the data being downloaded. If you're
         | selling software or other media, and you can factor the cost of
         | downloads into the price of it, pay-for-egress is very
         | sustainable.
         | 
         | Other models like pay-for-capacity don't align as well if you
         | want to maintain a large library of media and people are
         | attracted by the variety, but only download the popular stuff.
         | 
         | For NASA, pay-for-egress may be entirely justified if their
         | budget is based on usage of the data. Or if they can simply use
         | "requester pays" to mitigate the cost.
        
         | pas wrote:
         | I thought the ultimate argument was that if you're big enough
         | AWS will make you a deal. But maybe now AWS is just so big and
         | already growing so fast, they don't want to make exceptions and
         | lower their profitability.
        
           | belval wrote:
           | They got a 50% deal. From the article:
           | 
           | "At least NASA seems to have bagged a good deal from AWS: The
           | Register used Amazon's cloudy cost calculator to tot up the
           | cost of storing 247PB in the cloud giant's S3 service. The
           | promised pay-as-you-go price for us on the street was a
           | staggering $5,439,526.92 per month, not taking into account
           | the free tier discount of 12 cents. The audit, meanwhile,
           | suggests an increased cloud spend of around $30m a year by
           | 2025, on top of NASA's $65m-per-year deal with AWS."
        
             | jleahy wrote:
             | $5.4m/mo * 12 mo/yr = $65m/yr. My guess is the "$65m/year
             | deal with AWS" is actually the S3 cost and the extra
             | $30m/year of 'increased cloud spend' is the egress costs
             | found by the audit. Otherwise it's a coincidence of the
             | numbers.
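             | 
             | A quick sanity check of that arithmetic, as a sketch that
             | uses only the two figures quoted from the article (the
             | variable names are illustrative, not from AWS or NASA):

```python
# Figures quoted from The Register article in the parent comment.
S3_LIST_PRICE_PER_MONTH = 5_439_526.92  # USD, list price for 247 PB in S3
AWS_DEAL_PER_YEAR = 65_000_000          # USD, NASA's existing AWS deal


def annualize(monthly_usd: float) -> float:
    """Convert a monthly cost into a yearly cost."""
    return monthly_usd * 12


s3_per_year = annualize(S3_LIST_PRICE_PER_MONTH)
gap = abs(s3_per_year - AWS_DEAL_PER_YEAR)

print(f"S3 list price per year: ${s3_per_year:,.2f}")
print(f"Gap to the $65m deal:   ${gap:,.2f} ({gap / AWS_DEAL_PER_YEAR:.2%})")
```

             | The annualized list price lands within about half a percent
             | of $65m, which is why the match looks like more than chance.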
        
         | Karunamon wrote:
         | Cue the cloud detractors that "a failure to do due diligence
         | (in this case: 15 minutes on the pricing calculator) on your
         | computing platform should be held against the whole platform".
         | 
         | Snark aside, it entirely depends on what you're doing. AWS
         | probably has better engineers, better processes, and more of
         | them than your company.
        
           | falcolas wrote:
           | None of which will _really_ help you, since AWS's priority is
           | AWS, not the uptime of your business. And no number of those
           | better engineers or processes has prevented downtime and
           | service interruptions on AWS.
        
             | unethical_ban wrote:
             | Oh, man.
             | 
             | Better run your own Internet, after all, you care more
             | about connectivity to your friends than your ISP does!
             | 
             | Dogmatism is passe. There are good uses for cloud, and good
             | times for on-premise, depending on what you need, what your
             | skillsets are as an organization, the kinds of workloads
             | and length of time required for that workload.
             | 
             | AWS and others have absolutely outstanding amounts of
             | infrastructure and tooling. Their reliability is off the
             | charts in the past few years, and (once it actually gets
             | figured out by your engineers) the cloud concept of IAM is
             | incredibly secure.
             | 
             | There are pitfalls - cost, up-front complexity and several
             | other things - but I no longer rag on "the cloud".
        
               | toomuchtodo wrote:
               | Amazon has outages all the time, hidden on their status
               | board with a green triangle, and you still lose S3
               | objects once you're operating at a large enough scale.
               | 
               | A quick google search for "amazon outages" lists the
               | numerous extended outages they've experienced.
        
               | Karunamon wrote:
               | How many of those outages were multi-region and would
               | have taken down a properly distributed application? How
               | many outages and instances of lost data would the average
               | enterprise, likely without their own datacenters,
               | redundant power, hardware staff, etc have taken in the
               | same period?
        
               | toomuchtodo wrote:
               | Most applications will never be architected to be
               | "properly distributed" because of cost. Many popular web
               | properties (Reddit) still have outages on AWS even when
               | architected properly. Netflix still distributes content
               | from their own CDN with their OpenConnect appliances, and
               | only uses AWS for non streaming use cases (jedberg will
               | correct me on both Netflix and Reddit points if I'm
               | missing something and comes across this comment).
               | 
               | https://www.usatoday.com/story/tech/news/2017/02/28/amazo
               | ns-...
               | 
               | If my app is architected for reliability, I'll run it on
               | bare metal and keep the costs savings. Why pay twice by
               | building it for cloud durability _and_ running it on
               | expensive cloud resources? Clearly the AWS marketing is
               | working ("you're just building it wrong").
               | 
               | We'll see what happens when CFOs take the reins from CTOs
               | and CIOs and start putting cost controls in place during
               | this recession ("why exactly are we paying so much in
               | opex when this could be capex we can depreciate?").
        
               | Karunamon wrote:
               | Ok, so we replace a lot of opex with a little capex and a
               | lot more opex. You only need devops types if your
               | business runs on a cloud provider; now you need to employ
               | facilities, sysadmins, security, etc. It's not just the
               | cost of the hardware we're talking about, your labor
               | budget will necessarily increase as well.
        
               | toomuchtodo wrote:
               | Devops types are sysadmins that cost more for mostly the
               | same skillset (you know cloud primitives, you know infra
               | as code, you know some python/bash or powershell
               | depending on the underlying OS). Facilities, security,
               | etc are usually covered by your hardware hosting
               | provider, or colocation provider. Still a lower cost than
               | cloud. You are still paying similar labor costs
               | regardless if you're in the cloud or have your own metal.
               | 
               | Disclaimer: Previously a devops/infra guy, before that
               | ops/networking/sysadmin, built out colo
               | facilities/datacenters/hosting companies before cloud.
               | Have done a lot of cost models for storage and compute,
               | still do on the side.
        
               | Karunamon wrote:
               | So who takes care of the non-development tasks that AWS
               | (or any cloud provider, really) is handling on the
               | backend? Schlepping the hardware around, swapping failing
               | drives, hardware monitoring, actually speccing out and
               | running a datacenter, physical security, and so forth?
               | 
               | It's generally not the same people who are going to be at
               | their computers running awscli (or if it is, now we get
               | to figure in how much time they're spending on tasks that
               | are not their primary job and how many extra of them we
               | get to hire to maintain the same velocity, not to mention
               | the occasional bit of firefighting)
        
               | falcolas wrote:
               | On a tangent from the sibling comments (which are spot
               | on), colocation does exist. They handle the network
               | drops, power, security, cooling, and you just have to
               | ship them servers. Before AWS, this is how most
               | businesses ran (including Amazon).
               | 
               | Few businesses ever get to the point where they need to
               | run their own datacenter. And when they do, the costs
               | would be roughly even with or lower than AWS's, due to
               | AWS's markup (for handling those DC-related things for
               | you, plus profit).
        
           | toomuchtodo wrote:
           | Due diligence only somewhat mitigates the damage done by
           | having a generation of engineers who believe going straight
           | to AWS or another expensive cloud provider is the first
           | and/or best course of action, when you have engineers scoff
           | at building a cheaper, more efficient solution better fit
           | for purpose. Backblaze proves it can be done, and I argue
           | they are just as competent as Amazon, if not more so. They've
           | provided an object storage system similar to S3 at a
           | drastically lower cost.
           | 
           | In most scenarios, it's not my money, and I don't care if
           | it's not my money. In this case, as a taxpayer, it's my money
           | ( _our money_ to be specific) and I care. I intend to contact
           | my representatives about this failure, and have already fired
           | off a FOIA request for AWS NASA contract details.
        
       | Havoc wrote:
       | Can't they just use the current DAACs as a caching layer? Seems
       | like the least ugly way out of this mess.
       | 
       | Also - can't they use torrent tech? I wouldn't mind helping out a
       | bit on space & data
        
       | Dunedan wrote:
       | > "However, when end users download data from Earthdata Cloud,
       | the agency, not the user, will be charged every time data is
       | egressed."
       | 
       | Not necessarily, depending on how the users access the data. If
       | users access the data through their own AWS accounts, NASA could
       | leverage S3's "Requester Pays" feature [1], to let the user pay
       | for downloading the data.
       | 
       | 1:
       | https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPay...
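       | 
       | A minimal sketch of what that looks like from the user's side,
       | assuming the boto3 SDK, whose S3 `get_object` call accepts
       | `RequestPayer="requester"`. The bucket and key names are
       | hypothetical, and `RecordingClient` is only a stand-in so the
       | snippet runs without AWS credentials:

```python
import io


def fetch_requester_pays(s3_client, bucket: str, key: str) -> bytes:
    """Download one object from a Requester Pays bucket.

    RequestPayer="requester" acknowledges that the caller's AWS account,
    not the bucket owner's, is billed for the request and the transfer.
    """
    response = s3_client.get_object(
        Bucket=bucket,
        Key=key,
        RequestPayer="requester",
    )
    return response["Body"].read()


class RecordingClient:
    """Stand-in for boto3.client("s3"); records calls instead of hitting AWS."""

    def __init__(self):
        self.calls = []

    def get_object(self, **kwargs):
        self.calls.append(kwargs)
        return {"Body": io.BytesIO(b"example bytes")}


# Hypothetical bucket and key, for illustration only.
client = RecordingClient()
data = fetch_requester_pays(client, "example-bucket", "example/granule.h5")
print(client.calls[0]["RequestPayer"], len(data))
```

       | With the real SDK you would pass `boto3.client("s3")` instead
       | of the stand-in; the request is then signed with, and billed
       | to, your own account.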
        
         | Bedon292 wrote:
         | I wonder if there is a problem with this because it requires
         | you to have an Amazon account and such to do it. There is now a
         | much higher barrier to entry for random people to access small
         | amounts of data. And you no longer have direct HTTP links. You
         | have to use the CLI / SDKs once requester pays is on there.
        
         | dpcx wrote:
         | I immediately thought about this as well, however I seem to
         | recall reading somewhere (and I could be entirely wrong here)
         | that NASA has a requirement to give away freely their science
         | data.
        
           | teruakohatu wrote:
           | Isn't requester pays just like paying for gas to drive to my
           | local library, when I can't bike because I want to borrow so
           | many books, but the books are free to borrow?
        
             | NikolaeVarius wrote:
             | I'm pretty sure it's like when I buy a book, and then I pay
             | for it.
        
               | [deleted]
        
             | SteveNuts wrote:
             | It's more like we both have a library, the books are free,
             | but if I want to take some of your books I have to pay for
             | shipping.
        
           | wikiman wrote:
           | I'm not an expert, but most government agencies are allowed
           | to charge reasonable fees for access to their data. I don't
           | know if this qualifies, but it at least seems like a
           | possibility, especially if it's transparently just passing
           | along their costs in the form of AWS' own cost structure
        
           | dylan604 wrote:
           | While the data is free, the cost of getting the data to you
           | can be charged. Originally, it was to cover the expense of
           | someone pulling the data, making copies, and then mailing
           | that data out to you. If it was photographic, you'd be
           | charged for the prints. I'd see using Requester Pays in the
           | same vein. They are not charging you for the data, but any
           | fees incurred to obtain the data would be at your expense.
        
           | TallGuyShort wrote:
           | It's required to be public domain. IMO it's comparable to
           | FOIA requests still requiring the requester to attach a stamp
           | to the envelope their request goes in. Or at most, include a
           | self-addressed stamped envelope too.
           | 
           | Requiring you to pay S3 is little different than requiring
           | you to have Internet access, and thus pay whichever company
           | includes you in THAT monopoly, IMO.
        
             | macintux wrote:
             | To me it feels very different.
             | 
             | Imagine for a moment that in order to access NASA data sets
             | you had to have a Fastmail email account. Gmail won't work,
             | Outlook won't work, it has to be Fastmail alone.
             | 
             | That would be very objectionable (as much as I adore
             | Fastmail).
             | 
             | Ability to pay one specific cloud provider should not be a
             | gate for public domain government data.
        
               | TallGuyShort wrote:
               | The alternative here, though, is to get comparable
               | distribution / durability, etc. by spending _way_ more of
               | the public's money upfront regardless of who wanted it.
               | I get the purist / idealistic argument here, but it feels
               | a bit like cutting off one's nose to spite their face.
        
               | somethingwitty1 wrote:
               | I don't think this analogy works. For Fastmail, there is
               | a cost regardless of whether you want to access
               | government data. You have to pay for the account itself.
               | For most cloud providers, there is zero cost for having
               | an account. Even if they hosted this themselves, they
               | could just as likely charge for data transfer costs...and
               | get to choose how to collect that. They could choose
               | PayPal and you have to create an account. Or they take
               | credit cards...and you must have a card belonging to one
               | of the networks they support. The barrier to entry
               | doesn't change regardless of how many cloud providers
               | there are, all it does is increase infrastructure costs
               | unnecessarily.
        
           | jfk13 wrote:
           | If there's a marginal cost for each copy of the data that's
           | transferred to a user, I don't think asking the user to cover
           | that cost conflicts with a requirement to "give away the
           | data".
           | 
           | (If they distributed their science data in printed form,
           | surely they'd be allowed to charge people for the cost of
           | printing & mailing the paper copies; that's quite different
           | from charging for the data itself.)
        
             | dragonwriter wrote:
             | > If there's a marginal cost for each copy of the data
             | that's transferred to a user, I don't think asking the user
             | to cover that cost conflicts with a requirement to "give
             | away the data".
             | 
             | Charging the user for data, even if it is on a marginal
             | cost basis, conflicts with a mandate to give data away
             | freely. Because "at the marginal cost of delivery" is not
             | "free".
             | 
             | (It's true that it is common for mandates to specify
             | something like at marginal cost of delivery rather than
             | free--sunshine laws providing copies of public records
             | often work that way--but that's not the applicable mandate
             | here; in fact, since without the separate mandate here the
             | data would be available on a marginal cost basis under
             | FOIA, the main reason for a separate mandate is to negate
             | that cost.)
        
               | jfk13 wrote:
               | Do you have a citation for the "mandate to give data away
               | freely"?
               | 
               | I found https://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPD
               | &c=2230&s=1, which mentions things like "Ensure public
               | access...", but I don't see anything there mandating such
               | public access to necessarily be at zero cost.
        
               | 3pt14159 wrote:
               | Also, public access can mean that once someone gets a
               | copy of the data they can host it for free as well. It's
               | not as if it's under a commercial license.
        
             | elcritch wrote:
             | Why the downvotes? This isn't uncommon or unreasonable if
             | you're downloading TB's of data. Also the data would be
             | freely redistributable if someone took the data and put up
             | a torrent. Still I'd rather see NASA host their own data.
             | Put up an FTP server, torrent server and save a lot of
             | money on hosting fees.
        
               | harlanji wrote:
               | Records departments always charge for copies, and that is
               | the use I thought of immediately when I learned of
               | Requester Pays. I'd be surprised if NASA couldn't use it.
        
               | topkai22 wrote:
               | While proxying through a torrent system is a good idea, I
               | doubt it would get well seeded outside a few popular
               | datasets- the agency would end up the sole seeder of the
               | long tail.
               | 
               | I'm willing to bet NASA saves a ton of money by going to
               | a cloud provider- US government storage setups are
               | insanely expensive. I remember a project I was on got a
               | quote of over $10,000/TB in 2014, and there is no way
               | egress is actually free right now- they are paying for a
               | government regulation compliant internet connection one
               | way or another.
               | 
               | I do worry about vendor lock in to a degree, but I'm
               | confident the agency and tax payers would save money
               | going to any major cloud provider.
        
               | Aeolun wrote:
               | What causes a cost of $10000/TB? Even with multiple
               | redundant failsafes I just cannot see how the cost could
               | run up to that.
        
               | Spooky23 wrote:
               | In 2014?
               | 
               | You'd be buying something like an EMC vMax that can
               | sustain 1M+ IOPS on lots of 15K spinning drives, with
               | caching tiers on crazy expensive flash.
               | 
               | To support that, you need a fibre channel network layer
               | and a bunch of FTEs to attend to it. Usually compliance
               | requirements require segmentation of roles, which
               | increases cost. If you're a federal government entity,
               | those FTEs are most likely contractors billed out at
               | $125-300/hr. Figure $3-5M/year on labor costs alone,
               | although that may be divided out over multiple systems.
               | 
               | This happens in commercial business too. I had a buddy
               | who was making about $150k in NYC to zone luns on a SAN.
               | Basically he kept a spreadsheet and updated a specific
               | configuration setting 2-3x a day and spent about 60-90
               | minutes/day doing that. The rest was waiting or studying
               | for his MBA.
               | 
               | It's pretty wacky to compare S3 to this type of storage.
        
               | elcritch wrote:
               | Wow! That's good to know, if a bit disheartening. I guess
               | I was thinking of small-startup costs, with some
               | cheap-ish Linux RAID setups and the likely massive fiber
               | taps NASA must surely already have. Not government/big
               | business costs.
        
               | Spooky23 wrote:
               | Sounds like there is a bigger story there and it's
               | probably a managed SAN.
               | 
               | I've operated pretty significant government shared
               | infrastructures like this in the past... we were offering
               | fast, flash-cached disk in 2010 for about $5,000/TB.
               | $10k/TB is not unreasonable for highly available Tier-1
               | storage for something like SAP, especially in that era
               | where you couldn't use all flash in most cases.
               | 
               | Today, cost structures can be very different. You can
               | land high-iop storage for a fraction of the cost without
               | the overhead of a big SAN. If you need capacity focused
               | storage, that is also much cheaper.
               | 
               | An agency like NASA gets hosed on services, and cloud is
               | no different. AWS is probably a net savings for
               | operational workloads whose characteristics are known.
               | Backup is a no-brainer. But for a high-volume,
               | operationally highly variable thing like a public archive
               | of data, AWS is a square peg in a round hole because of the
               | metered access.
        
               | mentat wrote:
               | By the way, depending on where it's hosted, S3 can seed
               | torrents automatically: https://docs.aws.amazon.com/Amazo
               | nS3/latest/dev/S3TorrentRet...
        
               | xxs wrote:
               | Why FTP - torrent it all the way, perhaps have the AWS as
               | nodes...
        
               | NikolaeVarius wrote:
               | https://docs.aws.amazon.com/AmazonS3/latest/dev/S3Torrent
               | .ht...
        
         | angry_octet wrote:
         | And this would be an even worse outcome.
        
           | ben509 wrote:
           | Why?
        
         | advisedwang wrote:
         | This then requires that everyone have an AWS account and
         | billing relationship with Amazon to access public data.
        
         | VikingCoder wrote:
         | I'm a huge fan of requester pays, and I frankly don't
         | understand why we haven't switched more of the internet to it.
         | 
         | I'm also a liberal, so then I also think government should give
         | everyone a monthly quota of internet usage allowance. Universal
         | Basic Internet Income, or something.
        
           | [deleted]
        
       | 7777fps wrote:
       | I assume the data accessed is a heavily skewed Pareto
       | distribution.
       | 
       | Given that, it's maybe still cheaper to build their own serving /
       | caching layer in front to save egress costs than to have
       | constructed the whole storage solution themselves.
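       | 
       | To get a feel for how much a skewed access pattern helps, here
       | is a rough sketch. It assumes a Zipf popularity law (a discrete
       | stand-in for the Pareto-style skew above) and equally sized
       | objects; the object count and exponent are made up, not NASA
       | figures:

```python
def zipf_hit_rate(total_objects: int, cached_objects: int,
                  exponent: float = 1.0) -> float:
    """Fraction of requests served by a cache of the most popular objects.

    Assumes the object at popularity rank r is requested with weight
    1 / r**exponent (a Zipf law).
    """
    if total_objects <= 0:
        raise ValueError("total_objects must be positive")
    if not 0 <= cached_objects <= total_objects:
        raise ValueError("cached_objects must be between 0 and total_objects")
    weights = [1.0 / rank ** exponent for rank in range(1, total_objects + 1)]
    return sum(weights[:cached_objects]) / sum(weights)


TOTAL = 100_000  # hypothetical number of objects in the archive
for share in (0.01, 0.05, 0.20):
    cached = round(TOTAL * share)
    rate = zipf_hit_rate(TOTAL, cached)
    print(f"cache top {share:.0%} of objects -> hit rate {rate:.1%}")
```

       | Under these assumptions, caching the top 1% of objects serves
       | roughly 60% of requests. Real datasets vary in size, so the
       | share of bytes saved could look quite different.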
        
         | vidarh wrote:
         | Putting a caching layer in front of AWS is often very cost
         | effective even without much skew in the access pattern. It
         | tends to take a very low hit rate before it pays for itself.
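         | 
         | A back-of-the-envelope sketch of that break-even point,
         | assuming the cache has a fixed monthly cost that already
         | includes its own bandwidth. All numbers below are
         | hypothetical, not actual AWS prices:

```python
def break_even_hit_rate(cache_cost_per_month: float,
                        traffic_gb_per_month: float,
                        egress_usd_per_gb: float) -> float:
    """Hit rate at which a cache's fixed cost equals the egress it avoids.

    Without a cache the origin bill is traffic * price. With a cache at
    hit rate h it is (1 - h) * traffic * price + cache cost. Setting the
    two equal gives h = cache cost / (traffic * price).
    """
    uncached_egress_cost = traffic_gb_per_month * egress_usd_per_gb
    if uncached_egress_cost <= 0:
        raise ValueError("traffic and egress price must be positive")
    return cache_cost_per_month / uncached_egress_cost


# Hypothetical inputs: a $2,000/month cache in front of 500 TB/month
# of downloads, with origin egress metered at $0.05 per GB.
rate = break_even_hit_rate(2_000, 500_000, 0.05)
print(f"cache pays for itself above a {rate:.0%} hit rate")
```

         | A result above 1.0 means the cache can never pay for itself
         | at that traffic level.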
        
       | mensetmanusman wrote:
       | This seems like a good use of torrenting?
        
         | maerF0x0 wrote:
         | It looks like they were aware of BitTorrent as recently as Oct
         | 2011
         | 
         | https://web.archive.org/web/20111024223108/https://visibleea...
        
         | caymanjim wrote:
         | Torrents are only helpful when there's a large number of people
         | who download the data and are willing to share it. There's not
         | a large userbase for the vast majority of NASA data. It
         | wouldn't be distributed in any meaningful way.
        
       | unhammer wrote:
       | YOU ARE NOT AFRAID?         'Not yet. But, er...which way to the
       | egress, please?'         There was a pause. Then Death said, in a
       | puzzled voice: ISN'T THAT A FEMALE EAGLE?
       | 
       | I've been reading A Hat Full of Sky to my daughter these days,
       | and there's a running joke that "supposedly intelligent people"
       | don't know the meaning of the word "egress", mixing it up with
       | things like egret, ogress or eagles.
       | 
       | (See also the inspiration for the joke:
       | https://unrealfacts.com/pt-barnum-would-trick-people-with-a-... )
        
       | slowhand09 wrote:
       | Wow! I worked on EOSDIS in 93-96. We estimated 16 petabytes, at
       | the time it would be one of the world's largest databases. We
       | changed horses midstream moving our user interfaces from
       | X-windows Motif to WWW. And built a very early Oracle DB
       | accessible via WWW. There was no cloud then except missions
       | studying atmospheric water vapor. When this was originally
       | designed there were to be several (6-7) DAACs - Distributed
       | Active Archive Centers (https://earthdata.nasa.gov/eosdis/daacs)
       | to store data near where it was needed or captured. Now they have
       | 12 and are storing on AWS. Amazon didn't exist when this was
       | originally built.
        
         | acruns wrote:
         | I was at ESRI when we were going to host this data, then
         | Congress got involved and blocked it.
        
       | ackbar03 wrote:
       | Oh but aws didn't forget. Aws never forgets
        
         | gonzo41 wrote:
         | This is kinda bad press for AWS. If I were NASA I'd be shitty
         | about the relationship manager not hinting and trying to help
         | architect for lowest cost.
        
           | NikolaeVarius wrote:
           | Since when the hell does NASA actually care about bad press
           | regarding costs these days?
        
             | johnmaguire2013 wrote:
             | NASA or AWS? Parent said AWS.
        
             | badwolf wrote:
             | NASA is spending over $1B for a launch pad that will be
             | used no more than 4 times.
             | 
             | https://spacenews.com/report-finds-delays-and-cost-
             | overruns-...
        
               | whatshisface wrote:
               | How much would a launch pad that will be used four times
               | normally cost for what they're planning to launch?
               | Without knowing that I can't say if they overpaid 10x,
               | 2x, got it exactly right or got an amazing bargain.
        
               | yborg wrote:
               | In 1965, the Vertical Assembly Building, which was at
               | that time the largest enclosed volume in the world, cost
               | $117M (on a $23.5M original construction contract). That
               | would be about a billion dollars in 2020, but it was
               | completed in 3 years and was used to stack 13 Saturn Vs.
               | It was later used for the 100+ Shuttle missions as well,
               | but there were additional costs to modify the building
               | for this purpose. The VAB is still planned for use for
               | future missions.
               | 
               | https://www.popsci.com/blog-network/vintage-space/nasas-
               | vab-...
        
         | TomMckenny wrote:
         | Judging from the random yet inexplicable 42 cent bill on a free
         | account I set up years ago, I'd say their memory is positively
         | unbelievable.
        
       | ph2082 wrote:
       | 1 terabyte of hard disk costs ~50 USD.
       | 
       | 247 petabytes ~ 247,000 terabytes, so ~12.35 million USD in disks.
       | 
       | Network cards, bandwidth, electricity cost > I can't guess.
       | 
       | Couple of good engineers (hardware and software ones), which they
       | definitely have.
       | 
       | Maybe they could have built their own cloud for < ~10-15 million
       | USD. And that won't be recurring cost.
       | 
       | Maybe they missed the article about Bank of America saving ~2
       | billion USD by building their own cloud.
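
A rough sketch of the raw-drive arithmetic in the comment above, using only its
own figures (247 PB, decimal terabytes, ~50 USD per TB). The `copies` parameter
is a hypothetical replication factor added for illustration; it is not a figure
from the thread. Under these assumptions a single copy comes to about 12.35
million USD in bare drives, before servers, networking, power, or staff.

```python
# Back-of-the-envelope raw drive cost, using figures quoted in the comment.
TB_PER_PB = 1000   # decimal units, as in "247 Petabyte ~ 247000 Terabyte"
DATASET_PB = 247
USD_PER_TB = 50    # "1 Terabyte of hard disk cost ~50USD"


def raw_drive_cost_usd(petabytes, usd_per_tb, copies=1):
    """Cost of bare drives only: no servers, network, power, or staff.

    `copies` is an illustrative replication factor (an assumption).
    """
    return petabytes * TB_PER_PB * usd_per_tb * copies


single_copy = raw_drive_cost_usd(DATASET_PB, USD_PER_TB)
three_copies = raw_drive_cost_usd(DATASET_PB, USD_PER_TB, copies=3)

print(f"one copy:     ${single_copy:,}")
print(f"three copies: ${three_copies:,}")
```

Even at three full copies, the drive cost alone stays in the tens of millions;
the open question in the replies is everything this sketch leaves out.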
        
         | SEJeff wrote:
         | You realize that the entire OpenStack project came from the
         | open-sourcing of NASA's Nebula project, right? They've got
         | one of the biggest infiniband networks in the world
         | underpinning it.
        
           | ph2082 wrote:
           | I didn't know that. Thank you for telling me.
           | 
           | Now I am more curious why go along with AWS instead of using
           | Openstack. Need to find some case study of openstack vs rest
           | of cloud provider.
        
             | duskwuff wrote:
             | Because OpenStack is a piece of software, not a provider.
             | And it's instructive to consider why none of the major
             | cloud providers use it...
        
         | rmrfstar wrote:
         | In addition to saving money, they will also make the US more
         | resilient by helping avoid a concentration of expertise and an
         | infrastructure mono-culture.
         | 
         | I suspect that ideas like this will become more popular as the
         | US asks itself "what happened to our resilience?"
         | 
         | [1] https://en.wikipedia.org/wiki/Self-Reliance
        
         | supdatecron wrote:
         | Your numbers are way off, as you didn't account for redundancy
         | of the drives (any failure or bit flips of 1 of those 2,470
         | drives will cause corruption of likely the entire data set).
         | 
         | > Network cards, bandwidth, electricity cost > I can't guess.
         | 
         | This is where a huge amount of cost is.
         | 
         | > And that won't be recurring cost.
         | 
         | Maintenance, humans, cooling, drive replacements, property,
         | building, land tax, payroll tax are all recurring costs.
        
           | ph2082 wrote:
           | > Your numbers are way off, as you didn't account for
           | redundancy of the drives (any failure or bit flips of 1 of
           | those 2,470 drives will cause corruption of likely the entire
           | data set).
           | 
           | Let's take another setup of the same size as a backup. Then
           | another setup as a backup of the backup. ~150K
           | 
           | > This is where a huge amount of cost is.
           | 
           | Maintenance, humans, cooling, drive replacements cost > can't
           | be greater than first time set up cost.
           | 
           | > property, building, land tax, payroll tax
           | 
           | Nasa runs on Government budget, I am sure they can claim some
           | tax break there.
           | 
           | The point I am trying to make is, it may be cheaper to do in-
           | house with the level of engineering talent they have.
        
             | sitkack wrote:
             | The government should be running its own object store. And
             | by government, I mean coordinated by Internet2/NSF with
             | federation across all member orgs.
             | 
             | https://en.wikipedia.org/wiki/Internet2
             | 
             | Use backblaze pods, demand off peak bandwidth of gilded age
             | megacorps that own said fiber for sync/replication.
             | 
             | https://www.backblaze.com/b2/storage-pod.html 480TB/4U
             | 
             | Have 3x sites around the US that build the pods; each new
             | pod gets preloaded with a smattering of rarely requested
             | and low replication count objects (as a redundant backup).
             | Then shipped to the site where it will be used. Local
             | writes go directly to pods which are then kept in sync with
             | the rest of the cluster.
             | 
             | Edit, from TFA:
             | 
             | ``` And to put a cherry on top, the report found the
             | project's organizers didn't consult widely enough, didn't
             | follow NIST data integrity standards, and didn't look for
             | savings properly during internal reviews, in part because
             | half of the review team worked on the project itself. ```
        
           | Neil44 wrote:
           | His numbers can be out by many multiples and still beat AWS's
           | 5 Mil a month with no egress.
        
       | knorker wrote:
       | This surely was entirely known to AWS, where they were rubbing
       | their hands at the fact that every user of this data has to
       | process it using EC2 on site.
       | 
       | This is Cloud lock-in using data location.
        
       | Spooky23 wrote:
       | Using AWS for this type of use case is dumb for an org as large
       | as NASA, if cost savings is a goal. It's cheaper to just land
       | capacity at a datacenter.
        
         | toyg wrote:
         | I guess they have additional legal constraints that don't allow
         | them to just "land space" here or there - the vendor must
         | probably be security-vetted, compliant to a hundred government-
         | produced checklists, and willing to go through extra-long sales
         | and support cycles. It will inevitably push up prices
         | significantly.
         | 
         | In fact, I can imagine ops-teams at Nasa licking their lips at
         | the idea of doing away with a lot of that bureaucracy once they
         | switch to AWS... note how the report mentions that some of the
         | controllers are actual sponsors of the move: it's obviously a
         | conflict of interest, but it might well arise when the org as a
         | whole is a bit too happy to steer away from a suboptimal
         | situation.
         | 
         | This said, AWS will rob them blind, simply because they can.
         | Like all outsourcers (which is effectively what they are), they
         | get in with the simplicity argument, then boil that frog up
         | with extra charges. It's good that somebody pointed out one of
         | those charges, but I doubt anything will change substantially-
         | Amazon will probably cut them a discount and that will be it.
         | And once you're invested in a cloud env to the tune of hundreds
         | of petabytes, you'll likely not switch away for decades.
        
           | Karunamon wrote:
           | > _..then boil that frog up with extra charges._
           | 
           | That implies a level of dishonesty or nontransparency that
           | AWS doesn't have. Their pricing is disclosed, up front, and
           | they offer a calculator to model your costs out. Knowing how
           | much data egress you're going to have is not some arcane art,
           | NASA just plain forgot to do it.
           | 
           | It may be _complicated_, but so is any workload at this
           | size. Figuring the cost is part of due diligence, and
           | they've made it as straightforward as possible.
        
             | toyg wrote:
             | _> That implies a level of dishonesty or nontransparency
             | that AWS doesn't have._
             | 
             | Have you ever been party to an enterprise-level sales cycle?
             | Things like the official calculator are waved away, since
             | the customer is on a special deal, so "of course it's not as
             | much as that!". The customer asks for a quote with a
             | certain degree of detail, the vendor provides an answer
             | with the degree of accuracy required to get them in the
             | door. If it turns out after a year that the customer ended
             | up paying 2x, well, too bad - clearly they must have had
             | higher requirements than forecasted! "Did you record all
             | your traffic? No? Well, _we_ did, and the result is this
             | bill, sorry. Alright, alright, I hear your complaint, I
             | tell you what - I'll give you a big discount on your next
             | order, what about that?" Rinse, repeat. This is not
             | dishonesty and I'm not alleging malfeasance or anything
             | like that, it's just how that world works in my experience.
             | 
             | In order to figure out the real cost of outsourcing, you
             | need an adversarial attitude that most shops simply lack,
             | because they've fundamentally made the choice to abandon
             | the previous solution even before they've entered the sales
             | cycle. This is particularly clear in a case where some
             | controller is also part of the group promoting the switch.
             | It's surprising it was flagged up, there must be a
             | competing group somewhere that is desperately trying to
             | fight on - maybe some Oracle-friendly "japanese in the
             | jungle" or something. Or maybe bureaucratic procedures to
             | safeguard the institution are actually working as they
             | should, for once, but that would be pretty exceptional in
             | itself.
        
             | Spooky23 wrote:
             | That's a half-truth.
             | 
             | All of the cloud vendors de-empathize network egress costs.
             | It's similar to products that depend on Microsoft licensing
             | who will always omit those types of costs. (Oh, so you
             | needed to spend another $500k in SQL Server Enterprise?)
             | 
             | Many organizations lack the operational metrics to allow
             | them to effectively measure their egress needs. And
             | AWS/GCP/MS salesmen aren't in the business of slowing down
             | deals with awkward questions.
             | 
             | This is especially true where an org like NASA probably
             | contracts out things like network services. Going from a
             | model where you make fixed capital investments to paying
             | for the byte is difficult to measure.
        
               | Karunamon wrote:
               | I'm not sure what you mean by "de-empathize".
               | 
               | Here's the official pricing calculator[1] - note that
               | ingress and egress costs are included in all relevant
               | services. Also note that for something like S3 (which is
               | probably what the article mentions the "earthdata cloud"
               | is based on), the pricing details are right there on the
               | description page[2].
               | 
               | There is no evidence of any malfeasance by AWS here, just
               | lots of casting aspersions. What _specifically_ do you
               | want that was not provided?
               | 
               | [1]: https://calculator.s3.amazonaws.com/index.html
               | 
               | [2]: https://aws.amazon.com/s3/pricing/
        
       | tehalex wrote:
       | I wonder if this includes or if they can use Direct Connect? [1]
       | 
       | Cloud data transfers are too expensive, personally I assume that
       | it costs more to measure and bill for bandwidth than the usage
       | itself...
       | 
       | 1: https://aws.amazon.com/directconnect/
        
         | angry_octet wrote:
         | They could use direct connect, from each of their data centres,
         | essentially turning AWS into a giant NAS. However this gives up
         | the idea of using AWS compute to provide value added analysis.
        
       | Wheaties466 wrote:
       | at that point why not just use a P2P based system.
        
       | ralusek wrote:
       | I wonder why they wouldn't use Wasabi:
       | 
       | https://wasabi.com/cloud-storage-pricing/
       | 
       | Looks like egress is free.
       | 
       | Maybe because it's comparably untested? Does anyone here have any
       | experience with it?
        
         | Eikon wrote:
         | I wouldn't rely on that.
         | 
         |     Wasabi does not charge for egress but our pricing model is
         |     not suitable for use cases involving the hosting of videos in
         |     a manner where the ratio of egress downloads exceeds the
         |     amount of storage.
         | 
         | https://wasabi-support.zendesk.com/hc/en-us/articles/3600004...
        
         | alexfromapex wrote:
         | Probably need assurances or regulatory solutions that only a
         | cloud giant like AWS could address
        
       | pixelbath wrote:
       | Unless my numbers are _way_ off, I got around $15.5 million per
       | year using Backblaze's calculator:
       | https://www.backblaze.com/b2/cloud-storage-pricing.html
       | 
       | Numbers used:
       | 
       |     Initial upload:   258998272 GB (1024*1024*247)
       |     Monthly upload:   100 GB (default)
       |     Monthly delete:   5 GB (default)
       |     Monthly download: 1048576 GB (1 PB)
       |     Period of Time:   12 months (default)
        
         | adtac wrote:
         | It'll take 215,000 years to reach 247 petabytes if you averaged
         | 100 GB of upload a month.
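
The "215,000 years" figure can be checked with a short sketch. It assumes the
binary conversion quoted in the parent comment (1024*1024 GB per PB) and the
calculator's default of 100 GB uploaded per month; the variable names are
illustrative only.

```python
# How long would 247 PB take to accumulate at 100 GB per month?
GB_PER_PB = 1024 * 1024               # binary units, matching "(1024*1024*247)"
initial_upload_gb = GB_PER_PB * 247   # 258998272 GB, as in the parent comment
monthly_upload_gb = 100               # calculator default quoted in the thread

months = initial_upload_gb / monthly_upload_gb
years = months / 12

print(f"{initial_upload_gb} GB at {monthly_upload_gb} GB/month: {years:,.0f} years")
```

The result lands a little under 216,000 years, which is consistent with the
rounded figure in the comment.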
        
           | bhandziuk wrote:
           | I think they're saying NASA would add ~100GB of new data to
           | this dataset every month.
        
             | adtac wrote:
             | I know. And I'm saying if that was the rate they've
             | historically added data to their dataset, it would've taken
             | them 200,000+ years to get here. Which is why 100GB/mo is
             | virtually nothing for NASA -- it doesn't match with their
             | historical throughput.
        
           | kylebarron wrote:
           | The initial upload is 247 petabytes
        
       | chx wrote:
       | If you are facing similar problems you should know traffic via
       | Cloudflare from B2 is free. I am not 100% sure CF would be happy
       | if NASA picked the CF free tier but probably their quote would be
       | orders of magnitude lower than Amazon's.
        
       | anthonylukach wrote:
       | This article seems short sighted.
       | 
       | 1. Using the AWS cost calculator is pointless; naturally an
       | entity the size of NASA would get heavily discounted rates.
       | 
       | 2. As data volume grows, the complexities of working with that
       | data expand. NASA appears to be embracing cloud computing by
       | embracing a paradigm where scientists push computation to where
       | the data rests rather than downloading data [1], [2], [3],
       | thereby paying egress on only the higher order data products.
       | 
       | 3. The report notes that NASA has tooling to rate limit and
       | throttle access to data. This, in itself, proves that NASA didn't
       | "[forget] about eye-watering cloudy egress costs before
       | lift-off".
       | 
       | People may scream about vendor lock in, which is a fair
       | complaint; but acting like NASA just didn't think about egress is
       | misleading.
       | 
       | NASA is ultimately a science institution, I think diverting
       | effort away from infrastructure management and towards studying
       | data is likely a wise decision.
       | 
       | [1:
       | https://www.hec.nasa.gov/news/features/2018/cloud_computing_...]
       | [2: https://link.springer.com/article/10.1007/s10712-019-09541-z]
       | [3:
       | https://ui.adsabs.harvard.edu/abs/2017AGUFMIN21F..02P/abstra...]
        
         | matchagaucho wrote:
         | Yeah, this looks like a FUD hit job, possibly by entities made
         | obsolete by a move to AWS.
         | 
         | There are just too many solutions to egress optimization to
         | mention (CDN edge caching, rate limit, throttling, tiered
         | discounts, multi-year agreements).
         | 
         | No gov procurement deal at this scale gets sticker shock from
         | retail prices.
        
         | kempbellt wrote:
         | >NASA is ultimately a science institution, I think diverting
         | effort away from infrastructure management and towards studying
         | data is likely a wise decision.
         | 
         | Indeed. I am glad to see them leveraging the power of an
         | already proven infrastructure provider rather than spending X
         | billions of dollars trying to build and maintain their own.
        
           | sgt wrote:
           | No, that is just ridiculous. NASA is more than capable of
           | running their own server infrastructure. They've got
           | expertise, they've got DC's and they don't need 99.999%
           | uptime for most of their services. Cloud providers can turn
           | out to be insanely expensive. I am not against cloud - mostly
           | I would recommend it for businesses but when reaching a
           | certain size you have to consider doing your own cloud
           | infrastructure.
        
         | Supermancho wrote:
         | > NASA would get heavily discounted rates
         | 
         | Having spent a lot of money with AWS, that's giving Amazon more
         | credit than I think is warranted.
        
           | [deleted]
        
           | ganstyles wrote:
           | +1 to this. I've been on teams that spent $75k/mo and didn't
           | get any hint of a discount. Though we got our own on call rep
           | to handle issues.
        
             | soared wrote:
             | $75k/mo is tiny in the enterprise world. At Oracle they'd
             | give a 22 year old fresh out of school ~30 accounts that
             | size, for reference. I worked on a team of 9ish on a
             | ~$5MM/mo account. (Not cloud, but a comparable business
             | unit)
        
               | bosswipe wrote:
               | At which level do you start having real negotiation
               | power?
        
         | pathseeker wrote:
         | >NASA is ultimately a science institution, I think diverting
         | effort away from infrastructure management and towards studying
         | data is likely a wise decision.
         | 
         | True, but once you're a certain scale, outsourcing everything
         | just because it's not your competency isn't a good excuse. You
         | can afford to hire enough people for it to become your
         | competency.
        
         | tda wrote:
         | Probably there is even good competition between the cloud
         | providers, because hosting the data means that you can sell a
         | lot of compute time to all the users of the data. NASA choosing
         | AWS means that any IO-intensive analysis on that data will
         | run faster/better/cheaper on AWS.
        
           | eeZah7Ux wrote:
           | No, there isn't, and going for the quasi-monopolist only
           | encourages lock-in.
        
             | tspike wrote:
             | Microsoft especially has been aggressive in courting large
             | companies away from AWS for cloud needs.
        
         | [deleted]
        
       | X6S1x6Okd1st wrote:
       | > NASA also knows that a torrent of petabytes is on the way.
       | 
       | Oh that sounds like a potential solution.
       | 
       | /s
        
       | oh_hello wrote:
       | "The audit, meanwhile, suggests an increased cloud spend of
       | around $30m a year by 2025"
       | 
       | Isn't this a rounding error for NASA?
        
       | beastman82 wrote:
       | Torrent FTW
        
       | djrogers wrote:
       | I'm not saying this won't be a financial cluster - it likely will
       | cost many times more than planned - but the headline here is just
       | a flat-out lie.
       | 
       | TFA says:
       | 
       | "a March audit report [PDF] from NASA's Inspector General noticed
       | EOSDIS hadn't properly modeled what data egress charges would do
       | to its cloudy plan."
       | 
       | 'Hadn't properly modeled' is very different from 'forgot about'.
       | And if you actually read the linked report, it says things like:
       | 
       | "ESDIS officials said they plan to educate end users on accessing
       | data stored in the cloud, including providing tools to enable
       | them to process the data in the cloud to avoid egress charges."
       | and "To mitigate the challenges associated with potential high
       | egress costs when end-users access data, ESDIS plans to monitor
       | such access and "throttle" back access to the data"
       | 
       | Neither of those statements would be _in the audit_ if the entire
       | topic had been a surprise.
        
         | tyingq wrote:
         | From that linked report...
         | 
         |  _" In addition, ESDIS has yet to determine which data sets
         | will transition to the cloud nor has it developed cost models
         | with the benefit of operational experience and metrics for
         | usage and egress."_
         | 
         | That sounds fairly close to the headline.
        
       | OzzyB wrote:
       | Looks like even the big boys get bitten by the Cloud Meme when
       | forgetting about bandwidth costs; glad I'm not the only one.
        
       | vnchr wrote:
       | Cloud VERSUS Space. Who will come out on top?
        
       | movedx wrote:
       | It's The Register, people. Don't take it seriously. It's
       | practically The Onion of the IT industry, especially the comments
       | sections.
       | 
       | I've written two articles for them and the comments are a joke.
       | They're all anti-Cloud, anti-progressive. Try selling them
       | Kubernetes as a solution to their problems: they'll think you've
       | come to steal their children. I know, I've tried.
       | 
       | In short: this never happened. NASA didn't forget anything. It
       | does, however, make for a great eye catching headline!
       | 
       | Sorry to be bitter about this, but publications like The Register
       | serve little purpose these days. It caters to a specific kind of
       | IT personality that can't let go of their physical tin and they
       | think public Cloud has no place or use at all. Again I know, I've
       | tried convincing these people of such things.
        
       | szczepano wrote:
       | To sum up, no matter how big the hard drives or data centers we
       | produce, we will always have a problem with storage capacity.
        
       | ghostpepper wrote:
       | There's a joke around here somewhere about AWS pricing being too
       | difficult even for rocket scientists.
        
         | leni536 wrote:
         | Apparently AWS pricing is not rocket science
        
       | api wrote:
       | This is exactly why the costs are set up that way. The first time
       | I saw AWS pricing I chuckled and thought "roach motel." Data goes
       | in but it doesn't come out. It's one of many soft lock-in
       | mechanisms cloud hosts use.
        
       | jka wrote:
       | What's the opposite of AWS Snowmobile[0]?
       | 
       | [0] - https://aws.amazon.com/snowmobile/
        
         | chickenpotpie wrote:
         | Downloading no data extremely fast
        
       | Mave83 wrote:
       | just build your own storage and save an incredible amount.
       | 
       | It's hard you might think, but it's not. croit.io provides all
       | you need to deploy a scalable cluster even on multiple geographic
       | regions.
       | 
       | Price for 1 PB sized cluster including everything from rack to
       | hardware to license to labor for below 3EUR/TB/Month or at the
       | Amazon Glacier price tag but with the S3-IA access.
        
         | driverdan wrote:
         | Are you seriously suggesting that NASA didn't consider
         | alternatives, like their current self-hosted solutions?
        
           | GordonS wrote:
           | Given they "forgot" about egress bandwidth costs, I think the
           | parent's comment was fair.
        
         | mister_hn wrote:
         | What about maintenance? Some forget about that... Broken
         | drives, broken RAID, broken NAS.
         | 
         | A 120TB SSD NAS might cost over 200kEUR ..imagine a 250PB one
        
         | RandomTisk wrote:
         | Seems like a poor choice. If they're getting an incredible deal
         | with AWS, then fine, but I would be utterly shocked if most
         | seasoned and competent IT professionals couldn't design and
         | build a multi-region storage array for far less than Amazon
         | will charge them.
        
         | dna_polymerase wrote:
         | Do you want to add your contact details to your post so NASA
         | can get in touch or what is going on here. Add a little
         | disclaimer that you work for/are croit.io so people can
         | instantaneously see why you would argue for the space agency of
         | the U.S. to run their own data storage.
        
       | tzm wrote:
       | $5,439,526.92 per month
        
       | [deleted]
        
       | NikolaeVarius wrote:
       | Senator Shelby should get AWS to launch a new region in Alabama
       | for NASA at this rate.
        
       ___________________________________________________________________
       (page generated 2020-03-19 23:00 UTC)