[HN Gopher] NASA to launch 247 petabytes of data into AWS, but forgot about egress costs Author : nobita Score : 224 points Date : 2020-03-19 10:15 UTC (HTM) web link (www.theregister.co.uk) | turdnagel wrote: | Requester pays! | julienchastang wrote: | This article is misleading. The entire point is to not move data | out of the cloud. Instead bring your computing (analysis, | visualization) to the data and pay for compute cycles on AWS. If | your workflows are short/bursty, you will come out ahead. | Moreover, you will be able to do big data-style computations that | you cannot do in a local computing environment. This is bad | journalism, IMO. | gigatexal wrote: | might be cheaper to spin up virtual workstations on AWS and use | the data there | julienchastang wrote: | Exactly. Move your computation to the data instead of the other | way around. At that point, there are many ways to keep costs | down such as using spot instances and tearing down VMs when | your analysis is over. | gigatexal wrote: | And you get to rent the latest hardware rather than use likely old | machines ... I mean you use the existing machines as dumb | terminals but still | toomuchtodo wrote: | Cue the cloud apologists that "it's better to use the cloud than | to build and manage your own infra". | | This is why you build and run your own storage, similar to | Backblaze (who is almost entirely bootstrapped except for one | reasonable round of investment). | ben509 wrote: | To cloud or not to cloud is the same as any outsourcing | decision. | | For many operations, you may get to a point where it makes | sense to build your own cloud. | | If you're a seller, you might also get to a point where you | want to sell goods directly. 
| | It partly depends on your core expertise, meaning, is this part | of how your business creates value? If NASA doesn't want to be a | datacenter provider, they should continue to outsource it. | | It also depends on whether their business model aligns with | yours. AWS's egress rules specifically work when you are | getting revenue from the data being downloaded. If you're | selling software or other media, and you can factor the cost of | downloads into the price of it, pay-for-egress is very | sustainable. | | Other models like pay-for-capacity don't align as well if you | want to maintain a large library of media and people are | attracted by the variety, but only download the popular stuff. | | For NASA, pay-for-egress may be entirely justified if their | budget is based on usage of the data. Or if they can simply use | "requester pays" to mitigate the cost. | pas wrote: | I thought the ultimate argument was that if you're big enough | AWS will make you a deal. But maybe now AWS is just so big and | already growing so fast, they don't want to make exceptions and | lower their profitability. | belval wrote: | They got a 50% deal. From the article: | | "At least NASA seems to have bagged a good deal from AWS: The | Register used Amazon's cloudy cost calculator to tot up the | cost of storing 247PB in the cloud giant's S3 service. The | promised pay-as-you-go price for us on the street was a | staggering $5,439,526.92 per month, not taking into account | the free tier discount of 12 cents. The audit, meanwhile, | suggests an increased cloud spend of around $30m a year by | 2025, on top of NASA's $65m-per-year deal with AWS." | jleahy wrote: | $5.4m/mo * 12 mo/yr = $65m/yr. My guess is the "$65m/year | deal with AWS" is actually the S3 cost and the extra | $30m/year of 'increased cloud spend' is the egress cost | found by the audit. Otherwise it's a coincidence of the | numbers. 
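A quick sanity check of the arithmetic in the comment above, as a minimal sketch. It uses only the two figures quoted from the article (the $5,439,526.92 monthly S3 estimate and the $65m-per-year deal); the variable names are illustrative, and the result says nothing about what the contract actually covers.

```python
# Figures quoted from the article in the thread above.
monthly_s3_estimate = 5_439_526.92   # USD per month, list-price S3 storage for 247 PB
quoted_annual_deal = 65_000_000      # USD per year, the "$65m-per-year deal"

# Twelve months of storage at the quoted list price.
annual_s3_estimate = monthly_s3_estimate * 12

# How far the annualized list price is from the quoted deal.
gap = annual_s3_estimate - quoted_annual_deal

print(f"12 months at list price: ${annual_s3_estimate:,.2f}")
print(f"Difference from $65m:    ${gap:,.2f} ({gap / quoted_annual_deal:.2%})")
```

The two figures land within about half a percent of each other, which is what makes the "coincidence of the numbers" remark plausible, though it does not prove the guess.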
| Karunamon wrote: | Cue the cloud detractors that "a failure to do due diligence | (in this case: 15 minutes on the pricing calculator) on your | computing platform should be held against the whole platform". | | Snark aside, it entirely depends on what you're doing. AWS | probably has better engineers, better processes, and more of | them than your company. | falcolas wrote: | None of which will _really_ help you, since AWS priority is | AWS, not the uptime of your business. And no number of those | better engineers or processes have prevented downtime and | service interruptions on AWS. | unethical_ban wrote: | Oh, man. | | Better run your own Internet, after all, you care more | about connectivity to your friends than your ISP does! | | Dogmatism is passe. There are good uses for cloud, and good | times for on-premise, depending on what you need, what your | skillsets are as an organization, the kinds of workloads | and length of time required for that workload. | | AWS and others have absolutely outstanding amounts of | infrastructure and tooling. Their reliability is off the | charts in the past few years, and (once it actually gets | figured out by your engineers) the cloud concept of IAM is | incredibly secure. | | There are pitfalls - cost, up-front complexity and several | other things - but I no longer rag on "the cloud". | toomuchtodo wrote: | Amazon has outages all the time, hidden on their status | board with a green triangle, and you still lose S3 | objects once you're operating at a large enough scale. | | A quick google search for "amazon outages" lists the | numerous extended outages they've experienced. | Karunamon wrote: | How many of those outages were multi-region and would | have taken down a properly distributed application? How | many outages and instances of lost data would the average | enterprise, likely without their own datacenters, | redundant power, hardware staff, etc have taken in the | same period? 
| toomuchtodo wrote: | Most applications will never be architected to be | "properly distributed" because of cost. Many popular web | properties (Reddit) still have outages on AWS even when | architected properly. Netflix still distributes content | from their own CDN with their OpenConnect appliances, and | only uses AWS for non streaming use cases (jedberg will | correct me on both Netflix and Reddit points if I'm | missing something and comes across this comment). | | https://www.usatoday.com/story/tech/news/2017/02/28/amazo | ns-... | | If my app is architected for reliability, I'll run it on | bare metal and keep the costs savings. Why pay twice by | building it for cloud durability _and_ running it on | expensive cloud resources? Clearly the AWS marketing is | working ("you're just building it wrong"). | | We'll see what happens when CFOs take the reins from CTOs | and CIOs and start putting cost controls in place during | this recession ("why exactly are we paying so much in | opex when this could be capex we can depreciate?"). | Karunamon wrote: | Ok, so we replace a lot of opex with a little capex and a | lot more opex. You only need devops types if your | business runs on a cloud provider, now you need to employ | facilities, sysadmins, security, etc. It's not just the | cost of the hardware we're talking about, your labor | budget will necessarily increase as well. | toomuchtodo wrote: | Devops types are sysadmins that cost more for mostly the | same skillset (you know cloud primitives, you know infra | as code, you know some python/bash or powershell | depending on the underlying OS). Facilities, security, | etc are usually covered by your hardware hosting | provider, or colocation provider. Still a lower cost than | cloud. You are still paying similar labor costs | regardless if you're in the cloud or have your own metal. 
| | Disclaimer: Previously a devops/infra guy, before that | ops/networking/sysadmin, built out colo | facilities/datacenters/hosting companies before cloud. | Have done a lot of cost models for storage and compute, | still do on the side. | Karunamon wrote: | So who takes care of the non-development tasks that AWS | (or any cloud provider, really) is handling on the | backend? Schlepping the hardware around, swapping failing | drives, hardware monitoring, actually speccing out and | running a datacenter, physical security, and so forth? | | It's generally not the same people who are going to be at | their computers running awscli (or if it is, now we get | to figure in how much time they're spending on tasks that | are not their primary job and how many extra of them we | get to hire to maintain the same velocity, not to mention | the occasional bit of firefighting) | falcolas wrote: | On a tangent from the sibling comments (which are spot | on), colocation does exist. They handle the network | drops, power, security, cooling, and you just have to | ship them servers. Before AWS, this is how most | businesses ran (including Amazon). | | Few businesses ever get to the point where they need to | run their own datacenter. And when they do, the costs | would be roughly even or lower to AWS due AWS' markup | (for handling those DC-related things for you, plus | profit). | toomuchtodo wrote: | Due diligence only somewhat mitigates the damage done by | having a generation of engineers who believe going straight | to AWS or another expensive cloud provider is the first and | or best course of action, when you have engineers scoff at | building a cheaper, more efficient solution better fit for | purpose. Backblaze proves it can be done, and I argue they | are just as competent, if not more, than Amazon. They've | provided a similar object storage system as S3 at a | drastically lower cost. | | In most scenarios, it's not my money, and I don't care if | it's not my money. 
In this case, as a taxpayer, it's my money | ( _our money_ to be specific) and I care. I intend to contact | my representatives about this failure, and have already fired | off a FOIA request for AWS NASA contract details. | Havoc wrote: | Can't they just use the current DAACs as a caching layer? Seems | like the least ugly way out of this mess. | | Also - can't they use torrent tech? I wouldn't mind helping out a | bit on space & data | Dunedan wrote: | > "However, when end users download data from Earthdata Cloud, | the agency, not the user, will be charged every time data is | egressed." | | Not necessarily, depending on how the users access the data. If | users access the data through their own AWS accounts, NASA could | leverage S3's "Requester Pays" feature [1] to let the user pay | for downloading the data. | | 1: | https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPay... | Bedon292 wrote: | I wonder if there is a problem with this because it requires | you to have an Amazon account and such to do it. There is now a | much higher barrier to entry for random people to access small | amounts of data. And you no longer have direct HTTP links. You have | to use the CLI / SDKs once requester pays is on there. | dpcx wrote: | I immediately thought about this as well, however I seem to | recall reading somewhere (and I could be entirely wrong here) | that NASA has a requirement to freely give away their science | data. | teruakohatu wrote: | Isn't requester pays just like I pay for gas to drive to my | local library, when I can't bike because I want to borrow so | many books, but the books are free to loan? | NikolaeVarius wrote: | I'm pretty sure it's like when I buy a book, and then I pay | for it. | [deleted] | SteveNuts wrote: | It's more like we both have a library, the books are free, | but if I want to take some of your books I have to pay for | shipping. 
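For readers wondering what "use the CLI / SDKs" means in practice for the Requester Pays access discussed above, here is a minimal sketch using boto3 (the third-party AWS SDK for Python). It assumes the caller has their own AWS credentials configured; the bucket and key names in the usage note are hypothetical, and nothing here reflects how NASA's buckets are actually set up.

```python
def requester_pays_args(bucket, key):
    """Build get_object arguments for a Requester Pays bucket.

    RequestPayer="requester" is the caller's acknowledgement that the
    request and transfer charges go to their own AWS account.
    """
    return {"Bucket": bucket, "Key": key, "RequestPayer": "requester"}


def fetch_object(bucket, key):
    """Download one object, with the requester paying for the transfer."""
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    s3 = boto3.client("s3")  # picks up the caller's own AWS credentials
    response = s3.get_object(**requester_pays_args(bucket, key))
    return response["Body"].read()
```

A call would look like `fetch_object("example-earthdata-bucket", "granules/example.nc")`, with both names made up for illustration. The need for credentials and an extra request parameter is the practical reason plain anonymous links stop working, which is the barrier raised in the comment above.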
| wikiman wrote: | I'm not an expert, but most government agencies are allowed | to charge reasonable fees for access to their data. I don't | know if this qualifies, but it at least seems like a | possibility, especially if it's transparently just passing | along their costs in the form of AWS' own cost structure | dylan604 wrote: | While the data is free, the cost of getting the data to you | can be charged. Originally, it was to cover the expense of | someone pulling the data, making copies, and then mailing | that data out to you. If it was photographic, you'd be | charged for the prints. I'd see using Requester Pays in the | same vein. They are not charging you for the data, but any | fees incurred to obtain the data would be at your expense. | TallGuyShort wrote: | It's required to be public domain. IMO it's comparable to | FOIA requests still requiring the requester to attach a stamp | to the envelope their request goes in. Or at most, include a | self-addressed stamped envelope too. | | Requiring you to pay S3 is little different than requiring | you to have Internet access, and thus pay whichever company | includes you in THAT monopoly, IMO. | macintux wrote: | To me it feels very different. | | Imagine for a moment that in order to access NASA data sets | you had to have a Fastmail email account. Gmail won't work, | Outlook won't work, it has to be Fastmail alone. | | That would be very objectionable (as much as I adore | Fastmail). | | Ability to pay one specific cloud provider should not be a | gate for public domain government data. | TallGuyShort wrote: | The alternative here, though, to get comparable | distribution / durability, etc. by spending _way_ more of | the public 's money upfront regardless of who wanted it. | I get the purist / idealistic argument here, but it feels | a bit like cutting off one's nose to spite their face. | somethingwitty1 wrote: | I don't think this analogy works. 
For Fastmail, there is | a cost regardless of whether you want to access | government data. You have to pay for the account itself. | For most cloud providers, there is zero cost for having | an account. Even if they hosted this themselves, they | could just as likely charge for data transfer costs...and | get to choose how to collect that. They could choose | PayPal and you have to create an account. Or they take | credit cards...and you must have a card belonging to one | of the networks they support. The barrier to entry | doesn't change regardless of how many cloud providers | there are, all it does is increase infrastructure costs | unnecessarily. | jfk13 wrote: | If there's a marginal cost for each copy of the data that's | transferred to a user, I don't think asking the user to cover | that cost conflicts with a requirement to "give away the | data". | | (If they distributed their science data in printed form, | surely they'd be allowed to charge people for the cost of | printing & mailing the paper copies; that's quite different | from charging for the data itself.) | dragonwriter wrote: | > If there's a marginal cost for each copy of the data | that's transferred to a user, I don't think asking the user | to cover that cost conflicts with a requirement to "give | away the data". | | Charging the user for data, even if it is on a marginal | cost basis, conflicts with a mandate to give data away | freely. Because "at the marginal cost of delivery" is not | "free". | | (It's true that it is common for mandates to specify | something like at marginal cost of delivery rather than | free--sunshine laws providing copies of public records | often work that way--but that's not the applicable mandate | here; in fact, since without the separate mandate here the | data would be available on a marginal cost basis under | FOIA, the main reason for a separate mandate is to negate | that cost.) 
| jfk13 wrote: | Do you have a citation for the "mandate to give data away | freely"? | | I found https://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPD | &c=2230&s=1, which mentions things like "Ensure public | access...", but I don't see anything there mandating such | public access to necessarily be at zero cost. | 3pt14159 wrote: | Also, public access can mean that once someone gets a | copy of the data they can host it for free as well. It's | not as if it's under a commercial license. | elcritch wrote: | Why the downvotes? This isn't uncommon or unreasonable if | you're downloading TB's of data. Also the data would be | freely redistributable if someone took the data and put up | a torrent. Still I'd rather see NASA host their own data. | Put up an FTP server, torrent server and save a lot of | money on hosting fees. | harlanji wrote: | Records departments always charge for copies, and that is | the use I thought of immediately when I learned of | Requester Pays. I'd be surprised if NASA couldn't use it. | topkai22 wrote: | While proxying through a torrent system is a good idea. I | doubt it would get well seeded outside a few popular | datasets- the agency would end up the sole seeder of the | long tail. | | I'm willing to bet NASA saves a ton of money by going to | a cloud provider- US government storage setups are | insanely expensive. I remember a project I was on got a | quote of over $10,000/TB in 2014, and there is no way | egress is actually free right now- they are paying for a | government regulation compliant internet connection one | way or another. | | I do worry about vendor lock in to a degree, but I'm | confident the agency and tax payers would save money | going to any major cloud provider. | Aeolun wrote: | What causes a cost of $10000/TB? Even with multiple | redundant failsafes I just cannot see how the cost could | run up to that. | Spooky23 wrote: | In 2014? 
| | You'd be buying something like an EMC vMax that can | sustain 1M+ IOPS on lots of 15K spinning drives, with | caching tiers on crazy expensive flash. | | To support that, you need a fibre channel network layer | and a bunch of FTEs to attend to it. Usually compliance | requirements require segmentation of roles, which | increases cost. If you're a federal government entity, | those FTEs are most likely contractors billed out at | $125-300/hr. Figure $3-5M/year on labor costs alone, | although that may be divided out over multiple systems. | | This happens in commercial business too. I had a buddy | who was making about $150k in NYC to zone luns on a SAN. | Basically he kept a spreadsheet and updated a specific | configuration setting 2-3x a day and spent about 60-90 | minutes/day doing that. The rest was waiting or studying | for his MBA. | | It's pretty wacky to compare S3 to this type of storage. | elcritch wrote: | Wow! That's good to know, if a bit disheartening. I guess | I was thinking costs for small startup costs with some | cheap-ish linux raid setups and likely massive fiber taps | NASA must surely already have. Not government/big | business costs. | Spooky23 wrote: | Sounds like there is a bigger story there and it's | probably a managed SAN. | | I've operated pretty significant government shared | infrastructures like this in the past... we were offering | fast, flash-cached disk in 2010 for about $5,000/TB. | $10k/TB is not unreasonable for highly available Tier-1 | storage for something like SAP, especially in that era | where you couldn't use all flash in most case. | | Today, cost structures can be very different. You can | land high-iop storage for a fraction of the cost without | the overhead of a big SAN. If you need capacity focused | storage, that is also much cheaper. | | An agency like NASA gets hosed on services, and cloud is | no different. AWS is probably a net savings for | operational workloads whose characteristics are known. 
| Backup is a no-brainer. But for a high-volume, | operationally highly variable thing like a public archive | of data, AWS a square peg in a round hole because of the | metered access. | mentat wrote: | By the way, depending on where it's hosted, S3 can seed | torrents automatically: https://docs.aws.amazon.com/Amazo | nS3/latest/dev/S3TorrentRet... | xxs wrote: | Why FTP - torrent it all the way, perhaps have the AWS as | nodes... | NikolaeVarius wrote: | https://docs.aws.amazon.com/AmazonS3/latest/dev/S3Torrent | .ht... | angry_octet wrote: | And this would be an even worse outcome. | ben509 wrote: | Why? | advisedwang wrote: | This then requires that everyone have an AWS account and | billing relationship with Amazon to access public data. | VikingCoder wrote: | I'm a huge fan of requester pays, and I frankly don't | understand why we haven't switched more of the internet to it. | | I'm also a liberal, so then I also think government should give | everyone a monthly quota of internet usage allowance. Universal | Basic Internet Income, or something. | [deleted] | 7777fps wrote: | I assume the data accessed is a heavily skewed pareto | distribution. | | Given that, it's maybe still cheaper to build their own serving / | caching layer in front to save egress costs than to have | constructed the whole storage solution themselves. | vidarh wrote: | Putting a caching layer in front of AWS is often very cost | effective even without much skew in the access pattern. It | tends to take a very low hit rate before it pays for itself. | mensetmanusman wrote: | This seems like a good use of torrenting? | maerF0x0 wrote: | It looks like they were aware of Bit torrent as recent as Oct | 2011 | | https://web.archive.org/web/20111024223108/https://visibleea... | caymanjim wrote: | Torrents are only helpful when there's a large number of people | who download the data and are willing to share it. There's not | a large userbase for the vast majority of NASA data. 
It | wouldn't be distributed in any meaningful way. | unhammer wrote: | YOU ARE NOT AFRAID? 'Not yet. But, er...which way to the | egress, please?' There was a pause. Then Death said, in a | puzzled voice: ISN'T THAT A FEMALE EAGLE? | | I've been reading A Hat Full of Sky to my daughter these days, | and there's a running joke that "supposedly intelligent people" | don't know the meaning of the word "egress", mixing it up with | things like egret, ogress or eagles. | | (See also the inspiration for the joke: | https://unrealfacts.com/pt-barnum-would-trick-people-with-a-... ) | slowhand09 wrote: | Wow! I worked on EOSDIS in 93-96. We estimated 16 petabytes; at | the time it would have been one of the world's largest databases. We | changed horses midstream, moving our user interfaces from | X-windows Motif to WWW. And built a very early Oracle DB | accessible via WWW. There was no cloud then except missions | studying atmospheric water vapor. When this was originally | designed there were to be several (6-7) DAACs - Distributed | Active Archive Centers (https://earthdata.nasa.gov/eosdis/daacs) | to store data near where it was needed or captured. Now they have | 12 and are storing on AWS. Amazon didn't exist when this was | originally built. | acruns wrote: | I was at ESRI when we were going to host this data, then | Congress got involved and blocked it. | [deleted] | [deleted] | ackbar03 wrote: | Oh but AWS didn't forget. AWS never forgets | gonzo41 wrote: | This is kinda bad press for AWS. If I were NASA I'd be shitty | about the relationship manager not hinting and trying to help | architect for lowest cost. | NikolaeVarius wrote: | Since when the hell does NASA actually care about bad press | regarding costs these days? | johnmaguire2013 wrote: | NASA or AWS? Parent said AWS. | badwolf wrote: | NASA is spending over $1B for a launch pad that will be | used no more than 4 times. | | https://spacenews.com/report-finds-delays-and-cost- | overruns-... 
| whatshisface wrote: | How much would a launch pad that will be used four times | normally cost for what they're planning to launch? | Without knowing that I can't say if they overpaid 10x, | 2x, got it exactly right or got an amazing bargain. | yborg wrote: | In 1965, the Vertical Assembly Building, which was at | that time the largest enclosed volume in the world, cost | $117M (on a $23.5M original construction contract). That | would be about a billion dollars in 2020, but it was | completed in 3 years and was used to stack 13 Saturn Vs. | It was later used for the 100+ Shuttle missions as well, | but there were additional costs to modify the building | for this purpose. The VAB is still planned for use for | future missions. | | https://www.popsci.com/blog-network/vintage-space/nasas- | vab-... | TomMckenny wrote: | Judging from the random yet inexplicable 42 cent bill on a free | account I set up years ago, I'd say their memory is positively | unbelievable. | ph2082 wrote: | 1 Terabyte of hard disk cost ~50USD. | | 247 Petabyte ~ 247000 Terabyte > 50000 USD. | | Network cards, bandwidth, electricity cost > I can't guess. | | Couple of good engineers (hardware and software ones), which they | definitely have. | | Maybe they could have built their own cloud in < ~10-15 million | USD. And that won't be a recurring cost. | | Maybe they missed the article about Bank of America saving ~2 | Billion USD, by building their own cloud. | SEJeff wrote: | You realize that the entire OpenStack project came from the | open-sourcing of NASA's Nebula project, right? They've got | one of the biggest InfiniBand networks in the world | underpinning it. | ph2082 wrote: | I didn't know that. Thank you for telling. | | Now I am more curious why go along with AWS instead of using | OpenStack. Need to find some case study of OpenStack vs rest | of cloud providers. | duskwuff wrote: | Because OpenStack is a piece of software, not a provider. 
| And it's instructive to consider why none of the major | cloud providers use it... | rmrfstar wrote: | In addition to saving money, they will also make the US more | resilient by helping avoid a concentration of expertise and an | infrastructure mono-culture. | | I suspect that ideas like this will become more popular as the | US asks itself "what happened to our resilience?" | | [1] https://en.wikipedia.org/wiki/Self-Reliance | supdatecron wrote: | Your numbers are way off, as you didn't account for redundancy | of the drives (any failure or bit flips of 1 of those 2,470 | drives will cause corruption of likely the entire data set). | | > Network cards, bandwidth, electricity cost > I can't guess. | | This is where a huge amount of cost is. | | > And that won't be recurring cost. | | Maintenance, humans, cooling, drive replacements, property, | building, land tax, payroll tax are all recurring costs. | ph2082 wrote: | > Your numbers are way off, as you didn't account for | redundancy of the drives (any failure or bit flips of 1 of | those 2,470 drives will cause corruption of likely the entire | data set). | | Let take another setup of same count as backup. Then another | setup as back up of back up. ~150K | | > This is where a huge amount of cost is. | | Maintenance, humans, cooling, drive replacements cost > can't | be greater than first time set up cost. | | > property, building, land tax, payroll tax | | Nasa runs on Government budget, I am sure they can claim some | tax break there. | | The point I am trying to make is, it may be cheaper to do in- | house with the level of engineering talent they have. | sitkack wrote: | The government should be running its own object store. And | by government, I mean coordinated by Internet2/NSF with | federation across all member orgs. | | https://en.wikipedia.org/wiki/Internet2 | | Use backblaze pods, demand off peak bandwidth of gilded age | megacorps that own said fiber for sync/replication. 
| | https://www.backblaze.com/b2/storage-pod.html 480TB/4U | | Have 3x sites around the US the build the pods, each new | pod gets preloaded with a smattering of rarely requested | and low replication count objects (as a redundant backup). | Then shipped to the site where it will be used. Local | writes go directly to pods which are then kept in sync with | the rest of the cluster. | | __edit, from the TFA | | ``` And to put a cherry on top, the report found the | project's organizers didn't consult widely enough, didn't | follow NIST data integrity standards, and didn't look for | savings properly during internal reviews, in part because | half of the review team worked on the project itself. ``` | Neil44 wrote: | His numbers can be out by many mulitples and still beat AWS's | 5 Mil a month with no egress. | knorker wrote: | This surely was entirely known to AWS, where they were rubbing | their hands at the fact that every user of this data has to | process it using EC2 on site. | | This is Cloud lock-in using data location. | Spooky23 wrote: | Using AWS for this type of use case is dumb for an org as large | as NASA, if cost savings is a goal. It's cheaper to just land | capacity at a datacenter. | toyg wrote: | I guess they have additional legal constraints that don't allow | them to just "land space" here or there - the vendor must | probably be security-vetted, compliant to a hundred government- | produced checklists, and willing to go through extra-long sales | and support cycles. It will inevitably push up prices | significantly. | | In fact, I can imagine ops-teams at Nasa licking their lips at | the idea of doing away with a lot of that bureaucracy once they | switch to AWS... note how the report mentions that some of the | controllers are actual sponsors of the move: it's obviously a | conflict of interest, but it might well arise when the org as a | whole is a bit too happy to steer away from a suboptimal | situation. 
| | This said, AWS will rob them blind, simply because they can. | Like all outsourcers (which is effectively what they are), they | get in with the simplicity argument, then boil that frog up | with extra charges. It's good that somebody pointed out one of | those charges, but I doubt anything will change substantially- | Amazon will probably cut them a discount and that will be it. | And once you're invested in a cloud env to the tune of hundreds | of petabytes, you'll likely not switch away for decades. | Karunamon wrote: | > _..then boil that frog up with extra charges._ | | That implies a level of dishonesty or nontransparency that | AWS doesn't have. Their pricing is disclosed, up front, and | they offer a calculator to model your costs out. Knowing how | much data egress you're going to have is not some arcane art, | NASA just plain forgot to do it. | | It may be _complicated_ , but so is any workload at this | size. Figuring the cost is part of due dilligence, and | they've made it as straightforward as possible. | toyg wrote: | _> That implies a level of dishonesty or nontransparency | that AWS doesn 't have._ | | Have you ever been part to an enterprise-level sales cycle? | Things like the official calculator are waved away, since | the customer is on a special deal, so "of course is not as | much as that!". The customer asks for a quote with a | certain degree of detail, the vendor provides an answer | with the degree of accuracy required to get them in the | door. If it turns out after a year that the customer ended | up paying 2x, well, too bad - clearly they must have had | higher requirements than forecasted! "Did you record all | your traffic? No? Well, _we_ did, and the result is this | bill, sorry. Alright, alright, I hear your complaint, I | tell you what - I 'll give you a big discount on your next | order, what about that?" Rinse, repeat. 
This is not | dishonesty and I'm not alleging malfeasance or anything | like that, it's just how that world works in my experience. | | In order to figure out the real cost of outsourcing, you | need an adversarial attitude that most shops simply lack, | because they've fundamentally made the choice to abandon | the previous solution even before they've entered the sales | cycle. This is particularly clear in a case where some | controller is also part of the group promoting the switch. | It's surprising it was flagged up, there must be a | competing group somewhere that is desperately trying to | fight on - maybe some Oracle-friendly "japanese in the | jungle" or something. Or maybe bureaucratic procedures to | safeguard the institution are actually working as they | should, for once, but that would be pretty exceptional in | itself. | Spooky23 wrote: | That's a half-truth. | | All of the cloud vendors de-empathize network egress costs. | It's similar to products that depend on Microsoft licensing | who will always omit those types of costs. (Oh, so you | needed to spend another $500k in SQL Server Enterprise?) | | Many organizations lack the operational metrics to allow | them to effectively measure their egress needs. And | AWS/GCP/MS salesmen arent in the business of slowing down | deals with awkward questions. | | This is especially true where an org like NASA probably | contracts out things like network services. Going from a | model where you make fixed capital investments to paying | for the byte is difficult to measure. | Karunamon wrote: | I'm not sure what you mean by "de-empathize". | | Here's the official pricing calculator[1] - note that | ingress and egress costs are included in all relevant | services. Also note that for something like S3 (which is | probably what the article mentions the "earthdata cloud" | is based on), the pricing details are right there on the | description page[2]. 
| | There is no evidence of any malfeasance by AWS here, just | lots of casting aspersions. What _specifically_ do you | want that was not provided? | | [1]: https://calculator.s3.amazonaws.com/index.html | | [2]: https://aws.amazon.com/s3/pricing/ | tehalex wrote: | I wonder if this includes Direct Connect, or if they can use it? [1] | | Cloud data transfers are too expensive, personally I assume that | it costs more to measure and bill for bandwidth than the usage | itself... | | 1: https://aws.amazon.com/directconnect/ | angry_octet wrote: | They could use Direct Connect, from each of their data centres, | essentially turning AWS into a giant NAS. However this gives up | the idea of using AWS compute to provide value-added analysis. | Wheaties466 wrote: | At that point, why not just use a P2P-based system? | ralusek wrote: | I wonder why they wouldn't use Wasabi: | | https://wasabi.com/cloud-storage-pricing/ | | Looks like egress is free. | | Maybe because it's comparatively untested? Does anyone here have any | experience with it? | Eikon wrote: | I wouldn't rely on that. "Wasabi does not | charge for egress but our pricing model is not suitable for use | cases involving the hosting of videos in a manner where the | ratio of egress downloads exceeds the amount of storage." | | https://wasabi-support.zendesk.com/hc/en-us/articles/3600004... | alexfromapex wrote: | Probably need assurances or regulatory solutions that only a | cloud giant like AWS could address | pixelbath wrote: | Unless my numbers are _way_ off, I got around $15.5 million per | year using Backblaze's calculator: | https://www.backblaze.com/b2/cloud-storage-pricing.html | | Numbers used: Initial upload: 258998272 GB | (1024*1024*247); Monthly upload: 100 GB (default); | Monthly delete: 5 GB (default); Monthly download: 1048576 | GB (1 PB); Period of Time: 12 months (default) | adtac wrote: | It'll take 215,000 years to reach 247 petabytes if you averaged | 100 GB of upload a month.
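A quick way to check the two figures in this exchange (the 258998272 GB initial upload and the roughly 215,000 years at 100 GB a month) is the short sketch below. It assumes binary units (1 PB = 1024 * 1024 GB), which is what the "1024*1024*247" calculator input implies; the function and variable names are made up for illustration and come from neither commenter.

```python
# Back-of-the-envelope check of the figures quoted in the thread.
# Assumption: binary units (1 PB = 1024 * 1024 GB), matching the
# "1024*1024*247" used as the calculator input.
GB_PER_PB = 1024 * 1024


def petabytes_to_gb(petabytes):
    """Convert petabytes to gigabytes using binary units."""
    return petabytes * GB_PER_PB


def years_to_accumulate(total_gb, gb_per_month):
    """Years needed to reach total_gb at a constant monthly upload rate."""
    return total_gb / gb_per_month / 12


total_gb = petabytes_to_gb(247)
years = years_to_accumulate(total_gb, 100)

print(f"247 PB = {total_gb} GB")
print(f"At 100 GB/month: about {years:,.0f} years")
```

Under those assumptions both numbers hold up, which is why the default 100 GB monthly upload looks far too small for a dataset of this size.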
| bhandziuk wrote: | I think they're saying NASA would add ~100GB of new data to | this dataset every month. | adtac wrote: | I know. And I'm saying if that was the rate they've | historically added data to their dataset, it would've taken | them 200,000+ years to get here. Which is why 100GB/mo is | virtually nothing for NASA -- it doesn't match with their | historical throughput. | kylebarron wrote: | The initial upload is 247 petabytes | chx wrote: | If you are facing similar problems you should know traffic via | Cloudflare from B2 is free. I am not 100% sure CF would be happy if | NASA picked the CF free tier but probably their quote would be | orders of magnitude lower than Amazon's. | anthonylukach wrote: | This article seems short-sighted. | | 1. Using the AWS cost calculator is pointless, naturally an | entity the size of NASA would get heavily discounted rates. 2. As | data volume grows, the complexity of working with that data | expands. NASA appears to be embracing cloud computing by | adopting a paradigm where scientists push computation to where | the data rests rather than downloading data [1], [2], [3], | thereby paying egress on only the higher-order data products. 3. | The report notes that NASA has tooling to rate limit and throttle | access to data. This, in itself, proves that NASA didn't | "[forget] about eye-watering cloudy egress costs before lift- | off". | | People may scream about vendor lock-in, which is a fair | complaint; but acting like NASA just didn't think about egress is | misleading. | | NASA is ultimately a science institution, I think diverting | effort away from infrastructure management and towards studying | data is likely a wise decision. | | [1: | https://www.hec.nasa.gov/news/features/2018/cloud_computing_...] | [2: https://link.springer.com/article/10.1007/s10712-019-09541-z] | [3: | https://ui.adsabs.harvard.edu/abs/2017AGUFMIN21F..02P/abstra...]
| matchagaucho wrote: | Yeah, this looks like a FUD hit job, possibly by entities made | obsolete by a move to AWS. | | There are just too many solutions to egress optimization to | mention (CDN edge caching, rate limit, throttling, tiered | discounts, multi-year agreements). | | No gov procurement deal at this scale gets sticker shock from | retail prices. | kempbellt wrote: | >NASA is ultimately a science institution, I think diverting | effort away from infrastructure management and towards studying | data is likely a wise decision. | | Indeed. I am glad to see them leveraging the power of an | already proven infrastructure provider rather than spending X | billions of dollars trying to build and maintain their own. | sgt wrote: | No, that is just ridiculous. NASA is more than capable of | running their own server infrastructure. They've got | expertise, they've got DC's and they don't need 99.999% | uptime for most of their services. Cloud providers can turn | out to be insanely expensive. I am not against cloud - mostly | I would recommend it for businesses but when reaching a | certain size you have to consider doing your own cloud | infrastructure. | Supermancho wrote: | > NASA would get heavily discounted rates | | Having spent a lot of money with AWS, that's giving Amazon more | credit than I think is warranted. | [deleted] | ganstyles wrote: | +1 to this. I've been on teams that spent $75k/mo and didn't | get any hint of a discount. Though we got our own on call rep | to handle issues. | soared wrote: | $75k/mo is tiny in the enterprise world. At Oracle they'd | give a 22 year old fresh out of school ~30 accounts that | size, for reference. I worked on a team of 9ish on a | ~$5MM/mo account. (Not cloud, but a comparable business | unit) | bosswipe wrote: | At which level do you start having real negotiation | power? 
| pathseeker wrote: | >NASA is ultimately a science institution, I think diverting | effort away from infrastructure management and towards studying | data is likely a wise decision. | | True, but once you're a certain scale, outsourcing everything | just because it's not your competency isn't a good excuse. You | can afford to hire enough people for it to become your | competency. | tda wrote: | Probably there is even good competition between the cloud | providers, because hosting the data means that you can sell a | lot of compute time to all the users of the data. NASA choosing | for AWS means that any IO intensive analysis on that data will | run faster/better/cheaper on AWS. | eeZah7Ux wrote: | No, there isn't, and going for the quasi-monopolist only | encourages lock-in. | tspike wrote: | Microsoft especially has been aggressive in courting large | companies away from AWS for cloud needs. | [deleted] | X6S1x6Okd1st wrote: | > NASA also knows that a torrent of petabytes is on the way. | | Oh that sounds like a potential solution. | | /s | oh_hello wrote: | "The audit, meanwhile, suggests an increased cloud spend of | around $30m a year by 2025" | | Isn't this a rounding error for NASA? | beastman82 wrote: | Torrent FTW | djrogers wrote: | I'm not saying this won't be a financial cluster - it likely will | cost many times more than planned - but the headline here is just | a flat-out lie. | | TFA says: | | "a March audit report [PDF] from NASA's Inspector General noticed | EOSDIS hadn't properly modeled what data egress charges would do | to its cloudy plan." | | 'Hadn't properly modeled' is very different from 'forgot about'. | And if you actually read the linked report, it says things like: | | "ESDIS officials said they plan to educate end users on accessing | data stored in the cloud, including providing tools to enable | them to process the data in the cloud to avoid egress charges." 
| and "To mitigate the challenges associated with potential high | egress costs when end-users access data, ESDIS plans to monitor | such access and "throttle" back access to the data" | | Neither of those statements would be _in the audit_ if the entire | topic had been a surprise. | tyingq wrote: | From that linked report... | | _"In addition, ESDIS has yet to determine which data sets | will transition to the cloud nor has it developed cost models | with the benefit of operational experience and metrics for | usage and egress."_ | | That sounds fairly close to the headline. | OzzyB wrote: | Looks like even the big boys get bitten by the Cloud Meme when | forgetting about bandwidth costs; glad I'm not the only one. | vnchr wrote: | Cloud VERSUS Space. Who will come out on top? | movedx wrote: | It's The Register, people. Don't take it seriously. It's | practically The Onion of the IT industry, especially the comments | sections. | | I've written two articles for them and the comments are a joke. | They're all anti-Cloud, anti-progressive. Try selling them | Kubernetes as a solution to their problems: they'll think you've | come to steal their children. I know, I've tried. | | In short: this never happened. NASA didn't forget anything. It | does, however, make for a great eye-catching headline! | | Sorry to be bitter about this, but publications like The Register | serve little purpose these days. It caters to a specific kind of | IT personality that can't let go of their physical tin and they | think public Cloud has no place or use at all. Again, I know, I've | tried convincing these people of such things. | szczepano wrote: | To sum up, no matter how big the hard drives or data centers we | produce, we will always have a problem with storage capacity. | ghostpepper wrote: | There's a joke around here somewhere about AWS pricing being too | difficult even for rocket scientists.
| leni536 wrote: | Apparently AWS pricing is not rocket science | api wrote: | This is exactly why the costs are set up that way. The first time | I saw AWS pricing I chuckled and thought "roach motel." Data goes | in but it doesn't come out. It's one of many soft lock-in | mechanisms cloud hosts use. | jka wrote: | What's the opposite of AWS Snowmobile[0]? | | [0] - https://aws.amazon.com/snowmobile/ | chickenpotpie wrote: | Downloading no data extremely fast | Mave83 wrote: | Just build your own storage and save an incredible amount. | | It's hard, you might think, but it's not. croit.io provides all | you need to deploy a scalable cluster even on multiple geographic | regions. | | The price for a 1 PB cluster, including everything from rack to | hardware to license to labor, is below 3EUR/TB/Month, or at the | Amazon Glacier price tag but with S3-IA access. | driverdan wrote: | Are you seriously suggesting that NASA didn't consider | alternatives, like their current self-hosted solutions? | GordonS wrote: | Given they "forgot" about egress bandwidth costs, I think the | parent's comment was fair. | mister_hn wrote: | What about maintenance? Some forget about that... Broken | drives, broken RAID, broken NAS. | | A 120TB SSD NAS might cost over 200kEUR ... imagine a 250PB one | RandomTisk wrote: | Seems like a poor choice. If they're getting an incredible deal | with AWS, then fine, but I would be utterly shocked if most | seasoned and competent IT professionals couldn't design and | build a multi-region storage array for far less than Amazon | will charge them. | dna_polymerase wrote: | Do you want to add your contact details to your post so NASA | can get in touch or what is going on here? Add a little | disclaimer that you work for/are croit.io so people can | instantaneously see why you would argue for the space agency of | the U.S. to run their own data storage.
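For a sense of scale, the self-hosting figures quoted in this subthread (below 3 EUR/TB/month, and a 120 TB SSD NAS at roughly 200k EUR) can be extrapolated naively. The sketch below is only that: it assumes decimal units (1 PB = 1000 TB) and perfectly linear scaling, neither of which any commenter stated, and it ignores maintenance, redundancy and volume discounts. All names are hypothetical.

```python
import math

# Figures quoted by commenters in this subthread.
EUR_PER_TB_MONTH = 3   # "below 3EUR/TB/Month", treated here as an upper bound
NAS_CAPACITY_TB = 120  # "A 120TB SSD NAS ..."
NAS_PRICE_EUR = 200_000  # "... might cost over 200kEUR"

# Assumption (not from the thread): decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000


def monthly_cost_eur(petabytes):
    """Monthly cost if the per-TB price scaled linearly (an assumption)."""
    return petabytes * TB_PER_PB * EUR_PER_TB_MONTH


def nas_units_needed(petabytes):
    """Whole NAS units needed to hold the given capacity, with no redundancy."""
    return math.ceil(petabytes * TB_PER_PB / NAS_CAPACITY_TB)


monthly = monthly_cost_eur(247)
units = nas_units_needed(250)
hardware = units * NAS_PRICE_EUR

print(f"247 PB at {EUR_PER_TB_MONTH} EUR/TB/month: {monthly:,} EUR per month")
print(f"250 PB on {NAS_CAPACITY_TB} TB units: {units:,} units, {hardware:,} EUR")
```

The two quotes describe very different things (an all-in managed cluster versus SSD appliances), so the gap between the results says more about the hardware class than about self-hosting as such.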
| tzm wrote: | $5,439,526.92 per month | [deleted] | NikolaeVarius wrote: | Senator Shelby should get AWS to launch a new region in Alabama | for NASA at this rate. ___________________________________________________________________ (page generated 2020-03-19 23:00 UTC)