[HN Gopher] MinIO: A Bare Metal Drop-In for AWS S3 ___________________________________________________________________ MinIO: A Bare Metal Drop-In for AWS S3 Author : todsacerdoti Score : 203 points Date : 2021-08-10 13:22 UTC (9 hours ago) (HTM) web link (tech.marksblogg.com) (TXT) w3m dump (tech.marksblogg.com) | sebyx07 wrote: | https://www.storj.io uses minio underneath, but they treat minio | with respect, as a partner and pay their cut | 0x000000001 wrote: | Be warned that their code quality is pretty bad. There was a bug | I was dealing with last year where it did not delete objects but | returned the correct HTTP response code indicating it did. This | was widespread, not just some edge case I encountered. Their | broken test suite doesn't actually verify the object on disk | changed. I tried to engage them but they blew me off. | tobias3 wrote: | Minio isn't durable. Any S3 operation might not be on disk | after it is completed successfully. They had an environment | variable MINIO_DRIVE_SYNC for a while that fixed some cases. | Looking at the current code this setting is called | MINIO_FS_OSYNC now (for some reason) | https://github.com/minio/minio/pull/9581/commits/ce63c75575a... | (but I wouldn't trust that... are they fsyncing directories | correctly? Making sure object metadata gets deleted with the | data in one transaction etc.). Totally undocumented, too. | | I guess this makes minio "fast". But it might eat your data. | Please use something like Ceph+RadosGW instead. It might be | okay for running tests where durability isn't a requirement. | Areading314 wrote: | Is Ceph with Rados Gateway a better alternative to this? | 0x000000001 wrote: | I have a 500PB Ceph setup @ work, but I don't maintain it. | It's been solid. | nateoda wrote: | I would say no in production. 
I was recently testing ceph + rgw as an on-prem s3 solution, but high-throughput puts + ls caused an index corruption that "lost" files according to future ls calls; the file was still there if you fetched it directly. When this was reported, it turned out it had already been found multiple years ago and never fixed | marbu wrote: | Could you reference a bug url? I tried to find it via tracker.ceph.com but failed to do so (I don't claim that the problem doesn't exist). That said, referencing a bug url would be nice if you want to increase the credibility of your claim. | etaioinshrdlu wrote: | I had issues with frequent crashes due to various panics a while ago. It eventually went away after a version upgrade. But now, reading this, I don't feel terribly confident in using minio long term. | mtalantikite wrote: | I also hit a very frustrating issue in minio where CORS headers weren't being set properly, and there were many similar cases in their issues history. Their response was basically "works for me, sorry". | | I'm pretty sure there was something weird going on with how minio was reading the config state, as I definitely was not the only one hitting it. Luckily I only had to use it for local testing in the project, but the whole thing didn't leave me feeling good. | | [1] https://github.com/minio/minio/issues/11111 | pbadenski wrote: | We tried to use it a year ago or so, because of the performance promise. We were getting random glitches every few thousand files during operations. There was no obvious pattern, so it was difficult to reproduce, and as far as I remember there were mentions of it on github. Hopefully they acknowledge and get over this hump, as it seems like a promising project altogether. | merb wrote: | what would be a better way to export an nfs storage to s3, then? swift, like it does for glusterfs? | kapilvt wrote: | Github issue link? They seem to have a solid ci setup, and I know several large enterprises using it. 
But I found a bug for my | usage != bad code quality. | 0x000000001 wrote: | My usage was "setup a basic single node for testing, upload a | file with mc client, delete a file with mc client". They | failed that test. It was responding with 200s but the file | was never deleted. | | There are loads of issues like this on their github: | https://github.com/minio/minio/issues/8873 | jeffbarr wrote: | Amazon S3 on Outposts (more info at | https://aws.amazon.com/s3/outposts/ ) runs on-premises, offers | durable storage, high throughput, the S3 API, currently scales to | 380 TB, and doesn't require you to watch for and deal with | failing disks. | | I believe that it addresses many of the OP's reasons for deciding | against S3. | Nullabillity wrote: | At least disclose your conflicts of interest when writing spam | like this. | GiorgioG wrote: | And only costs $169,000 to start | gunapologist99 wrote: | Hey, let's be fair. The storage-optimized instances start at | only $425,931. ;) | oneplane wrote: | It's not like buying hardware, support personnel hours and | write-off administration is that much cheaper, unless you're | willing to discard some features, but at that point you're no | longer comparing things equally. | rodgerd wrote: | Have Outposts fixed their "always-on, lose connectivity, lose | your outpost" problem that they had when I first asked about | them? | | Can they scale down to "I need to spin up an S3 thing for local | testing" for the cost of the storage and CPU? | | Am I locked into a multi-year agreement, or can I just go and | throw it away in a month and stop paying? | speedgoose wrote: | I'm not going to contact AWS sales when I can easily use minio | on Docker or Kubernetes. | tyingq wrote: | The gateway feature, where Minio works as a local cache for | actual AWS S3 buckets, looks pretty nice. 
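The missing check 0x000000001 describes above, trusting DELETE's status code without reading the object back, can be sketched in a few lines. The in-memory store below is a hypothetical stand-in for an S3-compatible server, not MinIO's actual code; a real test would point an S3 client such as boto3 at the MinIO endpoint instead.

```python
class FakeObjectStore:
    """Minimal in-memory stand-in for an S3-like store (hypothetical;
    real tests would use an S3 client against the running server)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data
        return 200          # status code, like a PUT response

    def delete(self, key):
        self._objects.pop(key, None)
        return 204          # S3 returns 204 No Content for DELETE

    def head(self, key):
        return 200 if key in self._objects else 404


def delete_really_deletes(store, key):
    """The check the commenter says was missing: don't trust the
    DELETE status code alone -- read the object back afterwards."""
    status = store.delete(key)
    return status in (200, 204) and store.head(key) == 404
```

The point is only that the assertion must be on observed state, not on the response code the server chose to return.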
| | https://docs.min.io/docs/minio-gateway-for-s3.html | eropple wrote: | I've used this, years ago, at a company for exactly this, and it's really solid. I've also used it in a developer environment as a more expansive "fake S3" than the simpler ones I'd run across at the time. Good stuff. | didip wrote: | Anyone compared MinIO vs Ceph? I like MinIO because it seems exponentially simpler to set up, but I don't know about its distributed and scalability stories. | opyh wrote: | While I can't say much about its handling when using it distributed, I have had some negative experiences with MinIO/ceph when handling files > 10G. | | One example: missing error handling for interrupted uploads, leading to files that looked as if they had been uploaded, but had not. | | Both ceph's and MinIO's implementations differ from AWS's original S3 server implementation, in subtle ways. ceph worked more reliably, but IIRC, both for MinIO and ceph, there is no guarantee that a file you upload is readable directly after upload. You have to poll to see if it is there, which might take a long time for bigger files (I guess because of the hash generation). AWS's original behavior is to keep the socket open until you can actually retrieve the file, which isn't necessarily better, as it can lead to other errors like network timeouts. | | I got it working halfway reliably by splitting uploads into multiple smaller files, and adding retry with exponential backoff. Then I figured out that using local node storage and handling distribution manually was much more efficient for my use case. | | So for larger use cases, I'd take the 'drop in' claim with a grain of salt. YMMV :) | dzonga wrote: | Something not really discussed a lot: FoundationDB works well as a blob / object store. | heipei wrote: | I don't know when this was written, but MinIO does not have a great story (or really any story) around horizontal scalability. 
| Yes, you can set it up in "distributed mode", but that is completely fixed at setup time and requires a certain number of nodes right from the beginning. | | For anyone who wants HA and horizontal elastic scalability, check out SeaweedFS instead; it is based on the Facebook "Haystack" paper: https://github.com/chrislusf/seaweedfs | chrislusf wrote: | Thanks! SeaweedFS has a linearly scalable architecture and performs much better. | | It is also very easy to run. Just run this: "docker run -p 8333:8333 chrislusf/seaweedfs server -s3" | Already__Taken wrote: | wow that's got everything and almost the kitchen sink. | chrislusf wrote: | Actually SeaweedFS already supports asynchronous backup to data sinks such as S3, GCP, Azure, etc. | | I had not heard of this "kitchen" sink before. :) | | But adding one more sink should be trivial. | killingtime74 wrote: | I agree it's great (and mostly used) for integration testing, but do a significant number of users actually use it for storage? | gunapologist99 wrote: | Seaweedfs has made some questionable security decisions (https://github.com/chrislusf/seaweedfs/issues/1937). | Proven wrote: | Drop-in as in drop-in _replacement_? | | Nope. | | No S3 s/w is a drop-in replacement for AWS S3. | | I am not going to read the post; maybe differences in API implementation and behavior are mentioned somewhere in the post, but if that's the case the title shouldn't talk about "drop-in". | tinco wrote: | Is HDFS nice? I did a lot of research before settling on Ceph for our in-house storage cluster, and I don't remember even considering HDFS and I don't really know why. Ceph also is a drop-in for S3 for bare metal clusters. | | I've been running Ceph for about a year now, and the startup was a bit rough. We are actually on second-hand hard drives that had a lot of bad apples, and the failures weren't very transparent to deal with, which was a bit of a disappointment. 
| Maybe my expectations were too high, but I was hoping it would just sort of fix itself (i.e. down the relevant drive, send me a notification, and ensure continuity). I feel I had to learn way too much about Ceph to be able to operate it properly. Besides that, the performance is also not stellar: it apparently scales with CPU frequency, which is a bit astonishing to me, but I've never designed a distributed filesystem, so who am I to judge. | | I was looking for something that would scale with the company. Now we've got 70 drives, maybe next year 100 and the next year 200. Now all our drives are 4TB, but I'd like to switch them out for 14TB or 18TB drives as we go along. We're not in a position to just drop 100k on a batch of shiny state-of-the-art machines at once. Many filesystems assume the number of drives in your cluster never changes; it's crazy. | Cixelyn wrote: | Curious -- any reason you didn't just go with a single machine export + expansion disk shelves on something like ZFS? Installing a MinIO gateway would also act as a bare drop-in for S3. | | Asking since we're in the same position as yourself w/ high double-digit disk counts, trying to figure out our plan moving forward. Right now we're just using a very large beefy node w/ shelves. ZFS (via TrueNAS) does give us pretty good guarantees on failed disks + automated notifications when stuff goes wrong. | | Obviously a single system won't scale past a few hundred disks so we are looking at alternatives including Ceph, GlusterFS, and BeeGFS. From the outside looking in, Ceph seems like it might be more complexity than it's worth until you hit the 10s of PB range with completely standardized hardware? | tinco wrote: | Some of our rendering processes take multiple days to complete, and the blackbox software we use doesn't have a pause button. 
So it's not that we're in need of 99.99999% uptime, but there's actually never a moment where rebooting a machine would be convenient (or indeed wouldn't cost us money). Being distributed over nodes means I can reboot them and the processes are not disrupted. | merb wrote: | for k8s there is also kadalu btw., which is based on glusterfs but simplified. | gpapilion wrote: | HDFS doesn't really work as a normal filesystem. I think some other commenters pointed out the challenges with FUSE. | | If I recall correctly there isn't really a way to modify an existing file via HDFS, so you'd have to copy/edit/replace. Append used to be an issue, but that got sorted out a few years back. | | Erasure coding is available in the latest versions. Which helps with replication costs. | | I think HDFS may just be a simpler setup than other solutions (which is to say it's not all that simple, but easier than some other choices). And I wouldn't use HDFS as a replacement for block storage, which is something I've seen done with Ceph. | tinco wrote: | Thanks, we actually use Ceph as a straight-up filesystem that gets mounted on linux machines and then exposed to our windows-based processing nodes (they are human operated) over SMB. I think that explains why HDFS is not a good fit for us. | jeffbee wrote: | HDFS has pretty much all of Ceph's flaws plus it has a non-scalable metadata server, the "NameNode". If you're already up and running with Ceph I can think of no reason to abuse yourself with HDFS. | pram wrote: | You have to use something like FUSE to mount HDFS, if that is your intention. It's not really like Ceph. Unless your app is written to use the HDFS API directly it's going to be a bigger rigmarole to store stuff. | justinholmes wrote: | Did you not evaluate linstor? | tinco wrote: | Thanks, I didn't, but it looks interesting, I'll research it later. | ryanmarsh wrote: | What about S3 didn't meet your use case? I don't work for AWS. 
| I don't care if they lose business, I am interested in how different companies parse their requirements into manage vs. rent. | tinco wrote: | One aspect is that we have a lot of data that has PII in it, and we feel safer if we anonymise that locally before sending it into the cloud. Once the data is cleaned up it's actually sent to GCS for consumption in our product. Another aspect is that this data has to be accessible as windows fileshares (i.e. SMB) to our data processing team. The datasets are in the range of several 100's of GB to several TB; each of the team members works on several of those datasets per day. This would strain our uplink too, and maybe the bandwidth would be costly as well. | wingmanjd wrote: | We're spinning up a medium-sized Proxmox cluster (~50 nodes in total) to replace our aging Xen clusters. I saw Ceph is available on the Proxmox platform, but was hesitant to make all the VM storage backed by Ceph (throwing all the eggs into a single basket). | | What were some of the other hurdles you faced in your Ceph deployment? | tinco wrote: | We've been playing around with migrating our bare metals to proxmox as well. Though one main argument, being able to reboot/manage crashed GPU-accelerated nodes, was invalidated by proxmox (KVM?) itself crashing whenever the GPU would crash, so it didn't solve our core problem. This is of course also due to the fact that we're not using industrial components, but it is what it is. | | I found Ceph's error messages very hard to debug. Just google around a bit for the manuals of how to deal with buggy or fully defective drives. There's a lot of SSH'ing in, running vague commands looking up ids of drives and matching them to linux device mount points, and reading vague error logs. | | To me as a high-level operator it feels like it should be simple. | If a drive supplies a block of data, and that data fails its checksum, it's gone. 
The drive already does its very best internally to cope with physical issues; if the drive couldn't come up with valid data, it's toast, or as close to toast as anyone should be comfortable with. So it's simple: fail a checksum, out of the cluster, send me an e-mail. I don't get why Ceph has to be so much more complicated than that. | tasqa wrote: | I found proxmox not to be very user-friendly when growing to such cluster sizes. Proxmox itself has been very stable and supports pretty much anything, but the GUI is not that great if you have many nodes and VMs, and the API can be lacking. However, using ceph as a backing store for VM images is pretty easy in proxmox. I have not used the cephFS stuff. I used it in a separate cluster both physically and standalone (not using proxmox integration). | | So RBD is easy; S3 is somewhat more complicated, as you need to run multiple gateways, but still very doable. The FS stuff also needs extra daemons, but I have not yet tested it. | sam0x17 wrote: | > Unfortunately, none of the above are available to the public, let alone something outsiders can run on their own hardware. | | This is misleading. While there are no bare metal projects I'm aware of, there are 10+ S3-API-compatible S3 alternatives, such as Wasabi and Digital Ocean Objects, to name a few. | neilv wrote: | MinIO is something I'll look into. And, as another example to the article's, it might also come in handy for some data needs for factories with imperfect Internet reliability (e.g., when the main submarine cable between the factory and AWS Singapore gets severed :). 
| | This first example from the article sounds very valid, but is still personally funny to me, because it's related to the first use I made of S3, but in the opposite direction (due to different technical needs than the article's): | | > _If an Airline has a fleet of 100 aircraft that produce 200 TB of telemetry each week and has poor network connectivity at its hub._ | | Years ago, I helped move a tricky Linux-filesystem-based storage scheme for flight data recorder captures to S3. I ended up making a bespoke layer for local-caching, encryption (integrating a proven method and implementation, not rolling my own, of course), compression, and legacy backward-compatibility. | | That was a pretty interesting challenge of architecture, operations, and systems-y software development. And the occasional non-mainstream technical requirements we encounter are why projects like MinIO are interesting. | siscia wrote: | I wonder if this kind of article pushes business into the consulting funnel. | | Can the author let us know if he is happy with the business results of these articles? | mdaniel wrote: | One will wish to be cautious, as they recently changed their license to AGPL-3.0: https://github.com/minio/minio/blob/master/LICENSE because they're afraid of AWS offering Minio as a hosted service, I guess | gunapologist99 wrote: | Pretty sure AWS would like to have something that at least looks and feels sorta like S3. | | And, being S3-compatible at an API level would be a big bonus for a company the size of AWS, especially if it had nearly native compatibility with the aws-cli tool. | syshum wrote: | Why would AWS offer Minio, a clone of an AWS Service, as a service? | | That seems very confusing | MonkeyClub wrote: | So that AWS can still get people to pay subscription fees to them, instead of using their own hardware with a FOSS solution, if MinIO becomes too popular. | remram wrote: | I don't understand. 
Aren't you describing S3? Why would Amazon offer a second version of S3? | cyberge99 wrote: | Self-hosted as a feature. Akin to managed vs colo hosting. | remram wrote: | By "self-hosted" you mean still running on AWS hardware? I don't understand. Why would anybody pay EBS rates instead of S3 rates, to get data stored in the same place by the same people? | motives wrote: | They pretty much already offer this with outposts (it's technically their hardware but it's on your premises). | [deleted] | hdjjhhvvhga wrote: | I understood it as a joke referencing Amazon's tendency to take just any open source product that happens to gain enough popularity, rename it and offer it as a shiny new feature of AWS. | remram wrote: | Amazon wouldn't, but another cloud service might decide to run this rather than implementing their own S3-compatible object storage from scratch. Or they might use part of Minio's code to make their existing object storage solution compatible with S3. | adolph wrote: | IIRC from this podcast with Anand Babu Periasamy [0], they already do. | | https://www.dataengineeringpodcast.com/minio-object-storage-... | marcinzm wrote: | They're more likely afraid of smaller and upcoming cloud providers offering it as an S3 drop-in. | corobo wrote: | This plus CDNs I'd imagine too. S3 protocol is the new FTP, and minio ticks the box quickly; they want their share of that (and deserve it imo) | [deleted] | [deleted] | benlivengood wrote: | Unless you're running a locally patched version, AGPL is indistinguishable from GPL. | kkielhofner wrote: | While this is true, a word of warning: if you ever end up in a due diligence situation or a source code audit, AGPL can really freak people out and hang up/derail the process until you get them to understand this point. If you can at all. | OJFord wrote: | I'm pretty sure that - the prospect of AWS hosting and rebranding minio - was a joke. 
| [deleted] | tyingq wrote: | That seems okay, since you can use any S3 client library. So, good advice, but probably very few folks would have a need to touch the server-side source. | | Minio's client-side libraries appear to be packaged separately, and Apache licensed: https://github.com/minio/minio-go | | https://github.com/minio/minio-js | | (Etc) | whateveracct wrote: | And if you do touch the server-side code... do it in an open-source fork? | toolslive wrote: | don't use minio for object storage. Use it because you need an s3 interface (and the object store you want to use doesn't provide it). It's actually pretty straightforward to build an integration if minio doesn't provide it. Implementation tip: make the minio side stateless. Have fun. | bambam24 wrote: | Guys, don't store your data in minio. It's a sandbox, not an actual object store. Companies use minio to store their temporary data, not their actual critical data. | | For example, if you have a project where you store objects to S3: in a CI pipeline you don't want to store temp files in S3, for cost purposes. So instead you store them in minio. A company must be crazy to use minio as their real data storage. | xrd wrote: | It's amazing. I use it everywhere. | | The only limitation is that you don't have all the IAM access rules that you get with AWS. | | Oh wait, that's exactly why I love it. | xrd wrote: | I feel like generating my own "S3 signed URLs" from Minio (as a node script) is a much better way to layer security than that IAM mess. | | And, the mc command line client is awesome. | | And, it all runs inside dokku, which is incredible. | andrewxdiamond wrote: | What specifically do you have issues with when it comes to IAM? | | It's a complicated tool for sure, but it comes from the natural complication of dealing with auth in a very flexible way. | rhizome wrote: | You answered your own question. 
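The approach xrd describes above, minting presigned URLs yourself, can be sketched with nothing but the standard library. This is an illustrative SigV4 query-string signer, not MinIO's or xrd's actual code; the endpoint, bucket, and credentials below are made-up placeholders, and a real deployment would normally use an SDK's presign helper (boto3's generate_presigned_url, minio-go's PresignedGetObject) rather than hand-rolling the signature.

```python
import hashlib
import hmac
import urllib.parse
from datetime import datetime, timezone


def presign_get(endpoint, bucket, key, access_key, secret_key,
                region="us-east-1", expires=3600):
    """Build a SigV4 presigned GET URL for an S3-compatible endpoint
    (path-style addressing, as MinIO uses by default)."""
    now = datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = urllib.parse.urlparse(endpoint).netloc
    scope = f"{datestamp}/{region}/s3/aws4_request"
    path = urllib.parse.quote(f"/{bucket}/{key}")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: sorted params, values percent-encoded.
    query = "&".join(f"{k}={urllib.parse.quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode()).hexdigest()])
    # Derive the signing key with the standard HMAC chain.
    signing_key = ("AWS4" + secret_key).encode()
    for piece in (datestamp, region, "s3", "aws4_request"):
        signing_key = hmac.new(signing_key, piece.encode(),
                               hashlib.sha256).digest()
    signature = hmac.new(signing_key, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return f"{endpoint}{path}?{query}&X-Amz-Signature={signature}"
```

Anyone holding the resulting URL can GET the object until it expires, with no IAM policy involved, which is the layering xrd is pointing at.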
"Very flexible" is a plus | for Amazon because they can cover everybody's use-cases | with a single concept. "Very flexible" is a minus for end | users because they only need to take care of their own use- | case. | | So you can say it's a "natural" complication, and you'd be | right, but that says nothing about usability, which is | where "issues" tends to come in. | lazide wrote: | Not the parent, but IMO it is an awkward way of thinking of | permissions in an automated environment. It fits a human | model much better, where you have a long lived entity which | is self contained and expected to be trustworthy. Alice | should have access to all finance data in this bucket | because she works in finance, or Bob should be able to | access these EC2 instances because he admins them. | | It causes weird and overly broad privileges though usually, | because you need to give permission to do any possible | thing the job or user of the credentials COULD need to do, | all the time. | | This happens because any action to limit the scope usually | causes more human friction than it is worth. | | Ideally, when it is requested they do something, they get | handed a token for the scope they are doing it in, which | only gives them access to do the specific things they will | need to do on the specific thing they need to do it, and | only for the time they plausibly will need to do it for. | This is a huge hassle for humans, and adds a lot of time | and friction. For machines, it can be as simple as signing | and passing along some simple data structures. | | So for example, Alice would get a token allowing access to | Q4 '20 only if that was plausibly correct and necessary, | and then only for how long it took to do whatever she was | doing. 
Bob would only get a token to access the specific EC2 instance that needs him to log into it because of a failure of the management tools that otherwise would fix things - and only after telling the token-issuing tool/authority that, where it can be logged. | | It makes a huge difference in limiting the scope of compromises, catching security breaches and security bugs early on, identifying the true scope and accessibility of data, etc. | | Also, since no commonly issued token should probably ever provide access to get everything - where the IAM model pretty much requires that a job that gets any random one thing has to be able to get ALL things - then you also end up with the potential for runtime performance optimizations, since you can prune in advance the set of possible values to search/return. | OJFord wrote: | > It causes weird and overly broad privileges though usually, because you need to give permission to do any possible thing the job or user of the credentials COULD need to do, all the time. | | Not really, unless you mean it needs permission to assume all the roles it could need in order to have the permissions it requires. | lazide wrote: | That is exactly what I mean. A web server that makes database requests needs permission to do any query a web request would need to be able to trigger - not just permission for the specific query that makes sense for the specific request it is serving at the time. | | It's the difference between 'can query the database' and 'can retrieve user Clarice's profile information because she just made a request to the profile edit page' | | Does that make sense? | OJFord wrote: | Yes, I understand, but the point I'm making is that it _does_ support 'roles', 'assuming' them for a period of time, then dropping those privileges again, or 'assuming' a different one, etc. 
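The scheme lazide describes, short-lived tokens naming one resource and one action, can be sketched with stdlib HMAC. The claim names and signing layout here are invented for illustration; real systems would reach for something like STS session tokens, macaroons, or Biscuit rather than this toy format.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"shared-signing-key"  # hypothetical key held by the token issuer


def issue_token(subject, resource, action, ttl_seconds):
    """Mint a narrowly scoped, short-lived token: one subject, one
    resource, one action, and an expiry only seconds away."""
    claims = {"sub": subject, "res": resource, "act": action,
              "exp": int(time.time()) + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def check_token(token, resource, action):
    """The storage side verifies signature, scope, and expiry --
    no broad standing role is ever consulted."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return (claims["res"] == resource and claims["act"] == action
            and claims["exp"] > time.time())
```

So Alice's token for "Q4 '20" simply fails `check_token` against any other quarter, and goes stale on its own; the hard part lazide and OJFord are circling is who the trusted issuer is and how it decides a request is "plausibly correct".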
| | The 'because' isn't there, but I'm not really sure what that would look like, at least in a meaningful (not trivially spoofable) way. | lazide wrote: | But no one is creating a role for read/modify/write for every distinct bit of user data, no? Or at least I hope not, and I doubt the system would function if they tried. | | Tokens can do that. | OJFord wrote: | But don't you just move the problem to the token-granting authority? | | Don't get me wrong, I do see the hypothetical benefit, I'm just having trouble envisaging a practical solution. Is there something else not on AWS (or third-party for it) that works as you'd like IAM to? | killingtime74 wrote: | Probably learning curve | wooptoo wrote: | MinIO is great. I use MinIO together with FolderSync (Android only) to automatically back up the photos from my phone to my local NAS. It runs a scheduled job every night and they're saved in the original HEIC format. | | I've also used MinIO to mock an S3 service for integration tests, complete with auth and whatnot. | jpeeler wrote: | Were you already using MinIO? As somebody who wants to eventually back up photos on my phone, I'm curious why not just use Syncthing for that? | wooptoo wrote: | Tbh I wasn't aware of Syncthing. In this use case it would work just as well I suppose. | | One of the advantages of MinIO would be the wide compatibility with other S3 storage services. If my NAS had downtime while on holiday I could spin up a new bucket on S3/Backblaze/Wasabi and back up everything in a few minutes. | joeyrobert wrote: | I know Minio can be used for production workloads, but Minio as a localhost substitute for an S3-like store is underrated. S3-dependent projects I work on spin up a Minio instance via docker-compose to get a fully local development experience. ___________________________________________________________________ (page generated 2021-08-10 23:01 UTC)