[HN Gopher] Segment Anything Model (SAM) can "cut out" any objec... ___________________________________________________________________ Segment Anything Model (SAM) can "cut out" any object in an image Author : crakenzak Score : 322 points Date : 2023-04-05 15:11 UTC (7 hours ago) (HTM) web link (segment-anything.com) (TXT) w3m dump (segment-anything.com) | swframe2 wrote: | The best solution I've seen is https://github.com/xuebinqin/DIS. | You should try the DIS example images at the SAM site. | | The main issue I have with DIS is that creating the labels of my | own dataset is super expensive (I think it might be easier to | generate the training data using stable diffusion rather than | human labelling) | jonplackett wrote: | This would make a great input for ControlNet | fzliu wrote: | Computer vision seems to be gravitating heavily towards self- | attention. While the results here are impressive, I'm not quite | convinced that vision encoders are the right way forward. I just | can't wrap my head around how discretizing images, which are | continuous in two dimensions, into patches is the most optimal | way to do visual recognition. | | What's preventing us from taking something like convnext or a | hybrid conv/attention model and hooking that up to a decoder | stack? I feel like the results would be similar if not better. | | EDIT: Clarifying that encoder/decoder refers to the transformer | stack, not an autoencoder. | neodypsis wrote: | > What's preventing us from taking something like convnext or a | hybrid conv/attention model and hooking that up to a decoder? I | feel like the results would be similar if not better. | | You mean like in a U-Net architecture? | yeldarb wrote: | Wow, this is pretty epic. I put it through its paces on a pretty | wide variety of images that have tripped up recent zero-shot | models[1] and am thoroughly impressed. | | We have a similar "Smart Polygon" tool[2] built into Roboflow but | this is next level.
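The patch "discretization" fzliu describes can be made concrete with a toy sketch. This is my own illustration, not code from the paper: a ViT-style image encoder first splits an H x W image into non-overlapping P x P patches and flattens each into a token vector before any attention is applied.

```python
# Toy sketch (assumptions mine, not from the SAM paper) of ViT-style
# patchification: an H x W image becomes (H/P)*(W/P) flattened tokens.

def patchify(image, patch_size):
    """image: H x W list-of-lists of pixel values; returns a list of
    flattened non-overlapping patches, row-major order."""
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0, "dims must divide evenly"
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [image[top + i][left + j]
                     for i in range(patch_size)
                     for j in range(patch_size)]
            patches.append(patch)
    return patches

# A 4x4 image split into 2x2 patches yields 4 tokens of length 4 each.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(img, 2)
```

In the real encoder each flattened patch would then be linearly projected and fed to the transformer stack; the continuity fzliu worries about is lost exactly at this step.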
Having the model running in the browser makes | it so much more fun to use. Stoked it's open source; we're going | to work on adding it to our annotation tool ASAP. | | [1] Some examples from Open Flamingo last week | https://news.ycombinator.com/item?id=35348500 | | [2] https://blog.roboflow.com/automated-polygon-labeling- | compute... | syntheweave wrote: | Finally, I'll be able to fill line art with flat colors without | fussing around with thresholds and painting in boundaries. | | (It does have difficulty finding the smallest possible area, but | it's a significant advance over most existing options since in my | brief test, it can usually spot the entire silhouette of figures, | which is where painting a boundary is most tedious). | MrGando wrote: | This is great, runs very fast in Chrome for me. | benatkin wrote: | They seem to avoid using their own brand a lot. They have a | zillion domain names and they register a new one and don't use | the logo except in the favicon and footer. I've seen similar | stuff including divesting OSS projects like PyTorch and GraphQL, | which Google wouldn't do. To me that's tacit admission that the | Facebook and Meta names are tarnished. And they are, by the | content they showed users in Myanmar with the algorithmic feed, | and by Cambridge Analytica. Maybe the whole "Meta" name is no | different from the rebranding of Philip Morris. | smoldesu wrote: | On the one hand, sure. Facebook's brand is about as hip as a | bag of Werther's Originals. | | On the other hand, this is one of those things (like VR) that | is a distinctly non-Facebook project. It makes no sense to | position or market this as "Facebook" research. The HomePod | isn't called the iPod Home for obvious reasons, so it stands to | reason that Facebook execs realized selling someone a "Facebook | Quest" sounds like a metaphor for ayahuasca.
It's not entirely | stupid to rebrand, especially considering how diverse (and | undeniably advanced) they've become in fields like AI and VR. | oefnak wrote: | I actively avoid everything that has anything to do with | Facebook, and I can't be the only one. | smoldesu wrote: | Yeah, me too. I also avoid everything Apple and Google | makes, but I'm not going to pretend like the Alphabet | rebranding is their attempt at hiding who they are. | xiphias2 wrote: | Alphabet wasn't a rebranding: the founding billionaires | got bored of Google, and wanted to take a few billion | dollars per year out of it to create new toys without | sharing it with Google. | throwaway290 wrote: | Ever used React or PyTorch? Well, this is the same. Developers | make good stuff regardless of where they work, and good on | FB for contributing | | But yeah, if you do open source, adding an element of | corporate branding is a sure way to kill the project. | That's why it's not called "Apple Swift" or "Microsoft | TypeScript". | [deleted] | thanatropism wrote: | I was looking into GPU nearest neighbors libraries today and | turned Faiss down because it said "Facebook". Completely | irrational, I know. | aftbit wrote: | You should use Faiss though, it's good. | renewiltord wrote: | I actually have a much more positive impression of Meta because | of this work. It's hard to describe, but they feel very | competent. My instant reaction to something being by Meta | Research is actually to think it's probably going to be | interesting and good. | eminence32 wrote: | The page says at the very top, in a fixed header that is always | visible (even as you scroll, or browse to other pages): | "Research by Meta AI" | | To me, this feels like they are not avoiding the "Meta" brand | at all. | benatkin wrote: | See my other comment. Of course they needed to have it | somewhere to score points.
These probably weren't people who | were about to quit it, probably just with a lowered | perception of it compared to a company people are mostly | proud to work at, like Google... | https://news.ycombinator.com/edit?id=35458445 | blululu wrote: | What are you talking about? There is a Meta Logo Favicon, "Meta | AI" appears in the header and "Meta AI" is purposefully | centered in the ABF text. Registering a new domain costs $10 | compared to the massive pain of involving legal with the | permissions to repurpose an existing domain. It's a new project so | why not make a clean start and just get a new website instead | of going through the full FB/Meta approval process on branding. | benatkin wrote: | I mentioned the logo. I didn't mention the text because | perhaps they still want to score points for Meta, so hiding | it entirely wouldn't make sense. But they avoid the larger | immediate hangups of the big logo and the domain name. | fortissimohn wrote: | They likely meant that Meta was established in part due to | the Facebook name being tarnished in the first place. | aftbit wrote: | I'm out of the loop, what happened in Myanmar? | aix1 wrote: | From an Amnesty International report: | | Beginning in August 2017, the Myanmar security forces | undertook a brutal campaign of ethnic cleansing against | Rohingya Muslims. This report is based on an in-depth | investigation into Meta (formerly Facebook)'s role in the | serious human rights violations perpetrated against the | Rohingya. Meta's algorithms proactively amplified and | promoted content which incited violence, hatred, and | discrimination against the Rohingya - pouring fuel on the | fire of long-standing discrimination and substantially | increasing the risk of an outbreak of mass violence. The | report concludes that Meta substantially contributed to | adverse human rights impacts suffered by the Rohingya and has | a responsibility to provide survivors with an effective | remedy.
| | https://www.amnesty.org/en/documents/ASA16/5933/2022/en/ | | See also | | https://en.wikipedia.org/wiki/Rohingya_genocide | | https://en.wikipedia.org/wiki/Rohingya_genocide#Facebook_con. | .. | iambateman wrote: | Welcome to the wild world of corporate IT. Their VP has | authority to make a new website if she wants, but has to go | through a 3-month vetting process to put it on a subdomain. | lacker wrote: | As someone who used to work on Facebook open source, that | makes sense! After all, an insecure subdomain could lead to | all sorts of problems on facebook.com. Phishing, stealing | cookies, there's a lot of ways it could go wrong. | | Whereas, if one engineer spins up some random static open | source documentation website on AWS, it really can't go wrong | in a way that causes trouble for the rest of the company. | benatkin wrote: | Meta isn't a typical corporation, though. Ordinary big | company red tape could have stopped them from indirectly | displacing thousands based on their religion. (That isn't an | outlandish claim but is something they actually got sued for, | though it was dismissed without absolving them of it) | herval wrote: | It very much is a typical big corp, and OP is correct. It's | easier to ship something on a new domain, using AWS and a | bunch of contractors, than to add a subdomain to | facebook.com or some other top-level domain | smoldesu wrote: | Not to mention, the "Ordinary big company red tape" | didn't stop Coca Cola from hiring Colombian death squads, | Nestle from draining the Great Lakes and selling it back | to its residents, nor Hershey's from making chocolate | from cacao farmed with child slave labor. | | Relative to the rest of FAANG (or even Fortune 500), | Facebook might have the least blood on their hands when | everything is said and done. | killerdhmo wrote: | um...
did you sleep through the last 8+ years of | handwringing about election interference, Russian / state | propaganda, live streaming massacres, addiction / mental | health effects of social media, particularly for kids? I | can't imagine the other FAANGs come close | smoldesu wrote: | If platforming disinformation and enabling internet | addiction is equivalent to criminal complacency, then | Microsoft, Apple, Amazon and Google all have crimes to | answer for. Facebook has shit the bed more times than | they can count on two hands, but unfortunately that's | kinda the table-stakes in big tech. | reaperman wrote: | Multiple block diagrams and the paper note that one of the inputs | is supposed to be "text", but none of the example Jupyter | notebooks or the live demo page show how to use those. I'm | assuming just run the text into CLIP, take the resulting | embedding, and throw it directly in as a prompt, which then gets | re-encoded by the SAM prompt encoder? | | > "Prompt encoder. We consider two sets of prompts: sparse | (points, boxes, text) and dense (masks). We represent points and | boxes by positional encodings [95] summed with learned embeddings | for each prompt type and free-form text with an off-the-shelf | text encoder from CLIP [82]. Dense prompts (i.e., masks) are | embedded using convolutions and summed element-wise with the | image embedding." | | Edit: Found the answer myself: | https://github.com/facebookresearch/segment-anything/issues/... | [deleted] | AdilZtn wrote: | That's amazing! This model is a huge opportunity to create | annotated data (with decent quality) for just a few dollars. | People will iterate more quickly with this kind of foundation | model. | justinator wrote: | Demo is running slow - cutting out is an impressive ability - am I | to assume it also fills in the background? If so: that's next | level.
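The "positional encodings [95]" in the prompt-encoder passage reaperman quotes refer to random Fourier features: a normalized (x, y) click is lifted into a higher-dimensional sin/cos vector, to which the model then adds a learned per-prompt-type embedding. A minimal sketch of that idea, with the frequency sampling being my own assumption rather than the paper's exact scheme:

```python
import math
import random

# Illustrative sketch (assumptions mine) of a random-Fourier-feature
# positional encoding for a 2D point prompt.

def fourier_point_encoding(x, y, freqs):
    """x, y in [0, 1]; freqs: list of (fx, fy) frequency pairs.
    Returns a vector of length 2 * len(freqs)."""
    features = []
    for fx, fy in freqs:
        angle = 2 * math.pi * (fx * x + fy * y)
        features.extend([math.sin(angle), math.cos(angle)])
    return features

random.seed(0)
freqs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
enc = fourier_point_encoding(0.25, 0.75, freqs)  # 8-dimensional encoding
```

Nearby clicks map to nearby encodings, which is what lets the mask decoder treat a click position as a continuous signal rather than a grid index.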
Maybe that Photoshop monthly subscription will be worth it | (providing this sort of ability is going to be baked in with | AdobeAI's version soon) | bobsmooth wrote: | Seeing it run on a headset is the coolest part. Lots of | applications for AR. | ren_engineer wrote: | what do you think facebook's gameplan is here? Are they trying to | commoditize AI by releasing this and Llama as a move against | OpenAI, Microsoft, and Google? They had to have known the Llama | weights would be leaked and now they are releasing this | jayd16 wrote: | Well there's some patent offense and defense in making and | releasing research papers. There are some recruiting aspects to | it. It's also a way to commoditize your inverse if you assume | this sort of stuff brings AR and the metaverse closer to reach. | high_derivative wrote: | n=1 (as a mid-profile AI researcher), but for me it's working | in terms of Meta gaining my respect by open sourcing (despite | the licensing disasters). They clearly seem to be more | committed to open source and getting things done now in | general. | dragonwriter wrote: | I think cranking out open source projects like this raises Meta | AI's profile and helps them attract attention and people, and I | don't think selling AI _qua_ AI is their business plan, selling | services built on top is. And commoditized AI means that the AI | vendors don't get to rent-seek on people doing that, whereas | narrowly controlled monopoly/oligopoly AI would mean that the | AI vendors extract the value produced by downstream | applications. | vagabund wrote: | I've always half-believed that the relatively open approach | to industry research in ML was a result of the inherent | compute-based barrier to entry for productizing a lot of the | insights. Collaborating on improving the architectural SoTA | gets the handful of well-capitalized incumbents further ahead | more quickly, and solidifies their ML moat before new | entrants can compete.
| | Probably too cynical, but you can potentially view it as a | weak form of collusion under the guise of open research. | dragonwriter wrote: | This particular model has a very low barrier; the model | size is smaller than Stable Diffusion, which runs | easily on consumer hardware for inference, though | _training_ is more resource intensive (but not out of reach | of consumers, whether through high-end consumer hardware or | affordable cloud resources.) | | For competitive LLMs targeting text generation, especially | for training, a compute-based barrier is more significant. | vagabund wrote: | Yeah that's fair. I intended my comment to be more of a | reflection on the culture in general, but the motivations | in this instance are probably different. | herval wrote: | Their main use case for these models seems to be AR. Throwing | it out in the open might help getting external entities to | build for them & attract talent, etc. Not sure they're that | strategic but it's my guess | _the_inflator wrote: | I think Meta's gameplan is complex. Inspiration as well as | adoption, not stepping on the toes of regulators is prolly another | intention. Have a look at PyTorch for example. Massively | popular ML framework, with lots of interesting projects | running. | | If Meta frequently shares their "algorithms", they take the | blame out of their usage. After all, who is to blame when | everybody does "it" and you are very open about it. | | Use cases, talent visibility as well as attraction also play a | role. After all, Google was so fancied, due to its many open | source projects. "Show, don't tell". | geenew wrote: | That's it for me | crakenzak wrote: | This is going along with the new Segment Anything Model paper | Meta AI just released: | | Paper: https://scontent- | sea1-1.xx.fbcdn.net/v/t39.2365-6/10000000_6... | | Announcement: https://ai.facebook.com/blog/segment-anything- | foundation-mod...
| | Code & Model Weights: | https://github.com/facebookresearch/segment-anything | lofaszvanitt wrote: | Why can't they give proper filenames to these research papers? | This drives me nuts. | ftxbro wrote: | if Tim Berners-Lee saw that paper link he would have never | allowed the URL to be invented | LoganDark wrote: | I'm so shocked by how almost every query parameter is | required and there's even a freaking signature for validating | the URL itself. | | -Emily | jauer wrote: | That paper link is a CDN URL that is dynamically generated to | point to your closest POP when you load the abstract. It will | be different for many people and will break eventually. | | Abstract: | https://ai.facebook.com/research/publications/segment-anythi... | code51 wrote: | It's interesting that (clearly visible) text parts that cannot be | handled properly by most OCR approaches also get left out by SAM | in auto-predictions. | dimatura wrote: | The network architecture and scale don't seem to be a big | departure from recent SOTA, but a pretty massive amount of | labeled data went into it. And it seems to work pretty well! The | browser demo is great. This will probably see a lot of use, | especially considering the liberal licensing. | bjacobt wrote: | I apologize if this is obvious, but are both the model and | checkpoint (as referenced in getting started section in readme) | Apache 2.0? Can it be used for commercial applications? | dimatura wrote: | As far as I can tell, it can. The code itself has a `LICENSE` | file with the Apache license, and the readme says "The model | is licensed under the Apache 2.0 license.". Strangely, the | FAQ in the blog post doesn't address this question, which I | expect will be frequent. | phkahler wrote: | Isn't Apache 2 a free software license without some of the | GPLv3 things some don't like? | | I think a more BSD-like license would be better, or LGPL. Either would | be more business friendly.
| MacsHeadroom wrote: | LGPL is not business friendly at all. It's among the | least business friendly licenses there are. Apache 2.0 is | slightly more business friendly than BSD. | | With some caveats, software licenses from most to least | business friendly roughly go: | | Apache > BSD > MIT > MPL > LGPL > GPL > AGPL | kyle-rb wrote: | LGPL is more business friendly than GPL; it's literally | "lesser" GPL. | | You can use LGPL in commercial, closed-source projects as | long as you keep the LGPL code in a separate dynamically | linked library, e.g. a DLL, and provide a way for users | to swap it out for their own patched DLL if they wish. | (Plus some other license terms.) | | Also, you can always use LGPL code under the terms of the | GPL, so there's no way LGPL is more restrictive than GPL. | MacsHeadroom wrote: | You're right, that was a mistake. It's been fixed. LGPL > | GPL | dang wrote: | Related: | | _Meta New Segmentation Model_ - | https://news.ycombinator.com/item?id=35453625 - April 2023 (7 | comments) | richardw wrote: | Surely this changes the security camera game? No more being | fooled by clouds going overhead. | subarctic wrote: | The demo is pretty cool but it looks like you can just select | things and have it highlight them in blue - is there a way to | remove objects from the image and have the background filled in | behind them? | dymk wrote: | You could probably content-aware fill the area that SAM | identifies with another tool | neom wrote: | yikes. I went to film school in the early 2000s and spent hours | and hours on levels/HDR based masking, I've used the adobe tools | recently and they're good... this is... yikes... I wonder how | people in their mid-20s today learning Photoshop are going to | deal with the jobs they graduate into. | jacquesm wrote: | Not. This is homing in on the SF UI that Deckard used in Blade | Runner.
| | https://www.youtube.com/watch?v=hHwjceFcF2Q | | All it takes is a couple of tools glued together and you're | getting there. | marstall wrote: | "gimme a hardcopy right there." | arduinomancer wrote: | This exists as a feature on iOS | | You can long press on an image and it cuts out whatever thing it | thinks you're pressing on | | They also use it in interesting ways, like making stuff in the | photo slightly overlap the clock on the lockscreen | | Does anyone know if that works the same way as this? | hbn wrote: | It would still be nice if iOS had some kind of interface like | this where you can nudge it in the right direction if it's | confusing something like a jacket and the background. iOS gives | its best attempt which is usually pretty good, but if it didn't | get it right you're basically SOL. | neom wrote: | This is: understand everything in the image as elements, | subject or whatever. | sashank_1509 wrote: | Extremely impressive system. Blows everything else (including | CLIP from OpenAI) out of the water. We are inching closer to | solving Computer Vision! | wongarsu wrote: | It's really impressive, and better than anything I've seen, but | is it really leagues better than whatever Photoshop is using? | | Of course being on GitHub and permissively licensed is huge. | vanjajaja1 wrote: | My question exactly, didn't photoshop already solve this like | 5+ years ago? | dymk wrote: | Have you used Photoshop's magic wand tool in the last 5 | years? No, it's nowhere close to this good. | RomanPushkin wrote: | Stalin's dream | https://en.wikipedia.org/wiki/Censorship_of_images_in_the_So... | cloudking wrote: | Pretty cool, Runway has a similar green screening feature that | can 1-click segment a subject from the background across an | entire video: https://runwayml.com/ai-magic-tools/ | minimaxir wrote: | You know an AI project is serious when it has its own domain name | instead of a subdomain. | syrusakbary wrote: | This is awesome.
If you try the demo they provide [0], the | inference is handled purely in the client using an ONNX model that | only weighs around 8 MB [1] [2]. | | Really impressive stuff! Congrats to the team that achieved it | | [0] https://segment-anything.com/demo | | [1] https://segment- | anything.com/model/interactive_module_quanti... | | [2] https://segment- | anything.com/model/interactive_module_quanti... | nielsbot wrote: | Sigh. Does not work in Safari on macOS (ARM). Works in Chrome | though. | subarctic wrote: | It seems to work in firefox on macOS (ARM) fwiw | georgelyon wrote: | It seems like the output of this model is masks, but for cropping | you really need to be able to pull partial color out of certain | pixels (for example, pulling a translucent object out from a | colored background). I tried the demo, and it fails pretty | miserably on a vase. Anyone know of a model that can do this | well? ___________________________________________________________________ (page generated 2023-04-05 23:00 UTC)
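On georgelyon's point: a binary mask alone can't express partial pixel coverage, which is why hard cutouts look crunchy at edges. One common stopgap, sketched here in toy pure-Python form (my own illustration, not part of SAM), is to feather the mask into a soft alpha matte with a small box blur; proper alpha-matting models go much further than this.

```python
# Toy sketch (assumptions mine): turn a hard 0/1 mask into a soft alpha
# matte by box-blurring it, so edge pixels get fractional opacity.

def feather(mask, radius=1):
    """mask: H x W list of 0/1 values; returns H x W alphas in [0, 1]."""
    h, w = len(mask), len(mask[0])
    alpha = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total, count = 0, 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += mask[rr][cc]
                        count += 1
            alpha[r][c] = total / count  # mean of the in-bounds neighborhood
    return alpha

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
alpha = feather(mask)  # interior stays opaque-ish, edges fade out
```

Compositing the object with these fractional alphas softens the boundary, but for genuinely translucent objects (like the vase mentioned above) you need a model that estimates per-pixel foreground color and alpha jointly, not just a blurred mask.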