[HN Gopher] Segment Anything Model (SAM) can "cut out" any objec...
       ___________________________________________________________________
        
       Segment Anything Model (SAM) can "cut out" any object in an image
        
       Author : crakenzak
       Score  : 322 points
       Date   : 2023-04-05 15:11 UTC (7 hours ago)
        
 (HTM) web link (segment-anything.com)
 (TXT) w3m dump (segment-anything.com)
        
       | swframe2 wrote:
       | The best solution I've seen is https://github.com/xuebinqin/DIS.
       | You should try the DIS example images at the SAM site.
       | 
        | The main issue I have with DIS is that creating the labels for my
        | own dataset is super expensive. (I think it might be easier to
        | generate the training data using Stable Diffusion rather than
        | doing human labelling.)
        
       | jonplackett wrote:
       | This would make a great input for ControlNet
        
       | fzliu wrote:
       | Computer vision seems to be gravitating heavily towards self-
       | attention. While the results here are impressive, I'm not quite
       | convinced that vision encoders are the right way forward. I just
       | can't wrap my head around how discretizing images, which are
        | continuous in two dimensions, into patches is the optimal
       | way to do visual recognition.
       | 
       | What's preventing us from taking something like convnext or a
       | hybrid conv/attention model and hooking that up to a decoder
        | stack? I feel like the results would be similar if not better.
       | 
       | EDIT: Clarifying that encoder/decoder refers to the transformer
       | stack, not an autoencoder.
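        | 
        | For concreteness, here's a minimal PyTorch sketch of what I mean
        | (a ConvNeXt backbone from timm feeding a transformer decoder;
        | all dimensions, layer counts, and names are illustrative, not
        | taken from the SAM paper):
        | 
        |     import torch
        |     import timm  # pip install timm
        | 
        |     class ConvDecoderHybrid(torch.nn.Module):
        |         """ConvNeXt features feeding a transformer decoder stack."""
        | 
        |         def __init__(self, d_model=256, num_queries=100):
        |             super().__init__()
        |             # features_only=True returns spatial feature maps per stage
        |             self.encoder = timm.create_model(
        |                 "convnext_base", pretrained=False, features_only=True)
        |             self.proj = torch.nn.Conv2d(1024, d_model, kernel_size=1)
        |             self.queries = torch.nn.Parameter(
        |                 torch.randn(num_queries, d_model))
        |             layer = torch.nn.TransformerDecoderLayer(
        |                 d_model, nhead=8, batch_first=True)
        |             self.decoder = torch.nn.TransformerDecoder(layer, num_layers=6)
        | 
        |         def forward(self, x):
        |             feats = self.encoder(x)[-1]  # (B, 1024, H/32, W/32)
        |             # flatten conv features into tokens -- no patch cutting
        |             mem = self.proj(feats).flatten(2).transpose(1, 2)
        |             q = self.queries.unsqueeze(0).expand(x.shape[0], -1, -1)
        |             return self.decoder(q, mem)  # queries attend to conv tokens
        | 
        |     out = ConvDecoderHybrid()(torch.randn(1, 3, 224, 224))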
        
         | neodypsis wrote:
         | > What's preventing us from taking something like convnext or a
         | hybrid conv/attention model and hooking that up to a decoder? I
          | feel like the results would be similar if not better.
         | 
          | You mean like in a U-Net architecture?
        
       | yeldarb wrote:
       | Wow, this is pretty epic. I put it through its paces on a pretty
       | wide variety of images that have tripped up recent zero-shot
       | models[1] and am thoroughly impressed.
       | 
       | We have a similar "Smart Polygon" tool[2] built into Roboflow but
       | this is next level. Having the model running in the browser makes
       | it so much more fun to use. Stoked it's open source; we're going
       | to work on adding it to our annotation tool ASAP.
       | 
       | [1] Some examples from Open Flamingo last week
       | https://news.ycombinator.com/item?id=35348500
       | 
       | [2] https://blog.roboflow.com/automated-polygon-labeling-
       | compute...
        
       | syntheweave wrote:
       | Finally, I'll be able to fill line art with flat colors without
       | fussing around with thresholds and painting in boundaries.
       | 
       | (It does have difficulty finding the smallest possible area, but
        | it's a significant advance over most existing options since, in
        | my brief test, it can usually spot the entire silhouette of
        | figures, which is where painting a boundary is most tedious.)
        
       | MrGando wrote:
       | This is great, runs very fast in Chrome for me.
        
       | benatkin wrote:
       | They seem to avoid using their own brand a lot. They have a
       | zillion domain names and they register a new one and don't use
        | the logo except in the favicon and footer. I've seen similar
        | moves, including divesting OSS projects like PyTorch and GraphQL,
        | which Google wouldn't do. To me that's a tacit admission that the
       | Facebook and Meta names are tarnished. And they are, by the
       | content they showed users in Myanmar with the algorithmic feed,
       | and by Cambridge Analytica. Maybe the whole "Meta" name is no
       | different from the rebranding of Philip Morris.
        
         | smoldesu wrote:
         | On the one hand, sure. Facebook's brand is about as hip as a
         | bag of Werther's Originals.
         | 
         | On the other hand, this is one of those things (like VR) that
         | is a distinctly non-Facebook project. It makes no sense to
          | position or market this as "Facebook" research. The HomePod
         | isn't called the iPod Home for obvious reasons, so it stands to
         | reason that Facebook execs realized selling someone a "Facebook
         | Quest" sounds like a metaphor for ayahuasca. It's not entirely
         | stupid to rebrand, especially considering how diverse (and
         | undeniably advanced) they've become in fields like AI and VR.
        
           | oefnak wrote:
           | I actively avoid everything that has anything to do with
           | Facebook, and I can't be the only one.
        
             | smoldesu wrote:
             | Yeah, me too. I also avoid everything Apple and Google
             | makes, but I'm not going to pretend like the Alphabet
             | rebranding is their attempt at hiding who they are.
        
               | xiphias2 wrote:
               | Alphabet wasn't a rebranding: the founding billionaires
                | got bored of Google, and wanted to take a few billion
               | dollars per year out of it to create new toys without
               | sharing it with Google.
        
             | throwaway290 wrote:
              | Ever used React or PyTorch? Well, this is the same. Developers
             | make good stuff regardless of where they work, and good on
             | FB for contributing
             | 
              | But yeah, if you do open source, adding an element of
              | corporate branding is a sure way to kill the project.
             | That's why it's not called "Apple Swift" or "Microsoft
             | TypeScript".
        
           | [deleted]
        
         | thanatropism wrote:
          | I was looking into GPU nearest-neighbor libraries today and
         | turned Faiss down because it said "Facebook". Completely
         | irrational, I know.
        
           | aftbit wrote:
           | You should use Faiss though, it's good.
        
         | renewiltord wrote:
         | I actually have a much more positive impression of Meta because
         | of this work. It's hard to describe, but they feel very
         | competent. My instant reaction to something being by Meta
         | Research is actually to think it's probably going to be
         | interesting and good.
        
         | eminence32 wrote:
         | The page says at the very top, in a fixed header that is always
         | visible (even as you scroll, or browse to other pages):
         | "Research by Meta AI"
         | 
         | To me, this feels like they are not avoiding the "Meta" brand
         | at all.
        
           | benatkin wrote:
           | See my other comment. Of course they needed to have it
           | somewhere to score points. These probably weren't people who
            | were about to quit it, just people with a lowered perception
            | of it compared to a company people are mostly proud to work
            | at, like Google...
           | https://news.ycombinator.com/edit?id=35458445
        
         | blululu wrote:
         | What are you talking about? There is a Meta Logo Favicon, "Meta
         | AI" appears in the header and "Meta AI" is purposefully
          | centered in the ABF text. Registering a new domain costs $10,
          | compared to the massive pain of involving legal to get
          | permission to repurpose an existing one. It's a new project, so
          | why not make a clean start and just get a new website instead
          | of going through the full FB/Meta approval process on branding?
        
           | benatkin wrote:
           | I mentioned the logo. I didn't mention the text because
           | perhaps they still want to score points for Meta, so hiding
           | it entirely wouldn't make sense. But they avoid the larger
           | immediate hangups of the big logo and the domain name.
        
           | fortissimohn wrote:
           | They likely meant that Meta was established in part due to
           | the Facebook name being tarnished in the first place.
        
         | aftbit wrote:
         | I'm out of the loop, what happened in Myanmar?
        
           | aix1 wrote:
           | From an Amnesty International report:
           | 
           | Beginning in August 2017, the Myanmar security forces
           | undertook a brutal campaign of ethnic cleansing against
           | Rohingya Muslims. This report is based on an in-depth
           | investigation into Meta (formerly Facebook)'s role in the
           | serious human rights violations perpetrated against the
           | Rohingya. Meta's algorithms proactively amplified and
           | promoted content which incited violence, hatred, and
           | discrimination against the Rohingya - pouring fuel on the
           | fire of long-standing discrimination and substantially
           | increasing the risk of an outbreak of mass violence. The
           | report concludes that Meta substantially contributed to
           | adverse human rights impacts suffered by the Rohingya and has
           | a responsibility to provide survivors with an effective
           | remedy.
           | 
           | https://www.amnesty.org/en/documents/ASA16/5933/2022/en/
           | 
           | See also
           | 
           | https://en.wikipedia.org/wiki/Rohingya_genocide
           | 
           | https://en.wikipedia.org/wiki/Rohingya_genocide#Facebook_con.
           | ..
        
         | iambateman wrote:
         | Welcome to the wild world of corporate IT. Their VP has
         | authority to make a new website if she wants, but has to go
          | through a 3-month vetting process to put it on a subdomain.
        
           | lacker wrote:
           | As someone who used to work on Facebook open source, that
           | makes sense! After all, an insecure subdomain could lead to
           | all sorts of problems on facebook.com. Phishing, stealing
           | cookies, there's a lot of ways it could go wrong.
           | 
           | Whereas, if one engineer spins up some random static open
           | source documentation website on AWS, it really can't go wrong
           | in a way that causes trouble for the rest of the company.
        
           | benatkin wrote:
           | Meta isn't a typical corporation, though. Ordinary big
           | company red tape could have stopped them from indirectly
            | displacing thousands based on their religion. (That isn't an
            | outlandish claim; it's something they actually got sued for,
            | though the suit was dismissed without absolving them.)
        
             | herval wrote:
             | It very much is a typical big corp, and OP is correct. It's
             | easier to ship something on a new domain, using AWS and a
             | bunch of contractors, than to add a subdomain to
             | facebook.com or some other top-level domain
        
               | smoldesu wrote:
               | Not to mention, the "Ordinary big company red tape"
               | didn't stop Coca Cola from hiring Colombian death squads,
                | Nestle from draining the Great Lakes and selling the water
                | back to its residents, nor Hershey's from making chocolate
               | from cacao farmed with child slave labor.
               | 
               | Relative to the rest of FAANG (or even Fortune 500),
               | Facebook might have the least blood on their hands when
               | everything is said and done.
        
               | killerdhmo wrote:
               | um... did you sleep through the last 8+ years of
               | handwringing about election interference, Russian / state
               | propaganda, live streaming massacres, addiction / mental
                | health effects of social media, particularly for kids? I
               | can't imagine the other FAANGs come close
        
               | smoldesu wrote:
               | If platforming disinformation and enabling internet
                | addiction is equivalent to criminal complicity, then
               | Microsoft, Apple, Amazon and Google all have crimes to
               | answer for. Facebook has shit the bed more times than
               | they can count on two hands, but unfortunately that's
                | kinda table stakes in big tech.
        
       | reaperman wrote:
       | Multiple block diagrams and the paper note that one of the inputs
       | is supposed to be "text", but none of the example Jupyter
       | notebooks or the live demo page show how to use those. I'm
       | assuming just run the text into CLIP, take the resulting
       | embedding, and throw it directly in as a prompt, which then gets
       | re-encoded by the SAM prompt encoder?
       | 
       | > "Prompt encoder. We consider two sets of prompts: sparse
       | (points, boxes, text) and dense (masks). We represent points and
       | boxes by positional encodings [95] summed with learned embeddings
       | for each prompt type and free-form text with an off-the-shelf
       | text encoder from CLIP [82]. Dense prompts (i.e., masks) are
       | embedded using convolutions and summed element-wise with the
       | image embedding."
       | 
       | Edit: Found the answer myself:
       | https://github.com/facebookresearch/segment-anything/issues/...
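        | 
        | For reference, the CLIP half of that guess looks something like
        | this (using OpenAI's clip package; note that per the issue above
        | the released SAM code doesn't expose a text path, so actually
        | feeding this embedding to the prompt encoder is hypothetical):
        | 
        |     import torch
        |     import clip  # pip install git+https://github.com/openai/CLIP.git
        | 
        |     model, _preprocess = clip.load("ViT-L/14")
        |     with torch.no_grad():
        |         tokens = clip.tokenize(["a photo of a cat"])
        |         text_embedding = model.encode_text(tokens)  # shape (1, 768)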
        
         | [deleted]
        
       | AdilZtn wrote:
       | That's amazing! This model is a huge opportunity to create
       | annotated data (with decent quality) for just a few dollars.
       | People will iterate more quickly with this kind of foundation
       | model.
        
       | justinator wrote:
        | The demo is running slow. Cutting out is an impressive ability;
        | am I to assume it also fills in the background? If so, that's
        | next level. Maybe that Photoshop monthly subscription will be
        | worth it (provided this sort of ability gets baked into Adobe's
        | AI version soon).
        
       | bobsmooth wrote:
       | Seeing it run on a headset is the coolest part. Lots of
       | applications for AR.
        
       | ren_engineer wrote:
        | What do you think Facebook's game plan is here? Are they trying
        | to commoditize AI by releasing this and Llama as a move against
        | OpenAI, Microsoft, and Google? They had to have known the Llama
        | weights would be leaked, and now they're releasing this.
        
         | jayd16 wrote:
         | Well there's some patent offense and defense in making and
         | releasing research papers. There's some recruiting aspects to
          | it. It's also a way to commoditize your inverse if you assume
         | this sort of stuff brings AR and the metaverse closer to reach.
        
         | high_derivative wrote:
         | n=1 (as a mid-profile AI researcher), but for me it's working
         | in terms of Meta gaining my respect by open sourcing (despite
         | the licensing disasters). They clearly seem to be more
         | committed to open source and getting things done now in
         | general.
        
         | dragonwriter wrote:
         | I think cranking out open source projects like this raises Meta
         | AI's profile and helps them attract attention and people, and I
         | don't think selling AI _qua_ AI is their business plan, selling
         | services built on top is. And commoditized AI means that the AI
         | vendors don't get to rent-seek on people doing that, whereas
          | narrowly controlled monopoly/oligopoly AI would mean that the
         | AI vendors extract the value produced by downstream
         | applications.
        
           | vagabund wrote:
           | I've always half-believed that the relatively open approach
           | to industry research in ML was a result of the inherent
           | compute-based barrier to entry for productizing a lot of the
           | insights. Collaborating on improving the architectural SoTA
           | gets the handful of well-capitalized incumbents further ahead
           | more quickly, and solidifies their ML moat before new
           | entrants can compete.
           | 
           | Probably too cynical, but you can potentially view it as a
           | weak form of collusion under the guise of open research.
        
             | dragonwriter wrote:
              | This particular model has a very low barrier; the model is
              | smaller than Stable Diffusion, which runs easily on
              | consumer hardware for inference, though
             | _training_ is more resource intensive (but not out of reach
             | of consumers, whether through high-end consumer hardware or
             | affordable cloud resources.)
             | 
             | For competitive LLMs targeting text generation, especially
             | for training, a compute-based barrier is more significant.
        
               | vagabund wrote:
               | Yeah that's fair. I intended my comment to be more of a
               | reflection on the culture in general, but the motivations
               | in this instance are probably different.
        
         | herval wrote:
         | Their main use case for these models seems to be AR. Throwing
          | it out in the open might help get external entities to build
          | for them & attract talent, etc. Not sure they're that
          | strategic, but that's my guess.
        
         | _the_inflator wrote:
          | I think Meta's game plan is complex. Inspiration as well as
          | adoption, and not stepping on the toes of regulators, is
          | probably another intention. Have a look at PyTorch, for
          | example: a massively popular ML framework with lots of
          | interesting projects running on it.
          | 
          | If Meta frequently shares their "algorithms", they take the
          | blame out of their usage. After all, who is to blame when
          | everybody does "it" and you are very open about it?
          | 
          | Use cases and talent visibility, as well as attraction, also
          | play a role. After all, Google was so admired due to its many
          | open source projects. "Show, don't tell".
        
       | geenew wrote:
       | That's it for me
        
       | crakenzak wrote:
       | This is going along with the new Segment Anything Model paper
       | Meta AI just released:
       | 
       | Paper: https://scontent-
       | sea1-1.xx.fbcdn.net/v/t39.2365-6/10000000_6...
       | 
       | Announcement: https://ai.facebook.com/blog/segment-anything-
       | foundation-mod...
       | 
       | Code & Model Weights:
       | https://github.com/facebookresearch/segment-anything
        
         | lofaszvanitt wrote:
          | Why can't they give these research papers proper filenames?
          | This drives me nuts.
        
         | ftxbro wrote:
          | If Tim Berners-Lee saw that paper link, he would never have
          | allowed the URL to be invented.
        
           | LoganDark wrote:
           | I'm so shocked by how almost every query parameter is
           | required and there's even a freaking signature for validating
           | the URL itself.
           | 
           | -Emily
        
         | jauer wrote:
         | That paper link is a CDN URL that is dynamically generated to
         | point to your closest POP when you load the abstract. It will
         | be different for many people and will break eventually.
         | 
         | Abstract:
         | https://ai.facebook.com/research/publications/segment-anythi...
        
       | code51 wrote:
       | It's interesting that (clearly visible) text parts that cannot be
       | handled properly by most OCR approaches also get left out by SAM
       | in auto-predictions.
        
       | dimatura wrote:
       | The network architecture and scale don't seem to be a big
       | departure from recent SOTA, but a pretty massive amount of
       | labeled data went into it. And it seems to work pretty well! The
       | browser demo is great. This will probably see a lot of use,
       | especially considering the liberal licensing.
        
         | bjacobt wrote:
         | I apologize if this is obvious, but are both the model and
          | checkpoint (as referenced in the getting started section of
          | the readme) Apache 2.0? Can it be used for commercial
          | applications?
        
           | dimatura wrote:
           | As far as I can tell, it can. The code itself has a `LICENSE`
           | file with the Apache license, and the readme says "The model
           | is licensed under the Apache 2.0 license.". Strangely, the
           | FAQ in the blog post doesn't address this question, which I
           | expect will be frequent.
        
             | phkahler wrote:
             | Isn't Apache 2 a free software license without some of the
             | GPLv3 things some don't like?
             | 
                | I think a more BSD-like license would be better, or LGPL.
                | Either would be more business friendly.
        
               | MacsHeadroom wrote:
               | LGPL is not business friendly at all. It's among the
                | least business-friendly licenses there are. Apache 2.0 is
               | slightly more business friendly than BSD.
               | 
               | With some caveats, software licenses from most to least
               | business friendly roughly go:
               | 
               | Apache > BSD > MIT > MPL > LGPL > GPL > AGPL
        
               | kyle-rb wrote:
               | LGPL is more business friendly than GPL; it's literally
               | "lesser" GPL.
               | 
               | You can use LGPL in commercial, closed-source projects as
               | long as you keep the LGPL code in a separate dynamically
               | linked library, e.g. a DLL, and provide a way for users
               | to swap it out for their own patched DLL if they wish.
               | (Plus some other license terms.)
               | 
               | Also, you can always use LGPL code under the terms of the
               | GPL, so there's no way LGPL is more restrictive than GPL.
        
               | MacsHeadroom wrote:
               | You're right, that was a mistake. It's been fixed. LGPL >
               | GPL
        
       | dang wrote:
       | Related:
       | 
       |  _Meta New Segmentation Model_ -
       | https://news.ycombinator.com/item?id=35453625 - April 2023 (7
       | comments)
        
       | richardw wrote:
       | Surely this changes the security camera game? No more being
       | fooled by clouds going overhead.
        
       | subarctic wrote:
       | The demo is pretty cool but it looks like you can just select
       | things and have it highlight them in blue - is there a way to
       | remove objects from the image and have the background filled in
       | behind them?
        
         | dymk wrote:
         | You could probably content-aware fill the area that SAM
         | identifies with another tool
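          | 
          | For example, with OpenCV (a rough sketch; assumes the SAM mask
          | was saved as a binary image where 255 marks the object):
          | 
          |     import cv2
          |     import numpy as np
          | 
          |     img = cv2.imread("photo.jpg")
          |     mask = cv2.imread("sam_mask.png", cv2.IMREAD_GRAYSCALE)
          | 
          |     # dilate a little so the fill also covers the object's edges
          |     mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=2)
          | 
          |     # Telea inpainting fills the hole from surrounding pixels
          |     out = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
          |     cv2.imwrite("filled.jpg", out)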
        
       | neom wrote:
        | Yikes. I went to film school in the early 2000s and spent hours
        | and hours on levels/HDR-based masking. I've used the Adobe tools
        | recently and they're good... this is... yikes... I wonder how
        | people in their mid-20s today learning Photoshop are going to
        | deal with the jobs they graduate into.
        
         | jacquesm wrote:
         | Not. This is homing in on the SF UI that Deckard used in Blade
         | Runner.
         | 
         | https://www.youtube.com/watch?v=hHwjceFcF2Q
         | 
         | All it takes is a couple of tools glued together and you're
         | getting there.
        
           | marstall wrote:
           | "gimme a hardcopy right there."
        
       | arduinomancer wrote:
       | This exists as a feature on iOS
       | 
       | You can long press on an image and it cuts out whatever thing it
       | thinks you're pressing on
       | 
       | They also use it in interesting ways, like making stuff in the
       | photo slightly overlap the clock on the lockscreen
       | 
       | Does anyone know if that works the same way as this?
        
         | hbn wrote:
         | It would still be nice if iOS had some kind of interface like
         | this where you can nudge it in the right direction if it's
         | confusing something like a jacket and the background. iOS gives
         | its best attempt which is usually pretty good, but if it didn't
         | get it right you're basically SOL.
        
         | neom wrote:
          | This is: understanding everything in the image as elements,
          | subjects, or whatever.
        
       | sashank_1509 wrote:
       | Extremely impressive system. Blows everything else (including
       | CLIP from OpenAI) out of the water. We are inching closer to
       | solving Computer Vision!
        
         | wongarsu wrote:
         | It's really impressive, and better than anything I've seen, but
         | is it really leagues better than whatever Photoshop is using?
         | 
          | Of course, being on GitHub and permissively licensed is huge.
        
           | vanjajaja1 wrote:
            | My question exactly; didn't Photoshop already solve this like
            | 5+ years ago?
        
             | dymk wrote:
             | Have you used Photoshop's magic wand tool in the last 5
             | years? No, it's nowhere close to this good.
        
       | RomanPushkin wrote:
       | Stalin's dream
       | https://en.wikipedia.org/wiki/Censorship_of_images_in_the_So...
        
       | cloudking wrote:
       | Pretty cool, Runway has a similar green screening feature that
       | can 1-click segment a subject from the background across an
       | entire video: https://runwayml.com/ai-magic-tools/
        
       | minimaxir wrote:
       | You know an AI project is serious when it has its own domain name
       | instead of a subdomain.
        
       | syrusakbary wrote:
       | This is awesome. If you try the demo they provide [0], the
        | inference is handled purely in the client using an ONNX model
        | that only weighs around 8 MB [1] [2].
       | 
       | Really impressive stuff! Congrats to the team that achieved it
       | 
       | [0] https://segment-anything.com/demo
       | 
       | [1] https://segment-
       | anything.com/model/interactive_module_quanti...
       | 
       | [2] https://segment-
       | anything.com/model/interactive_module_quanti...
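        | 
        | If you want to poke at that decoder outside the browser, a quick
        | sketch with onnxruntime (assumes you've saved the file above as
        | decoder.onnx; inspect the declared inputs rather than
        | hard-coding names):
        | 
        |     import onnxruntime as ort
        | 
        |     sess = ort.InferenceSession("decoder.onnx")
        |     for inp in sess.get_inputs():
        |         print(inp.name, inp.shape, inp.type)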
        
       | nielsbot wrote:
       | Sigh. Does not work in Safari on macOS (ARM). Works in Chrome
       | though.
        
         | subarctic wrote:
          | It seems to work in Firefox on macOS (ARM) fwiw
        
       | georgelyon wrote:
       | It seems like the output of this model is masks, but for cropping
       | you really need to be able to pull partial color out of certain
        | pixels (for example, pulling a translucent object out from a
       | colored background). I tried the demo, and it fails pretty
       | miserably on a vase. Anyone know of a model that can do this
       | well?
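        | 
        | To illustrate the difference with a toy numpy sketch (a matting
        | model would predict the fractional alpha per pixel; SAM only
        | gives the 0/1 mask):
        | 
        |     import numpy as np
        | 
        |     img = np.random.rand(64, 64, 3)    # RGB image in [0, 1]
        |     mask = np.zeros((64, 64, 1))       # SAM-style binary mask
        |     mask[16:48, 16:48] = 1.0
        | 
        |     hard_cutout = img * mask           # all-or-nothing edges
        | 
        |     alpha = mask * 0.4                 # e.g. a translucent vase
        |     new_bg = np.ones_like(img)         # composite onto white
        |     composite = img * alpha + new_bg * (1 - alpha)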
        
       ___________________________________________________________________
       (page generated 2023-04-05 23:00 UTC)