[HN Gopher] Meta's new AI image generator was trained on 1.1B In... ___________________________________________________________________ Meta's new AI image generator was trained on 1.1B Instagram and FB photos Author : my12parsecs Score : 201 points Date : 2023-12-07 14:57 UTC (8 hours ago) (HTM) web link (arstechnica.com) (TXT) w3m dump (arstechnica.com) | TheCoreh wrote: | "Not available in your location | | Imagine with Meta AI isn't available in your location yet. You | can learn more about AI at Meta in the meantime and try again | soon." | | I wonder why it's region-locked? | philipov wrote: | Which region is locked? That might give a clue. | RowanH wrote: | New Zealand is locked out. (Normally we get first dibs on | things, being a small test market.) | avallach wrote: | I got the same from the Netherlands. | K5EiS wrote: | Norway is blocked, so probably some GDPR issues. | fallensatan wrote: | Canada seems to be locked out as well. | philipov wrote: | Is there anyone outside the US that _isn't_ locked out, or | was this a US-only release? Could this possibly have to do | with the sanctions on China? | TheCoreh wrote: | Brazil. So it's unlikely to be GDPR-related, unless they're | also treating our LGPD as a special case. | lxgr wrote: | Meta's AI stickers also only seem to be available in the US for | now (or at least not in WhatsApp in the EU). | mvdtnz wrote: | AI stickers are in my region (not USA) but Imagine is not. | xnx wrote: | Link to tool: https://imagine.meta.com/ | | One of many AI updates from Meta yesterday: | https://about.fb.com/news/2023/12/meta-ai-updates/#:~:text=E... | floathub wrote: | Note that you need to "Log On" to | Facebook/Meta/WhateverTheyCallThemselvesNow to try it. Kind of | curious, but not curious enough to create yet another burner | Facebook account. | | [edit: still learning to spell] | theonlybutlet wrote: | Thanks, I should've read your post before opening the link and | promptly having to close it. 
| misja111 wrote: | "Not available in your location yet" (Switzerland) | JumpCrisscross wrote: | > _"Not available in your location yet" (Switzerland)_ | | Have the GDPR questions around data provenance been resolved? | I thought EU/EEA is currently off limits for publicly- or | user-data-trained AI. | lxgr wrote: | ChatGPT (free and paid) are available in the EU, so I don't | think there is a blanket ban. | | Different companies might have very different | interpretations of the legality of what they're doing, of | course. I don't think there's any precedent, and no | explicit regulations - there's an "AI act" currently being | discussed in parliament, though. | mvdtnz wrote: | Not available in my region (New Zealand), darn. | tikkun wrote: | I tried it now. | | My experience: | | Took 4 minutes to log in and do one generation. (Log in to FB, | then it took me through a process to merge accounts with Meta, | which didn't sound good, so I restarted with 'sign in via | email', which ended up doing the same thing anyway, I think. | Then I was logged in and did the generation.) | | My at-a-glance ranking: | | For image quality | | 1. Midjourney | | 2. DALL-E 3 | | 3. SDXL and this | | For overall ease of use and convenience | | 1. DALL-E 3 | | 2. Midjourney | | Of course, this is all biased personal opinion, and YMMV. | whywhywhywhy wrote: | Depends what you want, really. Midjourney and DALL-E 3 have | specific looks to them which kind of look cheap/tacky now that | it's everywhere. | | SDXL is reconfigurable and completely flexible, so really it's | the only tool in the game for pure creativity. | brcmthrowaway wrote: | What is the best tool wrapping SDXL? | danielbln wrote: | There is no best, it depends on your use case. Auto1111 is | popular, ComfyUI extremely flexible but complex, and | there is a myriad of other wrappers, some with a focus on | simplicity, some not so much. 
| loudmax wrote: | Depends what you mean by "best", but Fooocus is very | accessible for getting started with Stable Diffusion. | Ologn wrote: | I find Automatic1111 better for point-and-click | simplicity. ComfyUI has been good for custom flows. | | Also, Automatic1111 is more centralized, so you have to | wait for something to make its way in (or a pull request | for it, anyhow), whereas people put up their ComfyUI | custom JSON workflows. So I am doing Stable Diffusion | video via ComfyUI right now, whereas it has not made its | way into Automatic1111. | cowboyscott wrote: | Is training with user-generated content a way to launder | copyrighted images? That is, if I upload an image of Iron Man or | whatever to my Facebook or Instagram page as a public post and | Meta trains their model on that data, is there wording in my user | agreement that says that I declare that I own the content, which | then gives Meta plausible deniability when it comes to training | with copyrighted material? | | (apologies for the run-on sentence - it is early still) | onlyrealcuzzo wrote: | > Is training with user-generated content a way to launder | copyrighted images? | | Doubt it. If you upload child porn to Instagram and they | distribute it - it's still an Instagram problem, AFAIK. | dragonwriter wrote: | Child porn is not a copyright issue, so the DMCA safe harbor | for UGC doesn't apply, and it's criminal, so the Section 230 | safe harbor doesn't apply, so it's very much not an applicable | example as to whether use of UGC in other contexts is a way | of leveraging safe harbor protections for content, whether | for copyright or more generally. | onlyrealcuzzo wrote: | It's still an Instagram problem if someone uploads | copyrighted info and Instagram distributes it... | fragmede wrote: | As long as Instagram follows the DMCA and takes it down, | they're covered by Section 230, so I don't know if it's a | problem per se. 
| whywhywhywhy wrote: | It literally/legally isn't and is one of the reasons US | is king for hosting services like IG. Read Section 230. | glimshe wrote: | I think Meta is already assuming that there will be no | liability for training with copyrighted material. I find it | very unlikely that image owners will win the AI training | battle. | lxgr wrote: | I'd be extremely surprised if the "Mickey Mouse standing on | the moon" example image was a legitimate way to "launder | copyright". | | The interesting question is just who will be liable for the | copyright violation: The party that hosts the AI service? The | party that trained it on copyrighted images? The user | entering a prompt? The (possibly different) user publishing | the resulting image? | darkwraithcov wrote: | MM will be public domain in Jan. | liotier wrote: | Some early versions will. | JohnFen wrote: | Unless Disney can engineer _yet another_ oppressive | extension to copyright durations. | RcouF1uZ4gsC wrote: | Not going to happen. | | When Disney did their copyright extension last time, they | had bipartisan influence. | | Now Disney is in the middle of the culture war, and there | is no Republican that will risk being primaried to | support Disney. | | Given that you de facto need 60 votes in the Senate, it | is not happening. | JohnFen wrote: | I guess that's some sort of silver lining to the state of | things today! | alphabettsy wrote: | Still protected by trademark depending on how it's used. | andreasmetsala wrote: | Only the first movie, the trademark is not expiring. | ryoshu wrote: | I can draw as many Disney characters as I want to and | Disney has no recourse as long as I'm not publishing them | somewhere. | JohnFen wrote: | Posting them on IG, Facebook, etc. is publishing them. | airstrike wrote: | Yes, but importantly, generating them with the AI trained | on Mickey is not. | JohnFen wrote: | True. 
This is why I think it's pointless to try to use | copyright law to defend yourself against AI companies. | Right now, anyway, I don't see any law (or any other | mechanism) that provides any protection. If I did, I | wouldn't have had to remove all of my websites from the | public web. | slaymaker1907 wrote: | But is publishing a model which can generate images of | Mickey a copyright violation? It's definitely a violation | if the model is overfitted to the extent that you can, | perhaps lossily, extract the original images. | airstrike wrote: | > But is publishing a model which can generate images of | Mickey a copyright violation? | | Is selling colored pencils that can draw images of Mickey | a copyright violation? | | The way I see it, the tool can't ever be at fault for its | use, unless its sole use (or something close enough to | its sole use) is to infringe on copyright. | | Besides, the safeguarding of copyright isn't the single | variable we as a society should be solving for. General | global productivity is way more valuable than | guaranteeing Disney's bottom line. | ska wrote: | > It's definitely a violation if | | That is certainly not clear, unless its only purpose was | to do that. | JohnFen wrote: | > But is publishing a model which can generate images of | Mickey a copyright violation? | | I don't think that courts have ruled on that specifically | (yet), but I seriously doubt that it would be. Taking the | image of Mickey and distributing it would certainly be, | though. | CharlesW wrote: | Clearly you're still living in a pre-Neuralink(tm) world. | beAbU wrote: | You can't make revenue off those drawings. An AI | generator will presumably make money off generating | content that violates copyright. | glimshe wrote: | Here the problem isn't that the AI was trained on Mickey, | but that it generated Mickey. The _generated images_ can | still violate copyright if too similar to copyrighted | artwork - if published. 
| | I think AI companies are working hard on preventing | generated images from being similar to training images | unless the user very explicitly asks the result to look | like some well known image/character. | alphabettsy wrote: | It can violate copyright, but as or equally important, | companies have trademark protection on their characters | and symbols. | JAlexoid wrote: | You can violate copyright by intentionally drawing Mickey | Mouse; the medium of drawing is not relevant (AI can be | considered a medium, as much as a digital camera is a | medium) | ska wrote: | > The interesting question is just who will be liable for | the copyright violation | | I don't think this is going to be hard for courts. If you | borrow your friend's copy of a copyrighted text, go to Kinko's | and duplicate it, then distribute the results - you are the | one violating copyright, not your friend or Kinko's. | | The same will hold here I think, mutatis mutandis. This is | all completely separable from the training issue. | __loam wrote: | The person getting sued there would be the user of the | model, not Meta, as much as I wish that wasn't how it is. | If you use Photoshop to infringe on copyright, you're at | fault, not Adobe. | codingdave wrote: | It is in big bold letters right in Instagram's terms of | service: "We do not claim ownership of your content, but you | grant us a license to use it." | | This isn't about copyright, it is about the fact that most | people don't realize that by posting photos, they are | licensing those photos. | glimshe wrote: | A lot of the content posted there isn't owned by the people | who post it, that's a big part of the problem. | __loam wrote: | Ultra shitty corporate interests win again... | gumby wrote: | I don't agree in this case. Well, maybe I agree on the | ultra shitty corporate part. But these are public photos, | and if I'd looked at one it could have some influence, | probably tiny, on my own drawings. 
Seems reasonable that | the same would be true of my tools. | | If they were scanning my private messages, things would be | different. | thfuran wrote: | So you think a model trained on only a single copyrighted | image would be a violation but one trained on many | copyrighted images isn't? | sp332 wrote: | They don't _own_ the copyright, but they do have a "non- | exclusive, royalty-free, transferable, sub-licensable, | worldwide license to host, use, distribute, modify, run, copy, | publicly perform or display, translate, and create derivative | works". https://www.facebook.com/help/instagram/478745558852511 | bee_rider wrote: | The user might upload something that they don't have rights | to. | | Technically the user is the one misbehaving, but we, | Facebook, and any reasonable court know that users are doing | that. | JAlexoid wrote: | That's why there is a safe harbor provision in DMCA. | grogenaut wrote: | Does that provision allow them to build derivative works? | When they get a DMCA request, do they retrain the AI after | removing the copyrighted work? | luma wrote: | Copyright law as it exists today allows one to create | transformative works. There is little to suggest that an | AI trained on copyrighted works is in any way violating | that copyright when inference is run. | Bjartr wrote: | If they didn't have that (or something similar) they couldn't | serve the image to other users. Well, they could, but without | something like that someone will sue them for showing a | picture they uploaded to someone they didn't want to see it | (or any number of other gotchas). 
| | They store the image or video (host/copy), distribute it over | their network and to users (use/run), they resize it and | change the image format (modify/translate), their site then | shows it to the user (display/derivative work), and they | can't control the setting in which a user might choose to | pull up an image they have access to (the "publicly" | caveat). | | It sounds like a lot, but AFAIK that's what that clause | covers and why it's necessary for any site like them. | thfuran wrote: | It certainly does cover the needs of hosting and display to | other users, but it doesn't permit just that. It's | expansive enough to let them do just about anything they | could imagine with the pictures. | sosodev wrote: | It seems like this is still very much a legal gray area. If | it's concretely decided in court that generative AI cannot | produce copyrighted work then I assume it makes no difference | what the source of the copyrighted training material was. | KaiserPro wrote: | When an image is uploaded, it is re-licensed: > | When you share, post, or upload content that is covered by | intellectual property rights (like photos or videos) on or in | connection with our Service, you hereby grant to us a non- | exclusive, royalty-free, transferable, sub-licensable, | worldwide license to host, use, distribute, modify, run, copy, | publicly perform or display, translate, and create derivative | works of your content (consistent with your privacy and | application settings). This license will end when your content | is deleted from our systems. You can delete content | individually or all at once by deleting your account. | ROFISH wrote: | So if you delete your image the entire trained data set is | invalid because they no longer have license to the copyright? | KaiserPro wrote: | Now that is a multi-million dollar question. | | How derived data is handled after copyright is revoked is a | question that's hard to answer. 
| | I suspect that the data will be deleted from the dataset, | and any new models will not contain derivatives from that | image. | | How legal that is, is expensive to find out. I suspect | you'd need to prove that your image had been used, and that | its use contradicts the license that was granted. It would | take a lot of lawyer and court time to find out. (I'm not a | lawyer, so there might already be case history here. I'm | just a sysadmin who's looking after datasets.) | | postscript: something something GDPR. There are rules about | processed data, but I can't remember the specifics. There | are caveats about "reasonable" | grogenaut wrote: | s/m/tr/ | notatallshaw wrote: | If having copyright were a prerequisite of training data | this would be true. | | But in the US this hasn't been tested in the courts yet, | and there's reason to think from precedent this legal | argument might not hold | (https://www.youtube.com/watch?v=G08hY8dSrUY - sorry don't | have a written version of this). | | And the lawsuits so far aren't faring well for those who | think training should require having copyright | (https://www.hollywoodreporter.com/business/business- | news/sar...) | JAlexoid wrote: | I would imagine if we use a very strict interpretation of | copyright, then things like satire or fan-fiction and | fan-art would be in jeopardy. | | As well as learning, as a whole. | | Unless there is literally a substantial copy of some | particular piece of copyrighted material, it seems to be | a massive hurdle to prove that analyzing something is | copyright infringement. | kjkjadksj wrote: | The difference is that when writing satire it's not strictly | necessary to possess the work to do so. You can merely | hear of something and make a joke or a fake story. | Training data on the other hand uses the actual material, | not some derivative you gleaned from a thousand overheard | conversations. 
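KaiserPro's guess above (the image gets dropped from the dataset, and new models are trained without it) amounts to filtering the training manifest against a deletion list before each run. A toy sketch in Python; the record IDs and manifest shape are made up for illustration, and this says nothing about models that were already trained, which is the open question:

```python
# Hypothetical sketch: a "delete my image" request propagates to training by
# filtering the manifest against a deletion list before the next run.
# All IDs and paths below are invented for illustration.
training_manifest = [
    {"id": "img_001", "path": "shard0/img_001.jpg"},
    {"id": "img_002", "path": "shard0/img_002.jpg"},
    {"id": "img_003", "path": "shard1/img_003.jpg"},
]
deleted_ids = {"img_002"}  # images whose license ended on deletion

# Next training run only sees records not on the deletion list.
next_run = [rec for rec in training_manifest if rec["id"] not in deleted_ids]
print([rec["id"] for rec in next_run])
```

Whether the already-trained weights count as a derivative that must also be retired is exactly the multi-million dollar question raised in the thread.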
| slaymaker1907 wrote: | Most people in the fanfiction community recognize that | it's probably not strictly allowed under copyright. | However, the community response has generally been to do | it anyway and try to respect the wishes of the author. | Hence why you won't find Interview with a Vampire | fanfiction on the major sites. | | If anything, I think that severely hinders the pro-AI | argument if fanfiction made by human authors is also | bound by copyright. | | ETA: I just tested it out and you can totally create | Interview with a Vampire fanfiction with Bing Compose. | That presumably is subject to at least as strong | copyright as human-authored work and is thus a copyright | violation. | dragonwriter wrote: | > So if you delete your image the entire trained data set | is invalid because they no longer have license to the | copyright? | | The portion of the _training set_ might. The actual | _trained result_ -- the outcome of a use under the license | -- would, at least arguably, not. | | Of course, that's also before the whole "training is fair | use and doesn't require a license" issue is considered, | which if it is correct renders the entire issue moot -- in | that case, using anything you have access to for training, | irrespective of license, is fine. | panarky wrote: | Let's say you post an image, and I learn something by | viewing it, then you delete the image. Is my memory of your | now deleted image wiped along with everything I learned | from viewing it? | carstenhag wrote: | Yeah, derivative works in this case AFAIK were always meant | as "we can generate thumbnails etc." and not "we will train | our AI with it". I am pretty sure this is illegal in many | countries... | raincole wrote: | At this point all big players assume it's okay to train on | copyrighted materials. | | If you can[0] crawl materials from other sites, why can't you | crawl from your own site? 
| | [0]: "can" in quotes | carstenhag wrote: | Because your users have agreed to terms of service that don't | mention analyzing the images to train an AI model. | PeterisP wrote: | If their legal assumption is it's not a copyright violation | to train a model on some image, then it's logical that | their ToS doesn't mention it, as they need the user's | permission only for the scenarios where the law says that | they do. | PeterisP wrote: | It's not a legal way to "launder" copyrighted images, because | for things where copyright law grants exclusive rights to the | authors, they need the author's permission, and having | permission from someone and plausible deniability is _not_ a | defense against copyright violation - the only thing that it | can change is when damages are assessed: successfully | arguing that it's not intentional can ensure that they have to | pay ordinary damages, not a punitive triple amount. | | However, as others note, all the actions of the major IT | companies indicate that their legal departments feel safe in | assuming that training a ML model is not a derivative work of | the training data, they are willing to defend that stance in | court, and expect to win. | | Like, if their lawyers weren't sure, they'd definitely | advise the management not to do it (explicitly, in writing, to | cover their arses), and if executives want to take on large | risks despite such legal warning, they'd do that only after | getting confirmation from board and shareholders (explicitly, | in writing, to avoid major personal liability), and for | publicly traded companies the shareholders are the public, | so they'd all be writing about these legal risks in all caps in | every public company report to shareholders. | lumost wrote: | This is almost certainly going to be used to generate actual | pictures of real people in the nude etc. | 0cf8612b2e1e wrote: | Fake celebrity nudes pre-date the internet. 
| rchaud wrote: | Barriers to entry were a lot higher, and distribution | capacity was a lot lower. Surely you can see how the change | in that combination could make for a significantly different | reality now. | rightbyte wrote: | I honestly don't see the problem. Especially since any | solution to the non-problem is censorship and big-tech | monopoly, since a FOSS model can't be censored. | | An LLM won't be able to estimate the size of my wiener. I can | always claim it's the wrong size in the picture. | btbuildem wrote: | I really really doubt that. If anything, it'll be nerfed into | complete uselessness. | delecti wrote: | Doesn't seem to be possible. I tried a variety of real people | (Tom Hanks, George Bush, George Washington) and each time got | the error "This image can't be generated. Please try something | else." It did work with some fictional characters though, | namely Santa and Mickey Mouse. I'd rather not try asking for | nudes while at work, so I can't attest to that part either way. | Though "Sherlock Holmes dancing" looked pretty clearly like | Benedict Cumberbatch (though the face was pretty mangled | looking). | wongarsu wrote: | That has been a thing since 2019's DeepNude, and the world | hasn't ended. If anything it has been relegated to obscurity. | KaiserPro wrote: | It's not obscure. There are a bunch of paid apps that allow | you to "virtually undress" any image you upload. | | Which is already causing pain for a bunch of people. | wongarsu wrote: | There are paid apps or websites for lots of obscure things, | that's not really a high threshold to clear in today's | world. | broscillator wrote: | Yeah, the key takeaway from that sentence was the harm | caused, not the obscurity. | acdha wrote: | Is it obscure or just not in the news you follow? There have | been many reports about significant impacts on school | students: | | https://www.technologyreview.com/2023/12/01/1084164/deepfake. | .. 
| | https://www.cbsnews.com/news/deepfake-nude-images-teen- | girls... | | https://www.theverge.com/2023/6/8/23753605/ai-deepfake- | sexto... | | It's also showing up in elections: | | https://www.telegraph.co.uk/news/2023/05/14/turkey- | deepfake-... | | https://www.wired.com/story/deepfakes-cheapfakes-and- | twitter... | Manuel_D wrote: | I encountered the same stories of people's faces being | photoshopped onto nude models when I was a kid back in the | 2000s. Deepfakes are nothing new. | GeoAtreides wrote: | > If anything it has been relegated to obscurity. | | oh man, if /b/ could read this they would be very upset right | now | KaiserPro wrote: | For that to work, you need to have a dataset of nudes to start | with. | | Given that Instagram is pretty anti-nudity (well, women's | nipples at least) I'd be surprised if there is enough data to | work properly. | | It's not impossible, but I'd be surprised. | soultrees wrote: | At this point, who cares, honestly? The more 'fake' generated | nudes out there, the less of a novelty they are. And | if everyone has the ability to generate an image of everyone | naked, the value of 'real' nudes will go up, but it will also | be good cover for people who get their nudes leaked. | dopa42365 wrote: | How's that any different from the gazillions of more or less | good "how would you look like older/younger", "how would your | kids look like", "how would you look like as barbie" and what | not tools? One click to generate a thousand waifus. It's not | real, who cares. | nextworddev wrote: | I tried this and was floored by how good it was | nothrowaways wrote: | The title is misleading. It uses publicly available photos, which | means it uses the same images as other AI models like GPT, | Midjourney ... | miguelazo wrote: | Wow, another reason to delete my accounts. | leptons wrote: | If nothing else they've done so far has convinced you to | delete your accounts, then why would this? They've done worse | before. 
| WendyTheWillow wrote: | Because it's trained on "real" people, will it be easier to | generate ugly people? I have a hard time convincing DALL-E to | give me ugly DnD character portraits. | wobbly_bush wrote: | Aren't Insta images heavily edited? | doctorpangloss wrote: | > Because it's trained on "real" people, will it be easier to | generate ugly people? | | In the literature, testing concepts in image generation means | asking human graders "which image do you prefer for this | caption?", so the answer is probably no. You could speculate on | all the approaches that would help this system learn the | concept "ugly", and they would probably work, but it would be | hard to measure. | PUSH_AX wrote: | In order for a model to understand what ugly is, someone or | something has to tag training data as "ugly". I find this to be | a complete can of worms. | Jerrrry wrote: | >In order for a model to understand what ugly is, someone or | something has to tag training data as "ugly", | | that is a very dated (2008) concept. | | the model "understands" that 50% of people are below/above | median. | | consequently, those that are not "OMG girl ur | BEAUTIFUL"-tagged are horse-faced. | | It understands that the girl with the profile picture with | 200 likes and 2k friends is better looking than the girl with | 4 likes and 500 friends. | PUSH_AX wrote: | I fine-tuned some checkpoints this year (2023), and that's | exactly how it worked. | | Unless your model is single-focus for humans and faces, I | find it hard to believe there is specific business logic in | the training process around inferring beauty from social | engagement. Meta's model is general-purpose. | Guillaume86 wrote: | Put beautiful/pretty in the negative prompt; you should get a | similar result without the need for tagging ugly in the | training set. | hbossy wrote: | Try asking for asymmetry. The more images of faces you average, | the better they look. 
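Guillaume86's negative-prompt tip works via classifier-free guidance: the sampler runs the denoiser once conditioned on the positive prompt and once on the negative prompt, then extrapolates away from the negative prediction, so concepts like "beautiful/pretty" are actively steered away from without any "ugly" tag in the training data. A toy numpy sketch; the 3-component vectors are made-up stand-ins for real latent-space noise predictions:

```python
import numpy as np

# Classifier-free guidance: the final noise prediction extrapolates from the
# negative-prompt prediction toward the positive-prompt prediction. Whatever
# the negative prompt describes is pushed *away* from at every step.
positive_pred = np.array([1.0, 0.5, 0.0])  # conditioned on the user's prompt
negative_pred = np.array([0.2, 0.4, 0.6])  # conditioned on "beautiful, pretty"
guidance_scale = 7.5                       # a typical default strength

guided = negative_pred + guidance_scale * (positive_pred - negative_pred)
print(guided)
```

This is, roughly, what the negative-prompt box in tools like Automatic1111 or ComfyUI feeds into: the negative embedding takes the place of the unconditional (empty-prompt) embedding in the guidance formula.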
| brucethemoose2 wrote: | > It can handle complex prompts better than Stable Diffusion XL, | but perhaps not as well as DALL-E 3 | | This is an interesting statement, as Stable Diffusion XL | implementations vary from "worse than SD 1.5" to "competitive | with DALL-E 3." | sjfjsjdjwvwvc wrote: | It depends what you want to gen and what prompting style you | prefer. I have found SD 1.5/6 to be far more flexible than | SDXL. SDXL seems more "neutered" and biased towards a specific | style (like dalle/midj); but this may change as people train | more diverse checkpoints and LoRAs for SDXL. | brucethemoose2 wrote: | See, this is totally the opposite of my experience. SDXL handles | styles incredibly well... with style prompting. | | Hence my point. SDXL implementations vary _wildly_. For | reference, I am using Fooocus. | al_be_back wrote: | to me these innovations seem akin to concept cars in the motor | industry; there's some utility, until some executive takes it | center-stage and pisses off most of the core users. | | the biggest value in these networks is real user-generated | content, you can't beat billions of real users capturing real | content and sharing habitually. | | even if wording in the Terms permits certain research/usage, | you've got market and political climates to consider. | jafitc wrote: | All I can say is it's really fast | junto wrote: | Before anyone tries it out from the EU, be warned that it will | push you to make a Meta account and merge any Facebook/Instagram | profiles together, and once you've finally bitten that bullet, it | will tell you that it isn't available in your region. | kevincox wrote: | Same in Canada | seydor wrote: | So it's just faces? | dmazzoni wrote: | If you ask it to generate an image of Taylor Swift, it refuses. | But if you ask it to generate an image of a popular celebrity | singer performing the song "Blank Space", it generates an image | that looks exactly like Taylor Swift some fraction of the time. 
| RegW wrote: | I wonder what other purposes FB has used those 1.1B+ publicly | visible photos to train models for? | Havoc wrote: | Meta is asking me to log in with my Facebook account. Then after | authenticating with my FB account, Meta says I don't have a Meta | account. | | Is this all some sort of scam to get me to click accept on | whatever godforsaken ToS comes with a Meta account? If the FB | account is good enough to freakin AUTHENTICATE me then just use | that, ffs. | __loam wrote: | This is extremely shitty to a lot of users. ___________________________________________________________________ (page generated 2023-12-07 23:00 UTC)