Thinking 'bout computers and data
---------------------------------

Some slightly less angst-riddled but still sort of rambling thoughts, too loose for the actual gemlog. So I've been thinking about how, well, I think most people are getting the conversation around large machine learning models (LLMs, CLIP-likes, &c.) kinda wrong. There are several pieces that I think need to get disentangled.

The first thing is that building for-profit systems by taking things that people have shared in good faith on the internet is kind of gross extractionism. I'm not talking about the question of "is this fair use". We're not talking about the law here; we're talking about what is appropriate and respectful to do. We're talking about extractive anti-social behavior vs. pro-social behavior. I think Stable Diffusion being open source is a step forward, but if Stability really wanted to do the right thing they should have worked on a machine-learning-specific equivalent of something like the GPL: a license that would poison any attempt to build for-profit tools on top of it.

The second thing is that I think there's something to all the people who look at these large models trained on mass-scraped data (in the case of the GPT-likes, practically every bit of publicly available human-written text on the internet) and say "hey, I didn't want my work used for this". I've seen a lot of people get made fun of for talking about "stolen art", like "omg look at this capitalist who believes in IP", and, man, I don't know, I think that's kind of a shitty attitude. I'm pretty sure if you went back in time and told folks "hey, so you can put your art on the internet, your thoughts and ideas on blogs, your code on GitHub, but eventually big companies are going to build tools by folding all of your work into a giant dataset", a lot of folks wouldn't have posted their work publicly at all.

I think that's where I get hung up a lot. You don't even need to appeal to things like intellectual property to understand that consent and context matter. There are different gradations of privacy, and different expectations around privacy. A formally published book is a different thing than a private journal, which is a different thing than a blog that's technically publicly accessible but that you only gave a few people the link to.

Basically what I'm saying is that people have been informally managing around the fact that the modern internet, by design, really only allows two options when it comes to privacy: completely wide open, or locked down to logged-in users. Our ability to express our intentions about how we're trying to share information has been deliberately limited by the platforms we've been pushed into. Turning around and saying "well, you put it on the internet, so tough luck" is, again, just a really shitty response to folks who are frustrated that the social rules they'd been operating by suddenly changed, because companies with big tech-sector funding decided to change them.

It shouldn't be hard for us to figure out ways of building our tools that still respect that all art exists in a context, with an intention about who was meant to see and experience it, even apart from any questions of money and capitalism. Anyway, I want to formalize this a bit better and less polemically, but I at least wanted to get the in-progress thoughts down first.