Thinking 'bout computers and data
---------------------------------

Some slightly less angst-riddled but still sort of rambling thoughts, too loose for the actual gemlog. So I've been thinking about how, well, I think most people are getting the conversation around large machine learning models (LLMs, CLIP-likes, &c.) kinda wrong. There are several pieces that I think need to get disentangled.

The first thing is that building for-profit systems by taking things that people have shared in good faith on the internet is kind of gross extractionism. I'm not talking about the question of "is this fair use". We're not talking about the law here; we're talking about what is appropriate and respectful to do. We're talking about extractive anti-social behavior vs. pro-social behavior. I think Stable Diffusion being open source is a step forward, but if Stability really wanted to do the right thing they should have worked on a machine-learning-specific equivalent of something like the GPL: a license that would poison any attempt to build for-profit tools on top of it.

The second thing is that I think there's something to all the people who look at these large models trained on mass-scraped data (in the case of the GPT-likes, practically every bit of publicly available human-written text on the internet) and say "hey, I didn't want my work used for this". I've seen a lot of people get made fun of for talking about "stolen art", like "omg look at this capitalist who believes in IP", and, man, I don't know, I think that's kind of a shitty attitude. I'm pretty sure if you went back in time and told folks "hey, so you can put your art on the internet, your thoughts and ideas on blogs, your code on GitHub, but eventually big companies are going to build tools by folding all of your work into a giant dataset", a lot of folks wouldn't have posted their work publicly at all.

I think that's where I get hung up a lot. You don't even need to appeal to things like intellectual property to understand that consent and context matter. There are different gradations of privacy, and different expectations around privacy. A formally published book is a different thing than a private journal, which is a different thing than a blog that's technically publicly accessible but that you only gave a few people the link to.

Basically what I'm saying is that people have been informally managing around the fact that the modern internet, by design, really only allows two options when it comes to privacy: completely wide open, or locked down to logged-in users. Our ability to express our intentions about how we're trying to share information has been deliberately limited by the platforms we've been pushed into. Turning around and saying "well, you put it on the internet, so tough luck" is, again, just a really shitty response to folks who are frustrated that the social rules they'd been operating by suddenly changed, because companies with big tech-sector funding decided to change them.

It shouldn't be hard for us to figure out ways of building our tools that still respect that all art exists in a context, with an intention about who was meant to see and experience it, even apart from any questions of money and capitalism. Anyway, I want to formalize this a bit better and less polemically, but I at least wanted to get the in-progress thoughts down first.