= Block ChatGPT plugins from using your website

To block ChatGPT plugins from using your website, add this to your `robots.txt` file:

[source,text]
----
User-agent: ChatGPT-User
Disallow: /
----

Automated web interaction is common, and there are many legitimate uses for it. A weather app on your phone, for instance, probably makes an API call to some meteorological website for updated information about your region. Legitimately useful services respect the contents of a special file named `robots.txt`, and with that file you can allow or disallow a specific user agent access to certain directories on your website.

When you disallow a user agent from a forward slash (`/`), you disallow it from _everything_ on your site, because the forward slash represents the "root" or "web root" of your website's file system.

The ChatGPT-User user agent is how ChatGPT plugins identify themselves when they visit a website, and it appears in your web server log. Should you see this bot in your logs, it means someone's written a plugin for ChatGPT that's accessing your site for some reason.
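If you're curious whether that's already happening, you can search your server's access log for the user agent string. This is just a sketch; it assumes an Apache-style server logging to `/var/log/apache2/access.log`, so substitute whatever log path your host actually uses:

[source,bash]
----
# Count requests identifying themselves as ChatGPT-User
# (adjust the log path to match your web server's configuration)
$ grep -c 'ChatGPT-User' /var/log/apache2/access.log
----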
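A `Disallow: /` rule keeps ChatGPT plugins away from everything, but because `robots.txt` rules are path-based, you can scope them to specific directories instead. For example, assuming hypothetical `reviews` and `essays` directories (substitute the paths you actually want to protect):

[source,text]
----
# Hypothetical paths: keep ChatGPT plugins out of these
# directories while leaving the rest of the site open
User-agent: ChatGPT-User
Disallow: /reviews/
Disallow: /essays/
----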
According to OpenAI LLC, GPT-3 was "pre-trained on a vast amount of text from the open internet", so ChatGPT apparently isn't actively crawling the web to be trained. That's already been done, and it's why ChatGPT seems to be able to generate useful content.

== Why block ChatGPT?

ChatGPT can be "fine-tuned", meaning it can be trained on specific data sets. Someone could, for instance, write a ChatGPT plugin to generate "opinions" about movies. If you run a movie review website, then your site could be a useful target for training ChatGPT. If you're not interested in having your content used by generative AI, then you might choose to disallow the ChatGPT-User user agent from your website.

Disallowing ChatGPT-User isn't by any means a guarantee that your content won't be used to train generative AI. It only disallows ChatGPT plugins. Anyone can scrape a website, using any user agent string, and then use the result as a data set for fine-tuning. User agents are voluntary and mostly unverifiable, so disallowing ChatGPT-User only prevents a direct call from a ChatGPT plugin to your website. It doesn't remove your website entirely from ChatGPT's access, nor from generative AI training. For that, you'd have to take your website offline, or you could try to minimize the likelihood of generative AI interaction by migrating to https://opensource.com/article/20/10/gemini-internet-protocol[Gemini or Gopher].

== Sharing information

In open source software and free culture alike, there's a concept of "share-alike" and "permissive" or "public domain" licensing. A share-alike license, like the GPL or CC BY-SA, ensures that someone using your work must also share whatever your work has been incorporated into. It protects the spirit of "open". Permissive licenses can be convenient for low-level libraries, because they can be put into nearly any project regardless of that project's legal structure. However, a permissive license has a significant side effect: It allows anybody to take from a community without ever giving back, obliterating social responsibility. Unfortunately, with the rise of "generative AI" like ChatGPT, permissive licensing is having an unintended consequence.

A machine learning model can use content that doesn't require sharing or even attribution to produce text and images that appear to have been pulled from the non-existent "imagination" of artificial intelligence, with no credit to the real human being who thought up and worked on the content that the AI is using as its foundation. As with problematic companies that take code from open source projects without giving back, generative AI licensing is likely to preclude the community from using its content with equal freedom. This is still a developing topic, and I suspect we have yet to see the beginning of licensing battles around generative AI.

In the meantime, you may want to keep legitimate bots designed for generative AI (ChatGPT plugins, in this context) away from content you intend to be open for everyone. Identify the bots, like ChatGPT-User, and add a disallow rule for each one in your website's `robots.txt` file.

== How to add a robots.txt file to your website

The `robots.txt` file is a plain text file placed at the root of your website. Most human visitors to your site won't ever see it, and don't need to. It's intended exclusively for, as its name implies, automated "bots". A robot in this context isn't a synthetic humanoid typing away at a keyboard; it's just some simple code designed to go to every website on the Internet and take note of everything on the site. There are three ways to add a `robots.txt` file to your website:

1. Upload it using an online file manager
2. Upload it using the `scp` terminal command
3. Talk to the person administering your website

Here are some tips on each method.

=== File manager

Many web hosting accounts provide you with a file manager to help you upload files to your site. Every web hosting company is unique, so there's no way for me to know exactly what your file manager looks like (or whether your host provides you with one), but here's what the co-op webhost http://webhosting.coop[webhosting.coop] uses:

image:file-manager.webp[A file manager on the web]

You can create a `robots.txt` file on your computer, and then use the **Upload** button in your website's file manager to copy the file into the **public_html** folder of your website. This makes `robots.txt` available to the world, bots included.

=== OpenSSH

If you're comfortable with a terminal and you've set up your website account for OpenSSH access, you can use `scp` to copy `robots.txt` to the public folder of your website (usually called `public_html`). For example:

[source,bash]
----
$ scp robots.txt seth@example.com:/home/seth/public_html/
----

=== Talk to your admin

If you don't manage your own website, or you're still learning how to manage it, then share this article with the person who manages your site. That person probably has the ability to add a `robots.txt` file to your site.

== Open communities

The spirit of an open community is sharing, but its foundation is respect. Generative AI can't be respectful or disrespectful, because machine learning is just code, a bot. It's up to humans to use bots responsibly and respectfully, so if you're not producing content to interact with generative AI, then make it known in your `robots.txt` file.
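Whichever upload method you use, you can verify that the file is publicly visible by fetching it the way a bot would. This sketch assumes `curl` is installed and uses `example.com` as a stand-in for your own domain:

[source,bash]
----
# Fetch the live robots.txt, just as a well-behaved bot would
# (example.com is a placeholder for your own domain)
$ curl https://example.com/robots.txt
----

If the rules you added come back in the response, then polite bots, ChatGPT plugins included, have been told where they're not welcome.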