https://dansdatathoughts.substack.com/p/from-s3-to-r2-an-economic-opportunity

From S3 to R2: An economic opportunity

Cloudflare's R2 is an undiscovered gem and is redefining the
economics of data storage

Dan Goldin
Nov 2, 2023

S3 is an amazing service. One can argue that the launch of S3 was the
origin of the modern data stack. The most common initial use case was
to store static assets such as images, stylesheets, and JavaScript
code but we quickly discovered that we can dump whatever we want to it
at a low cost. The approach shifted from being careful about what
was being stored to just storing everything since it may end up being
useful. Before S3, big data only existed in the megacorps but the
ability to store nearly infinite amounts of data cheaply launched a
whole new ecosystem.

One of the most common data specific uses of S3 is to stage data. If
you built your own data collection stack you typically have Kafka or
Kinesis collecting events and are offloading them to S3 for permanent
storage. Once the data is in S3 you have a variety of options. You can
use Spark to read, manipulate, and transform the data before dumping
it back to S3 or a warehouse. Or you can load the data into your data
warehouse, such as Snowflake, and do the data processing there. Once
you're happy with that neatly massaged and transformed data you can
put it in a variety of places to support a variety of use cases. You
can load it back into the data warehouse to power a reporting API,
or you can dump it as an Iceberg table to S3 and have it accessible
via Jupyter notebook, or just dump it into Parquet files that can be
read via DuckDB, or countless other options.

The biggest problem with S3 is data transfer costs. It's an open
secret that AWS uses data transfer costs as a lock-in mechanism. From
the AWS S3 pricing page you're paying anywhere from $0.05/GB to
$0.09/GB for data transfer out to the internet from us-east-1. At big
data scale this adds up. AWS's own bandwidth costs are obviously much
lower and it could pass on the savings, but the point isn't to make
more money as much as it is to encourage lock-in, which of course
leads to more money. A few years ago Cloudflare wrote up an analysis
estimating that AWS has a markup of up to 8000% on data transfer.
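
To make "this adds up" concrete, here is a rough back-of-the-envelope
sketch in Python. It uses only the two per-GB rates quoted above as
low and high bounds. The function name, the 100 TB volume, and the
decimal TB-to-GB conversion are assumptions for illustration, and a
real bill depends on AWS's volume tiers.

```python
# Rough bounds on S3 data transfer cost, using the per-GB range quoted above.
LOW_RATE_PER_GB = 0.05   # lowest rate cited from the pricing page
HIGH_RATE_PER_GB = 0.09  # highest rate cited from the pricing page
GB_PER_TB = 1000         # assumption: decimal terabytes


def egress_cost_bounds(terabytes):
    """Return (low, high) dollar bounds for transferring `terabytes` out."""
    gigabytes = terabytes * GB_PER_TB
    low = round(gigabytes * LOW_RATE_PER_GB, 2)
    high = round(gigabytes * HIGH_RATE_PER_GB, 2)
    return (low, high)


if __name__ == "__main__":
    # Hypothetical workload: 100 TB leaving S3 in a month.
    low, high = egress_cost_bounds(100)
    print(f"100 TB out: ${low:,.0f} to ${high:,.0f}")
```

For that hypothetical 100 TB, transfer alone lands somewhere between
$5,000 and $9,000, before any storage or request charges.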

Cloudflare has an incentive to call out S3's egregious pricing -
they have a competing service called R2 - but R2 really is better. They
charge less per gigabyte of storage, less for the various operations,
and do not charge for data egress at all. It's amazing what they've
achieved. These days everyone is trying to find ways to leverage AI
on top of their data. And given how nascent the AI space is there's
still significant differentiation in the AI services offered across
the cloud providers. Microsoft has OpenAI, AWS has Anthropic, and
Google has Google. While we wait for the offerings to get more
commoditized it's valuable to have the option to use our data with
whichever provider gives us the most benefit. Cloud neutrality
doesn't matter as much when services aren't differentiated but
matters a great deal when there is true differentiation. And with no
egress fees, R2 gives us that neutrality.

I'm surprised R2 hasn't been widely adopted and consider it an
undiscovered gem. If you have data-heavy workloads and a high
storage bill you should seriously consider R2. In fact, there's an
opportunity to build entire companies that take advantage of this
price differential and I expect we'll see more and more of that
happening.
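
To get a feel for the size of that price differential, here is a
minimal sketch that compares a monthly bill on each service. The
per-GB prices are assumed list prices (S3 Standard storage at $0.023,
R2 storage at $0.015, S3 egress at the top $0.09 rate), the workload
numbers are made up, and request charges are ignored entirely, so
treat it as an illustration and check both pricing pages before
relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    storage_per_gb_month: float  # dollars per GB stored per month
    egress_per_gb: float         # dollars per GB transferred out


# Assumed list prices, not taken from the article; verify before use.
S3_STANDARD = Pricing(storage_per_gb_month=0.023, egress_per_gb=0.09)
R2 = Pricing(storage_per_gb_month=0.015, egress_per_gb=0.0)


def monthly_cost(pricing, stored_gb, egress_gb):
    """Storage plus egress for one month, ignoring request charges."""
    total = (stored_gb * pricing.storage_per_gb_month
             + egress_gb * pricing.egress_per_gb)
    return round(total, 2)


if __name__ == "__main__":
    # Hypothetical workload: 50 TB stored, 20 TB read out per month.
    stored_gb, egress_gb = 50_000, 20_000
    for name, pricing in (("S3", S3_STANDARD), ("R2", R2)):
        cost = monthly_cost(pricing, stored_gb, egress_gb)
        print(f"{name}: ${cost:,.2f}/month")
```

Under these assumptions most of the gap comes from egress rather than
storage, which is why read-heavy workloads stand to gain the most.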

(c) 2023 Dan Goldin