[HN Gopher] EasyList is in trouble and so are many ad blockers ___________________________________________________________________ EasyList is in trouble and so are many ad blockers Author : shscs911 Score : 355 points Date : 2022-10-19 19:02 UTC (3 hours ago) (HTM) web link (adguard.com) (TXT) w3m dump (adguard.com) | bluehatbrit wrote: | I wonder what would happen if they renamed the file to robots.txt | and then did a redirect at the cloudflare level for the current | URL to robots.txt. | | I imagine some (many?) clients would handle it poorly, but it | would then be cachable at least. It's not exactly easy to test | though without a level of unknown damage to legitimate users. | noitpmeder wrote: | It seems they need some form of UUID in order to rate-limit | individual clients. I wonder what percentage of their traffic | would drop if they started requiring some form of authentication | to download this list? | russellbeattie wrote: | I used to host my blog on my own server years ago and some | popular site "hotlinked" to an image I had posted. My traffic - | which back then I paid for by the Gb - spiked like crazy. I | decided to put in Apache referrer checks for images and started | serving some porn when it didn't match my domain. That solved the | problem pretty quickly. | | Easylist should do something similar - requests from India should | include a list with all the popular sites like Google, YouTube, | aajtak.in, etc. When their browsers suddenly stop working the | problem will be solved. | Eisenstein wrote: | I am using a pi-hole which uses easylist, but it grabs it from | https://v.firebog.net/hosts/Easylist.txt which seems to be | working fine. | celsoazevedo wrote: | That's a different list. It only contains the hosts/domains on | easylist, because that's all pi-hole (and all host based | blockers) can block. It's also hosted by someone else (and they | too use Cloudflare, see firebog.net). 
| | The normal easylist is way bigger and has lots of rules for ad | blockers like uBlock Origin. | Tepix wrote: | This sucks. I hope they can reach the app authors and get it | fixed the source. | | Meanwhile perhaps some other CDN provider wants to create some | goodwill if Cloudflare isn't willing? | avani wrote: | I am not speaking on behalf of the company, but if someone | involved with EasyList can contact me (avani@cloudflare.com), | I'll see if there is a way to help out. | ameshkov wrote: | Thank you! I've passed that to EasyList maintainers. | Traubenfuchs wrote: | I would just put it on a public git repo and let it sort itself | out. | secondcoming wrote: | How much of that 100TB is for the stupid HTTP 'Date' header? | gst wrote: | > EasyList is hosted on Github and proxied with CloudFlare. | | What is the reason for proxying through Cloudflare? Are there any | bandwidth limits or performance issues when directly serving | those files from GitHub? | rapind wrote: | From https://docs.github.com/en/pages/getting-started-with- | github... | | --- | | > GitHub Pages sites have a soft bandwidth limit of 100 GB per | month. | | > In order to provide consistent quality of service for all | GitHub Pages sites, rate limits may apply. These rate limits | are not intended to interfere with legitimate uses of GitHub | Pages. If your request triggers rate limiting, you will receive | an appropriate response with an HTTP status code of 429, along | with an informative HTML body. | | > If your site exceeds these usage quotas, we may not be able | to serve your site, or you may receive a polite email from | GitHub Support suggesting strategies for reducing your site's | impact on our servers, including putting a third-party content | distribution network (CDN) in front of your site, making use of | other GitHub features such as releases, or moving to a | different hosting service that might better fit your needs. 
| | --- | tomudding wrote: | GitHub has a soft limit of like 100 GB/month on transfers for | Pages. According to the Adguard blog post traffic was already | several TBs a day before the issue arose. | pseudosavant wrote: | Why not only provide the list as a repo? You can't hotlink a | repo. And someone abusing raw links is GitHub's bandwidth | problem. | | 'Legitimate' users of the list would clone/pull the repo to | their own mirror? | pwinnski wrote: | Do you mean like this? https://github.com/easylist/easylist | | EasyList updates frequently, many times each day, as the | commits to that repo demonstrate. | pseudosavant wrote: | Exactly, but _only_ via a repo. | pwinnski wrote: | I'm curious, if an arbitrary GitHub repo suddenly started | attracting hundreds of terabytes of egress, violating | GitHub's ToS, would GitHub manage traffic in coordination | with the repo's owner, or would they disable the repo and | suspend the account? | | I suspect the latter. I don't know how to make a repo | public but limit web traffic to it. Do you? | pseudosavant wrote: | I could see disabling viewing raw links. But if the repo | becomes popular to fork what would GH do? The friction of | using git instead of HTTP will prevent 99.9% of | hotlinking. So it probably couldn't become _too_ popular. | sershe wrote: | Since they added "Access denied" for misbehaving browsers, can | they instead serve them some sort of bad response that will | "surface" issue to the users? Depending on what would work better | and cost less... (1) a small list that would block major | legitimate sites. Whoops, the browser is unusable, now users | complain to the developer to fix the issue, or abandon it. (2) | "hang" the request if the browser loads the list synchronously; | blocking UI thread is a hallmark of a bad developer, so they | might (3) stream /dev/zero. Might be expensive; maybe serve a | compressed zip-bomb if HTTP spec allows and/or browsers will | process it? 
| bugmen0t wrote: | Can't they run an open collective just to pay the bandwidth bill? | codalan wrote: | They shouldn't have to foot the bill for this. This is some bad | developers releasing a bad fork of a bad browser. | | At a minimum, they should have either reduced the requests to | something like a monthly download (not great, but far better | than requesting a file every startup), or ideally, hosted and | updated the file themselves, on their infrastructure. At least | that would force them to look at their own hosting bills, | instead of crippling a prominent and important contributor to | ad-blocking software. | metadat wrote: | 100TB of "Access Denied" replies per month, how many requests | served is this? | | This is hilarious in an unbelievably terrible and tragic way. The | scale is mind boggling. | | I wonder which browser it is. | ameshkov wrote: | Author here. Actually, it's getting better, I've just looked up | the stats and for the last 30 days we only served 70TB of | access denied pages, this is about 33-34B requests. | matthewaveryusa wrote: | That's 2 trillion requests a month, or 780,000 | requests/second. That would be just a little shy of 1% of all | of akamai's rps traffic. mind-boggling | ameshkov wrote: | Oh no, that were numbers for a month, not a day | [deleted] | mensetmanusman wrote: | Perfect application for a functioning system like torrent. | MuffinFlavored wrote: | > Even so, we continue to serve about 100TB of "Access Denied" | pages monthly! | | What's the carbon footprint of this? | ignoramous wrote: | The thing with _Access Denied_ is that these deprived clients | retry with some vengeance. So, you 're instead draining more | resources than you'd like. I run a content-blocking DoH | resolver, and this happened to us when we blocked IPs in a | particular range and the result was... well... a _lot_ of | bandwidth for nothing. | ameshkov wrote: | In this particular case this is not the case and they don't | retry. 
They just really want to download updates REALLY | often. | sneak wrote: | Why serve any HTTP replies to those at all? If you are doing | it at the IP level, why not just drop all inbound packets | from the L3 address? | ignoramous wrote: | We were on Netlify way back then. So, no L3 blocks. Now on | pages.dev and workers.dev, but haven't needed to enforce | any rules yet. | DigitallyFidget wrote: | This is what I was wondering. I'm taking a wild guess that | maybe they don't have that level of firewall access and it | was being done through filtering by the webserver to | provide an access denied. | DigitallyFidget wrote: | But why bother with deny? Just send a blank text file (or one | with as minimal data as needed to satisfy the rogue adblock) | to the "blocked IPs" to mitigate the traffic for now. If | firewall access exists, just drop the offending incoming | traffic entirely. | bombcar wrote: | This is the correct answer, and basically you have to set up | round-robin DDOS protection that provides these "wrong" | answers. | | While still trying to allow valid traffic through. | ignoramous wrote: | > _Just send a blank text file (or one with as minimal data | as needed to satisfy the rogue adblock) to the "blocked | IPs" to mitigate the traffic for now._ | | The sent HTTP _body_ was blank, but I believe we were still | sending HTTP _head_... | | > _If firewall access exists, just drop the offending | incoming traffic entirely._ | | True, but the service we were using at the time didn't have | an L3 firewall, and so we ended up moving out, after paying | the bills in full, of course. | pseudosavant wrote: | It would seem like you could prevent hotlinking by adding 1-5 | minutes of latency to every request to a list. | | Almost no dev would hotlink an asset that took that much longer | to display, at least in critical/common paths. It would force | consumers (devs/businesses) of the lists to provide a | caching/mirroring solution of some kind for their users. 
| | But on the backend, the request would be designed just for | updating the list cache. Handling 1-5 extra minutes per request, | on a request that runs less than a few dozen times a day, to | update the mirror/cache is trivial. | cogman10 wrote: | The issue with this approach is it's too late. It might work if | you designed it from the start, but adding it now would only | destroy your poor balancer with all the connections it has | to maintain (waiting for the 5 minutes to expire). | | It was mentioned in this article that they are now serving up | access denied, but the problem is one of just too many | requests. | | At this point, it's likely easier to just kill the domain | altogether and get a new one. | pseudosavant wrote: | This is certainly not a cure to the problem Easylist has | right now. This is prevention. About how to design publicly | consumable resources to naturally discourage hotlinking, | before it is a problem. | xeromal wrote: | That's what the person you're replying to said. | tomschwiha wrote: | I'm confused about the ToS comment by Cloudflare. The txt is on a | website so it is web content? | | So robots.txt is not supported by Cloudflare to cache/proxy it? | That would be a weird regulation. And I bet everyone violates the | Cloudflare ToS then. | andiareso wrote: | Yeah... That just doesn't seem right. All web content is | text... | jonny_eh wrote: | > All web content is text... | | It's all 1s and 0s too | naikrovek wrote: | text/html is not text/plain but that doesn't matter: it's not | a technical limitation that caused cloudflare to draw this | line. | | it's cloudflare deciding to protect "web content" and not | videos or .iso images or other things that _normally_ are not | commonly served while you browse a contemporary website and | read HTML. 
| webstrand wrote: | I guess they just need to serve it with a minimal html shell | tyingq wrote: | It's from this tos page: https://www.cloudflare.com/terms/ | | _2.8 Limitation on Serving Non-HTML Content | | ...Use of the Services for serving video or a disproportionate | percentage of pictures, audio files, or other non-HTML content | is prohibited, unless purchased separately..._ | | A huge text/plain artifact, requested often, would seem to fall | into that category of "disproportionate percentage" compared to | text/html served. | tomschwiha wrote: | Cloudflare can decide whom they want to do business with. But | a plain text file is in my opinion sort of HTML. At least it | is not "non-html" content. A .pdf file would be non-HTML | content. | | What else is important to note that the client is being | abused and not the client abusing the service. That should be | taken into consideration, when deciding if someone is | breaking the ToS. | lcnPylGDnU4H9OF wrote: | I'd agree that's weird. Seems like if it were simply | renamed to .html with no content changes, then it would be | okay. | | > What else is important to note that the client is being | abused and not the client abusing the service. That should | be taken into consideration, when deciding if someone is | breaking the ToS. | | My understanding has this as moot. The issue from | Cloudflare's perspective is only that the content is non- | HTML and doesn't have anything to do with the rate of | traffic (the abuse). | bombcar wrote: | > (i) serving web pages as viewed through a web browser | or other functionally equivalent applications, including | rendering Hypertext Markup Language (HTML) or other | functional equivalents, and (ii) serving web APIs subject | to the restrictions set forth in this Section 2.8. | | The key is "as viewed through a web browser" imo, this is | not really an API and it's not a webpage; it's a datafile | and would fall into R2 or similar things. 
| lcnPylGDnU4H9OF wrote: | I see, that makes the position more understandable. I | guess the same rule would (should) apply if they did | indeed simply change the extension. | Spunkie wrote: | Why do people keep talking like you can't just navigate | to a txt file in your browser and have it serve as any | old web content? Which is something I have actually done | many years ago to search for a domain in these types of | lists. | | Cloudflare is balancing on a razor for this TOS | technicality. | r3trohack3r wrote: | The TOS aren't referring to content-type headers, magic | bytes, TCP headers, browser support of file formats, or | any technical implementation. | | To oversimplify, they're saying Cloudflare's service is | to be used for serving websites to browsers. | | Serving a static text file that is primarily used by | applications is not in line with their terms of service. | | Cloudflare provides a significant service to the free and | open web by subsidizing the hosting costs of static | content for websites. They give that away for free under | what appear to be reasonable terms. I'm not sure why | you're trying to "gotcha" through their ToS. | | It would be great if Cloudflare would donate resources to | EasyList - it would do a lot to help the free and open | internet by giving users more power over what gets | delivered to their browser. But call that what it is: a | donation. | bombcar wrote: | It's lawyer speak, but the meaning is clear: "this | Cloudflare service is for webpages in a browser, not | automated data downloads and distribution". | LinAGKar wrote: | A filter list is definitely not HTML | briffle wrote: | They host the zipped files of content for haveIbeenPwned | for Troy Hunt... | sp332 wrote: | That's a special project they decided to take on, not | subject to the standard ToS. 
| Macha wrote: | The minimal spec valid HTML5 document is currently: | <!DOCTYPE html> <title>a</title> | | Practically, browsers will accept omitting both of these, | and the spec even allows for omitting the title "if it is | provided by a higher level protocol" | | So it's not that crazy an argument that a plain text file | is an HTML document | [deleted] | tomrod wrote: | If I can read it in Lynx, it is web content. | layer8 wrote: | The solution seems simple, just wrap it in a trivial HTML | envelope. Enclose it in <pre> tags if needed. | ignoramous wrote: | This limitation apparently doesn't apply to R2 / Workers [0]. | | Maybe _EasyList_ could host them there? That's what we do | [1] (and the dashboards show 400TB+ per mo [2], likely rigged | by the traffic between Workers and Cloudflare Cache). | | [0] https://news.ycombinator.com/item?id=20791660 | | [1] https://news.ycombinator.com/item?id=30034547 | | [2] https://nitter.net/rethinkdns/status/1546232186554417152 | spatley wrote: | My best guess is that CloudFlare wrote this to prevent folks | from serving big binary files like photo, music, or video and | this txt file case was an unintended condition that happens | to work to CloudFlare's advantage. | | text/plain though is decidedly not text/html and I would | expect CloudFlare to potentially do some on-the-fly | optimizations that are aware of the structure of an html file | that save terabytes a day at their scale. | ignoramous wrote: | > _My best guess is..._ | | Some think it's very _Oracle_ of Cloudflare to do so. I do | not blame them. | bornfreddy wrote: | Sounds like it is meant to deal with multimedia mostly? | | But anyway, just rename .txt to .html and you're done. | Maxburn wrote: | Simple, but will it break all sorts of automation down | the line? All the other adlists are txt and I don't know | how they would handle other file types, even if the content | is unchanged. 
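The HTML-envelope idea above (a trivial HTML shell with the list inside <pre>) and the worry about breaking automation can be sketched together. This is only an illustration, not anything EasyList or Cloudflare actually does: the function names wrap_in_html and unwrap_from_html are made up for this example, and the sample rules are hypothetical. It shows that the list text has to be HTML-escaped on the way in, and that every consuming client would need a matching unwrap step to get the original text back.

```python
import html
import re


def wrap_in_html(list_text: str, title: str = "filter list") -> str:
    """Wrap a plain-text filter list in a minimal HTML5 document.

    The list is escaped so characters such as '<', '>' and '&' in
    filter rules cannot be mistaken for markup.
    """
    return (
        "<!DOCTYPE html>\n"
        f"<title>{html.escape(title)}</title>\n"
        f"<pre>{html.escape(list_text)}</pre>\n"
    )


def unwrap_from_html(document: str) -> str:
    """Recover the original list text from a document made by wrap_in_html.

    This is the extra step every existing client would have to add,
    which is the automation concern raised in the thread.
    """
    match = re.search(r"<pre>(.*?)</pre>", document, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <pre> block found")
    return html.unescape(match.group(1))


# Hypothetical sample rules, used only to exercise the round trip.
sample = (
    "! Title: demo list\n"
    "||ads.example.com^$third-party\n"
    'example.org##div[class="ad"] > a\n'
)
wrapped = wrap_in_html(sample)
assert unwrap_from_html(wrapped) == sample
```

A client that expects raw text and is handed the wrapped document unchanged would try to parse the doctype and tags as filter rules, so the rename-or-wrap approach only works if clients are updated too.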
| PaulDavisThe1st wrote: | Determining file type from the file name suffix is a | fool's game and always was. | ethbr0 wrote: | Is it? Seems superior to arbitrary magic numbers or | headers, and God forbid full naive parsing, in most ways. | mannykannot wrote: | I doubt there is any solution that is both robust and | simple. In a sense, it is the same problem as that which | ad blockers are attempting to solve. | Maxburn wrote: | I hear you there. I'm more thinking someone probably hard | coded txt file extension somewhere so something is likely | to fall apart in simply handling that file. | tomschwiha wrote: | Fun stuff like embedding data into jpgs or pngs. | tyingq wrote: | I imagine that might help with automated tos rate limiting, | but eventually someone at Cloudflare will probably cut them | off. It's plain text, but it's basically serving a | distributed database. And a hint at their scale is _" 100TB | of "Access Denied"_ served up monthly. | | Cloudflare just seems to be trying to limit the free tier | to "caching website html for the purpose of showing it to | humans". They have pricing and plans for things other than | that. | Slix wrote: | This doesn't sound right to me. Cloudflare also protects web | APIs. This text file is an extremely simple web API, but it | is still a web API. | tyingq wrote: | If the web apis were a disproportionate amount of what was | served for some customers specific free CF plan, as | compared to the cached HTML, then that doesn't match their | TOS. | kenmacd wrote: | Imagine you're trying to block a DDoS attack. If the client is | downloading HTML then they likely also have JS enabled giving | you a ton of options for running code on their computer to help | you decide if the traffic is legitimate. | | If they're downloading text you can still use the headers, and | some tricks around redirects, but overall you have far less | data on which to decide. 
| tomudding wrote: | Cloudflare caches robots.txt by default when proxied (the only | .txt-file that they automatically cache), for all other content | the following from their ToS probably applies: | | > Use of the Services for serving video or a disproportionate | percentage of pictures, audio files, or other non-HTML content | is prohibited, unless purchased separately as part of a Paid | Service or expressly allowed under our Supplemental Terms for a | specific Service. | | We will never know the reasoning of the support agent who | replied to the EasyList maintainers, but I can imagine that it | is indeed disproportionate for EasyList. | | I really hope that Cloudflare actually sees that they are | making a wrong decision here and actually helps the EasyList | maintainers. | tyingq wrote: | The TOS isn't that you can't serve plain text, it's that it | shouldn't be disproportionate in volume to the cached html | served. | jakear wrote: | Sounds like they're just using the wrong service. R2 is | designed for object storage, and has 0 egress fees. That'd be | the way to go. Not sure why the support engineer didn't mention | it. The standard cloudflare web caching probably doesn't work | well for this use case for whatever reason. The price is only | $0.015/GB/mo, so the ~MB(?) of list would be served in | perpetuity for less than a dollar. | vel0city wrote: | They're probably still getting many millions of requests a | month so probably more than a dollar but even 20 million | requests a month would only cost $3.60 (10 million free at | first then 10 million @ $0.36/million) | | I assume you probably know this but just wanting to share | there _are_ some pricing scales with R2, they're just pretty | generous for a lot of things. | bombcar wrote: | Elsewhere they say they are seeing 36 billion | requests/month so at $0.36/million that would be nearly | $13,000 just for these access denials. | stavros wrote: | Actually, you're right. How would this work? 
Is Cloudflare | _really_ willing to foot the bill of 20 TB of bandwidth per | day for a small text file that costs $0 to store? | trinovantes wrote: | Egress is free but not public, i.e. you can't just give | anyone a URL. You have to use your own server to fetch | content from R2 and then serve it to your visitors. Each | fetch costs money but the first 10 mil reads are free and your | own server probably has egress fees. | stavros wrote: | No, egress is indeed public. Here's an example link, | straight to R2: | | https://img.phantasmagoria.me/img/96XJrjejoHNdrQv7.jpg | | Even if you have a private bucket, you can give people a | signed link with read access, for up to two weeks, IIRC. | trinovantes wrote: | Ah, I see they added public buckets last month | | But it'll still cost them money by number of reads | stavros wrote: | Hm, yeah, true, you do need to pay for reads, you're | right. | rvnx wrote: | Yes, why not? For reputation and attracting developers it | seems to be worth it. If it costs 75K USD/year, that's | already paid back with one big enterprise customer only. | | Though, adblocking is a big business, many actors there are | getting large revenue. | | For example, Eyeo's income was 50 million USD per year last | time I checked (and I guess most of it is actually profit), | so they can find a solution if they really want. | publicarray wrote: | Maybe try the open source programs at Fastly.com or Bunny.net | | https://www.fastly.com/open-source | | https://bunny.net/contact/ | mensetmanusman wrote: | I'm surprised ad companies have not tried this as an 'attack' | vector already. | randoglando wrote: | Would DNS blocking be affected by this (I presume not since the | lists are hardcoded)? | hnaccy wrote: | can they not block Indian IPs from cloudflare dashboard | fdfaz wrote: | I think the free plan allows this. Seems an easy solution. | bluehatbrit wrote: | It'll have a huge impact on anything making legitimate use of | the list. 
Adblockers on sensible browsers will stop working | etc. | | It may be easy, and it may even be the only option, but it's | a bad one that will need some thought from the maintainers I | expect. | [deleted] | Raed667 wrote: | That would prevent 1.4 billion people from ad-blocking. Not | sure if we want to use these kind of blanket measures as a | first response. | celsoazevedo wrote: | The logic here is that it's better to block 1 country and | keep it working for everyone else than to leave it as it is | and break it for everyone. | | It's not ideal, but until the problem is fixed/better | solutions are found, I think it's a good "first response". | sneak wrote: | No, it would block 1.4 billion people from that one specific | URL. | neilv wrote: | Assuming it's not a kind of DoS attack, and since it sounds like | they can detect the abusing clients (maybe by User-Agent)... some | very desperate technical options involve serving an _alternate_ | small blocklist that does one of: | | 1. Try having it block subsequent requests for EasyList itself, | just in case the frequent update requests are made with the prior | blocklist in effect. (I accidentally did this before, in one of | my own experimental blocklists, atop uBlock Origin.) Then the | device vendor can fix their end. | | 2. If the blocklist language and client support it (I suspect | they don't), you might safely replace or alter some Web pages, to | add a message saying to disable EasyList in the client, or | pressure the vendor, or similar. If this affects a lot of users, | the meaning will also be spread in other languages to other | users, even if not all of them understand any of the languages in | the message. But be careful. | | 3. If you can't get a better message to the user, another option | might be to block all requests, to prompt users to disable | EasyList or vendor to fix the problem. 
But before doing this, | you'll need to have verified that a combination of shoddy | client/device software won't prevent users from using important | functions of their devices for significant time. (Imagine this | might be their only means of being connected online, and some | shoddy client software pretty much prevents it from working, and | the user is unable to access critical services.) | | But before doing any of these desperate technical measures... | First, I'd really try to reach people in the country who'll know | what's going on, and who can reach and possibly pressure the | vendor who's causing the problem. If tech industry people aren't | able to help quickly enough, reaching out to that government, | directly or through your own country's diplomats/officials, might | work. Communicating the risks of the desperate technical measures | that you're trying to avoid (e.g., possibly breaking critical | communications) could help people understand the urgency and | importance of the situation. | rc_mob wrote: | Phew. It's just a bandwidth issue. This goofy title made me think | advertisers found a way around ad blockers. | laundermaf wrote: | This issue caused CF to irreversibly ban them though, so it's | not "just a bandwidth issue" anymore. | | > Based on the URL that are being requested at Cloudflare, it | violates our ToS as well. All the requests are txt file | extension which isn't a web content | | > you cannot use Cloudflare to cache or proxy the request to | these text files | jsty wrote: | > This issue caused CF to irreversibly ban them though | | Do you have a source for that? The article only mentions them | being throttled + the screenshot with the support engineer | saying they seem to be breaking the ToS and asking them | politely to move back into compliance. | [deleted] | nicce wrote: | Well, CF is just one service provider. There are bigger | issues if they already have such a monopoly that their | decisions kill projects worldwide. 
| ziml77 wrote: | Where did you get that they were irreversibly banned? Or | banned at all for that matter? | codalan wrote: | They didn't get banned. They got an email from CF support | saying that they cannot cache TXT files and that they'd | need to disable the proxy. | | This does not mean banned. | ignoramous wrote: | It is a bandwidth issue for a _volunteer-run_ project. | shitlord wrote: | In a way, the advertisers did find a way around ad blockers. | | Google built an entire browser and used Manifest V3 as an | excuse to cripple ad blockers. | | Companies are also paying influencers, twitch streamers, and | YouTubers to promote their products in a way that conventional | ad blockers can't prevent. | NaturalPhallacy wrote: | In case anyone reading hasn't heard of it: SponsorBlock for | YouTube - Skip Sponsorships - | https://chrome.google.com/webstore/detail/sponsorblock- | for-y... | kaushikc wrote: | Sponsorblock has saved me hundreds of hours from watching | youtube ads and other time wasting bullshit. The devs | deserve to be paid for making this awesome application. | chlorion wrote: | I am on Chromium and using the manifest V3 version of Ublock | right now and I have noticed no difference between it and | Firefox with regular Ublock. | | The very interesting thing is that none of Google's ads have | ever made it through this new version of Ublock for me. | fweimer wrote: | pool.ntp.org hands out specific subdomains for large-scale pool | users. This way, it is possible to retire service for a subset of | users that use devices that aren't updated anymore and are | misbehaving. | | The traffic issue is not just punted to DNS service. It's | possible to return a cachable 127.0.0.1 response, and it's | somewhat rare for DNS caches to be constantly powered up and down | _and_ reach out directly to authoritative DNS servers. | therealmarv wrote: | Bittorrent, switch in long-term to that. 
Not saying every end- | user should be a seeder but there is a big bittorrent community out | there and everyone could help a little bit. | | Other options: | | - A kind of mirror network (it only needs to make sure that | integrity can be checked, maybe with a public key) | | - And while doing that why not also support compression (why not? | only devs need to read it and they can easily run a decompression | command), every bit saved would help. | ignoramous wrote: | > _Bittorrent, switch in long-term to that._ | | S3 buckets in IAD with <5GB blobs can double-up as bit-torrent | seeders. | | I'd imagine, some tech IPFS/Filecoin/Sia might come in handy, | too, but unsure of how healthy most of these web3 projects are | right now. | | There's also fosstorrents.com that helps seed projects. | pmlnr wrote: | Oh, but yes every user should be a seeder. Why not? | nnopepe wrote: | serve a modified version to rate limited IPs that only contains | popular indian sites and I'm sure it'll be resolved in a day or | two | bionade24 wrote: | Limit this to the specific headers of these Webbrowsers though, | please. | stereo wrote: | They already have that part figured out. From the article: | | > When we encountered a similar problem last year, we found a | simple solution: block the undesired traffic from these apps. | Even so, we continue to serve about 100TB of "Access Denied" | pages monthly! | d12bb wrote: | The difference is that serving Access Denied leads to the | users of these malicious browsers just getting more ads | over time, as the filter lists can't be updated anymore. | Serving a special list containing popular sites would | result in the users almost instantly not being able anymore | to access these popular sites, resulting in requests to the | developers to fix their shitty browser or switching | altogether. | [deleted] | bombcar wrote: | 100TB of Access Denied is only 38 MB/s, so not even a minor | DDOS these days. 
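The 38 MB/s figure above can be checked with a couple of lines of arithmetic. The sketch below assumes a 30-day month and decimal units (1 TB = 1e12 bytes), neither of which is stated in the thread. As a side calculation it also divides the 70 TB by roughly 33-34 billion requests that ameshkov quotes elsewhere in the thread, taking 33.5 billion as a midpoint, which suggests each "Access Denied" response weighs in at around 2 KB.

```python
# Back-of-the-envelope check of traffic figures quoted in the thread.
# Assumptions (not from the article): 30-day month, decimal units
# (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).

SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_rate_mb_per_s(bytes_per_month: float) -> float:
    """Average transfer rate in MB/s for a given monthly volume."""
    return bytes_per_month / SECONDS_PER_MONTH / 1e6


def average_response_bytes(total_bytes: float, requests: float) -> float:
    """Mean size of one response, given total volume and request count."""
    return total_bytes / requests


# 100 TB of "Access Denied" pages per month, as quoted from the article.
rate = average_rate_mb_per_s(100e12)
print(f"{rate:.1f} MB/s, about {rate * 8:.0f} Mbit/s")  # 38.6 MB/s

# 70 TB over ~33.5 billion requests (midpoint of "33-34B" in the thread).
size = average_response_bytes(70e12, 33.5e9)
print(f"about {size:.0f} bytes per denied response")
```

So the average rate is modest for a CDN, which matches the comment above; the pain point is the request count and who pays for it, not the peak throughput.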
| georgebarnett wrote: | Blocking the browsers isn't a solution because they likely | fall back to being open, so the user doesn't notice. | | Instead, you need to break the user experience so they | complain to the developer of the app, thus impacting | reputation. | | It's unfortunate that the browsers developers are | unresponsive and this circumstance limits the available | options to easy list. | Raed667 wrote: | I worked on an ad-blocker a few months ago. I made the decision | to have the filter-list files hosted on our own domain and CDN | (similar to what Adguard does with their filters.adtidy.org). | | This was done for 2 reasons: | | 1- Avoid scenarios like this where you ship code (extension in | this case) that is hard to update. Then make that code depend on | external resources outside of your control. | | 2- Leak our users' IP addresses to each random hosting provider. | | So the solution was simple: Run a CRON once a day then host the | files ourselves. Pretty happy with that decision now. | chrismeller wrote: | Except neither of those would help in this case. They're | already using their own domain name, and it's unclear how they | would even build their own CDN since they're using that scale | of bandwidth - AdGuard said they're still pushing 100tb of | access denied pages a month for their similar case. That is a | LOT of bandwidth just for access denied messages. | lolinder wrote: | Their point isn't that EasyList could have done anything | differently, their point is that _they_ are glad that they | didn 't decide to rely on others' infrastructure for their | own ad blocker, because that makes _them_ resilient against | the fallout from this and similar. | chrismeller wrote: | Except it wouldn't make them resilient since, as I pointed | out, neither of the things they did would be of any help at | all to Easylist in this situation. 
| | It's great that they're happy with their choices, but the | choices would, in this same situation, likely saddle them | with a crippled infrastructure and/or some _insane_ | bandwidth bills for suddenly pushing 100 extra TB /m. | girvo wrote: | It was never suggested that what the original commenter | was doing _would_ help EasyList. | chrismeller wrote: | I didn't say the OP did, though it was implied. I was | responding to a comment which did... context, my friend. | girvo wrote: | It was not implied, you added that implication yourself | and started responding to things that were not said, | which is why the other commenter who replied to you was | also confused, my friend. | lolinder wrote: | EasyList got here because they _want_ all (respectful) | apps to be able to use their list. They _invited_ | traffic, the problem is only occurring because this | unknown browser violates the implicit "as long as you're | considerate" rule. | | OP, in contrast, wrote their own adblocker targeting | their own servers. They're in control of their ad blocker | code and can write it to be respectful of their servers. | They're not hosting the lists with the intent of allowing | other people to use it, and they're unlikely to attract | lazy app developers because the endpoints are | (presumably) not listed publicly on the internet to | anyone who wants an easy adblocker list. | kccqzy wrote: | I can't imagine slowing down could be a good idea. At their | scale, the sheer number of connection count probably matters more | and may contribute to a higher proportion of cost. Bandwidth is | expensive yes, but keeping a connection alive means consuming | extra memory for the socket in the kernel, in the app, and in | many other places. | extantproject wrote: | https://archive.ph/PKyUT | slt2021 wrote: | just ban all Indian IPs from the website on firewall. | | proper solution would be to use DNS to forward all Indian traffic | to one of the local VPCs. 
| | The idea is to rate-limit users - let them pull blacklist.txt | only once a day, so you serve the file and add the requestor's IP to | a denylist on the firewall, so that any subsequent requests are | blocked by the firewall | | +Cloudflare has a geofencing feature | Reallyneed wrote: | Open and public utility data could be served with p2p | technologies. | eis wrote: | Regarding the 100TB of Access Denied pages: just drop the | connection instead. | | To make the system more scalable: instead of directly serving the | file, serve a bunch of URLs to mirrors plus a checksum. The | client must pick one of them. You can randomize the URLs and | maybe add some geo logic to it. Let people provide mirrors. An | additional indirection step like this can prove incredibly | powerful for systems that need to scale massively. | pebcakID10T wrote: | My first thought as well. Why even provide a response? | jijji wrote: | Why not just limit one request per week per IP by those suspect | netblocks | tredre3 wrote: | I'm sympathetic to their trouble but we're talking about serving | a 330KB text file (150KB compressed), surely this isn't an | insurmountable technical hurdle to overcome? | | A 1000mbps dedicated server could serve it 70 MILLION times per | day. Considering that most wouldn't be served (E-tags and | whatnot), it can probably sustain a billion requests a day. | | What am I missing? | celsoazevedo wrote: | The last copy of the .txt file saved by the WayBackMachine is | 1.4MB: | | https://web.archive.org/web/20220901000327if_/https://easyli... | | If you're getting a 330KB file, maybe the server issues are | causing the download to fail? | edf13 wrote: | It's not really the serving that's the issue - it's the amount | of bandwidth used... in the case of serving simple content | (like txt) bandwidth is always going to be the expensive | element | ameshkov wrote: | You're missing three orders of magnitude.
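As a rough check on the numbers in this subthread, the sketch below works out how much data a saturated 1000 Mbps link can move in a day and how many copies of the list that corresponds to. It assumes decimal units and zero protocol overhead, and it takes the file sizes quoted above at face value (150 KB compressed per tredre3, 1.4 MB per celsoazevedo); the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def daily_capacity_bytes(link_mbps):
    """Bytes a fully saturated link can transfer in 24 hours (no overhead)."""
    bytes_per_second = link_mbps * 1_000_000 // 8
    return bytes_per_second * SECONDS_PER_DAY


def copies_per_day(link_mbps, file_size_bytes):
    """Whole copies of a file that fit into one day of link capacity."""
    return daily_capacity_bytes(link_mbps) // file_size_bytes


capacity = daily_capacity_bytes(1000)
print(f"1000 Mbps for a day: {capacity / 10**12:.1f} TB")
print(f"150 KB copies per day: {copies_per_day(1000, 150_000):,}")
print(f"1.4 MB copies per day: {copies_per_day(1000, 1_400_000):,}")
```

Under these assumptions the "70 million" estimate only holds for the 150 KB compressed size; with the 1.4 MB file the same link manages roughly a tenth of that.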
| ajsfoux234 wrote: | > The overall traffic quickly snowballed from a couple of | terabytes per day to 10-20 times that amount. | | A 1000Mbps server could only serve 10.8TB per day, and that's not even | accounting for overhead/daily usage patterns/etc | mmastrac wrote: | Scaling serving of a static text file would make a fantastic job | interview question, especially as you explore what happens as the | text file gets bigger and the number of downloads gets higher. | The "correct answer" isn't necessarily obvious in any of these | cases. | noasaservice wrote: | Really? "text files" aren't web content? What the fuck does | cloudflare think CSS and HTML files are? | cr3ative wrote: | To be fair to them, this is configuration data, not a piece of | a website you would read in a browser. I don't agree with the | policy but it is reasonably clearly worded. | JoyfulPanda wrote: | I'm not sure, but is IPFS capable of solving the issue? | Reallyneed wrote: | Yes. But client must be native IPFS, not relying on gateway... | sneak wrote: | No, the issue is with a misbehaving client program accessing a | specific HTTP URL. | fabianhjr wrote: | The gateways would similarly not be thrilled about it. | | I would add a redirect to the makers of the browsers in | question (so that the leechers got to deal with the traffic | themselves; https://en.wikipedia.org/wiki/Inline_linking) | ff7c11 wrote: | Easylist should serve the Indian browser (based on user-agent) | with a giant file (expensive), a corrupt file, or some response | which causes the app to crash. If the browser crashes on every | startup due to a malicious response from the Easylist server, | users will likely delete it. | tux wrote: | GitHub @ https://github.com/easylist/easylist | | Alternative Lists @ https://filterlists.com/ | runlevel1 wrote: | 1. Add a ToS to the EasyList website that prohibits this sort of | abuse. (I don't see any currently.) | | 2.
Send a cease and desist letter to the app creator. | | 3. If they don't respond, also send a C&D to Google demanding | they cease distribution of the malware responsible for the DDoS. | | Anyone can send a cease and desist -- it's just cautionary | letter. You aren't obligated to follow through with the | threatened legal action. | | It doesn't have the force of law behind it, but it'll at least | get their attention. | | (IANAL) | JimWestergren wrote: | A great opportunity right now for CloudFlare to win some goodwill | and PR by helping out EasyList for free right now. | | But what about simply enable a firewall and show captcha or | similar if the origin IP is from India and requesting that URL | until the situation is under control? I did that with the free | plan recently in CloudFlare in a similar situation and it worked | perfectly (of course on a much smaller scale). | metalliqaz wrote: | that would break everyone in India _not_ using one of those | broken browsers | iforgotpassword wrote: | They are already serving access denied replies, so I assume | they can identify the browsers via user agent or similar? | | If so, returning a bogus file that blocks everything and | adding a comment in that list asking the developers to use | caching or mirroring the file should be fine. | | I wonder if those browsers honor the list when fetching the | update though. Would be awesome if you could just add | easylist and lock out further requests right on the device. | democra wrote: | Browser developers can choose to fake user-agents. Brave | uses a generic chrome user agent so it cannot be | differentiated from regular Chrome. | bluehatbrit wrote: | Most requests will be in the background or in Cron jobs. | Captcha wouldn't be possible in those situations as it would | never be seen by anyone. | Nextgrid wrote: | I'm not sure a captcha would help though. 
These aren't | intentional attack requests, they're "legitimate" requests by | a clueless developer's app that happened to get popular. | | They just need to serve either an empty response or an | intentionally broken rule to break the misbehaving browser | and force its developers to fix it. | bluehatbrit wrote: | Yes there is of course that as well! | rvnx wrote: | These apps behind cannot render the captcha, as the fetch is | happening in the background. | | However what you can do is match the user-agents, and return a | global/catch-all adblocking rule that blocks all the content of | all the pages (by blocking the body element). | | The app developers are going to notice the issue very fast | (because users are reporting the problem), and mirroring the | lists or adding a cache is immediately going to be their | priority. | | Bonus: I think some browsers and extensions can execute | JavaScript in adblocking rules; | https://help.eyeo.com/adblockplus/snippet-filters-tutorial | | (which is essentially re-using a gigantic XSS in order to | notify the user) | [deleted] | jannyfer wrote: | Blocking all page content to knowingly cause unintended | behavior... I wonder if this can be considered criminal. | | I read that poisoning your own lunch to catch a workplace | fridge thief could be considered assault. | | EDIT: here's what I read. | https://law.stackexchange.com/questions/966/can-one-be- | liabl... | | Imagine, say, you update the list to block all URLs, and it | impacts some municipal government worker's ability to update | some emergency alert service and causes hundreds of people to | be permanently injured. | rvnx wrote: | I don't think so. Google often knowingly and intentionally | breaks apps (through API deprecation) because it's more | convenient for them or that it is costly to maintain. | Nothing criminal there. | | Same for Easylist, if they decide that a quota of 100000 | requests per IP+UA per day is the maximum, that's their | choice. 
They owe nothing to the consumers of the lists. | | That being said; Easylist actually benefits from being | distributed in many apps; it is really valuable to | influence / control adblocking lists, so the more flexible | they are to the browser developers, the better (I guess). | Volundr wrote: | If an application can't handle failed web requests that | application is already broken. Web requests can and will | fail at any time. | charcircuit wrote: | I think the idea was to block users without technically | consuming bandwidth. A captcha is equivalent to blocking. | bergenty wrote: | A captcha for all 600 million internet users seems like | overkill. Maybe a smaller subnet range. | anigbrowl wrote: | I can't understand their argument that a text file 'isn't a web | content'; seems like a bullshit excuse. | r3trohack3r wrote: | This doesn't sound like bullshit to me. Serving a static text | file that is primarily used by applications is not in line | with their terms of service. | | Cloudflare provides a significant service to the free and | open web by subsidizing the hosting costs of static content | for websites. They give that away for free under what appears | to be reasonable terms. | jacooper wrote: | Maybe if they created a web page for easylist and then hosted | that + the lists directly on CF pages, maybe that would | considered as web content? | cvwright wrote: | Does not inspire confidence in Cloudflare, that's for sure. | PaywallBuster wrote: | Cloudflare claims that R2 have free egress/bandwidth | | You could try that instead of the "CDN" service | | -- | | Alternatively, try the "cheap" CDN services like Bunny or Beluga, | which have packages for high volume like 0.005c/gb | | Cloudflare is not really selling a CDN, but all the "smart" | services on top of it. | | That's why you don't have as much control (like blocking IP/Geos | without Enterprise), or run into issues for breaking their ToS. 
| Jabbles wrote: | > like 0.005c/gb | | You're off by a factor of 100. | | https://www.belugacdn.com/cdn-pricing/ | | $5000/PB = $5/TB = 500c/TB = 0.5c/GB | PaywallBuster wrote: | beluga | | > You pay 1C/ (or less!) for every Gigabyte of data | accelerated over our cloud network. | | doesn't seem clear cut as they bundle it in a subscription | packages, but at this volume you'd surely need their | enterprise package (10Tb+), which you'd expect to get 0.01 or | less | | bunny | | > First 500TB $0.005 /GB | | > From 1PB-2PB $0.002 /GB | nfriedly wrote: | I have my pi-hole configured to use the Domains list from | https://oisd.nl/ which incorporates a bunch of other lists | (including easylist), de-duplicates, and removes a few false | positives. They also have an Adblock Plus Filter List that can be | used with uBlock Origin. | disadvantage wrote: | > There's an open source Android browser (now seemingly | abandoned) that implements ad-blocking functionality | | > The problem is that this browser has a very serious flaw. It | tries to download filters updates on every startup, and on | Android it may happen lots of times per day. It can even happen | when the browser is running in the background | | EasyList should be offered as a version-controlled copy you grab | once, that then gets bundled with an app, rather than offering a | download to be called from an app: | https://easylist.to/easylist/easylist.txt (Currently down as of | writing). | | The only caveat is such a list needs to be updated, so then a | version system should be implemented for EasyList and you | periodically bundle the new version via app updates. It would | save a lot of bandwidth doing this. | pwinnski wrote: | They're offering a text file. Presumably with an If-modified- | since header, although it's hard to check now. | | There is no approach you can describe that doesn't run afoul of | the described badly-behaved browser app which willfully | retrieves the entire file afresh at every init. 
If it _can_ be | downloaded, it _will_ be downloaded directly by the badly- | behaved mobile apps. | stillbourne wrote: | You can use firefox on android and install ad block on it as a | plugin | ElectricalUnion wrote: | How does that help when you can't download easylist anymore | since it's under DDoS? | muststopmyths wrote: | Call me lazy but I'd just support an If-modified-since header | in such a simple case and call it good | comboy wrote: | Bad app will just brutally fetch it every time with not even | a cache on its side. | | As a quick fix, there are many options for limiting per IP | per timespan, e.g. fail2ban, you could configure it to punish | bad apps without crippling functionality for others. Well, | maybe crippling a little bit in some very special use cases, | still better than it simply not working. | Maxburn wrote: | PiHole and PFBlockerng are two big ones that use these | resources too and setting those up it struck me as it did you | that simply polling these resources on a set schedule was a | waste. | | Podcasting 2.0 has been talking about podping as a solution | because podcasting basically has the same problem with periodic | polling of the RSS feed. Basically you subscribe and then | receive notice there's been an update, THEN you go get it. | | https://www.podcasthelpdesk.com/podping-and-other-stuff-with... | RobotToaster wrote: | The difference is pihole only updates weekly. | Maxburn wrote: | Didn't know that, in fact clicking around in the UI I don't | see a way to change that so good on them for being friendly | in this area. | | PFBlocker seems to default to once a day. | tssva wrote: | To change when pihole updates you edit the cron entry at | /etc/cron.d/pihole. | tejtm wrote: | hack-n-patch worm? I recall a white hat doing this for some IoT | annoyance. | Joel_Mckay wrote: | Rate-limit the GeoIP list for the affected areas to drop if more | than 20% of active traffic. i.e. 
the service outages get co- | located only with the problem users areas. | | Also, when doing auto-updates: always add a chaotic delay offset | 1 to 180 minutes to distribute the traffic loads. Even in an | office with 16 hosts or more this is recommended practice to | prevent cheap routers hitting limits. Another interesting trend, | is magnet/torrent being used for cryptographic-signed commercial | package file distribution. | | Free API keys are sometimes a necessary evil... as sometimes | service abuse is not accidental. | codalan wrote: | That would only work if they had an API; AFAICT, they're just | hosting a file. | | At this point, they might be better off coordinating with the | other major adblocker providers and just outright move the file | elsewhere. Breaking other people's garbage code is better than | breaking yourself trying to fix it. Especially on a budget of | $0.00. | | If the defective code for the browsers are in public repos, it | might also be more effective for someone to just fork the code, | fix the issue (i.e. only download this file once a month, | instead of every startup), and at least give the maintainers a | chance to merge the fix back in. | Joel_Mckay wrote: | It is very common to see API keys in urls for access to what | are essentially flat files. Thus, fairly trivial to change | from: | | https://127.0.0.1/file.csv | | to | | https://127.0.0.1/file.csv?apikey=abc123 | | This could allow client specific quotas, and easy adoption | with maintained projects in minutes. Thus, defective and out- | of-maintenance projects would need manually updated or get a | 404. | | =) | codalan wrote: | Ahhh, good point! | bombcar wrote: | If I recall correctly there was some image on wikipedia that was | getting billions of downloads a day or something, all from India, | because some smart phone had made it a default "hello" image and | hot linked it. | | Unfortunately, I can't find a reference to it anymore. 
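Joel_Mckay's suggestion above, a random 1 to 180 minute offset for auto-updates, could look roughly like the following on the client side. This is a minimal sketch and not code from any of the browsers discussed; the 24-hour base interval, the persisted last-fetch timestamp, and the function names are all assumptions made for illustration.

```python
import random

# Assumed base interval between list updates; the thread does not fix a value.
UPDATE_INTERVAL_SECONDS = 24 * 60 * 60

# Jitter window taken from the comment: 1 to 180 minutes.
MIN_JITTER_SECONDS = 1 * 60
MAX_JITTER_SECONDS = 180 * 60


def next_update_delay_seconds(rng=random):
    """Seconds to wait before the next update: base interval plus random jitter.

    The jitter spreads clients out so they do not all hit the server at once.
    """
    jitter = rng.uniform(MIN_JITTER_SECONDS, MAX_JITTER_SECONDS)
    return UPDATE_INTERVAL_SECONDS + jitter


def should_fetch(last_fetch_ts, now_ts, min_interval=UPDATE_INTERVAL_SECONDS):
    """Decide at startup whether a download is due.

    last_fetch_ts is a timestamp persisted by the client (None if it has
    never fetched). Skipping the fetch when the interval has not elapsed is
    what keeps a frequently restarted app from downloading on every launch.
    """
    if last_fetch_ts is None:
        return True
    return now_ts - last_fetch_ts >= min_interval


if __name__ == "__main__":
    print(f"next update in {next_update_delay_seconds() / 3600:.2f} hours")
```

The important part for the problem in the article is `should_fetch`: the timestamp has to survive restarts, otherwise every launch looks like a first run.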
| e12e wrote: | One could take some inspiration and simply rotate the image(s) | - like in the case of wifi leeches: | | https://www.ex-parrot.com/pete/upside-down-ternet.html | robin_reala wrote: | Not that you'd do it, but the temptation there is always to | repoint your real application to a different URL and change the | original image to something subtly NSFW. | timbit42 wrote: | Been there. Done that. Someone had used our image in their | phpBB signature. The hits slowed quite quickly. | neilv wrote: | In case anyone is inspired to do related things, I made a | mistake once (troubling and embarrassing), which I'll mention | in case it helps someone else avoid my mistake... | | In earlier days of the Web, someone appeared to have | hotlinked a photo from a page of mine, as their | avatar/signature in some Web forum for another country, and | it was eating up way too much bandwidth for my little site. | | I handled this in an annoyed and ill-informed way, but which | I thought was good-natured, and years later realized it was | potentially harmful. I'd changed the URL to serve a new | version of the image, to which I'd overlaid text with | progressive political slogans relevant to their country. | (Thinking I was making a statement to the person about the | political issues, and that it would be just a small joke for | them, before they changed their avatar/signature to stop | hotlinking my bandwidth.) Years later, once I had a bit more | understanding of the world, I realized that was very ignorant | and cavalier of me, and might've caused serious government or | social trouble for the person. | | Sensitized by my earlier mistake, I could imagine ways that a | subtly NSFW image could cause problems, especially in the | workplace, and in some other cultures/countries. 
| trinovantes wrote: | Some Japanese porn makers avoid getting pirated in China by | placing politically sensitive content in the backgrounds | e12e wrote: | Wasn't there news about police officers playing music in | order for videos of them triggering automated | copyright/DMCA takedowns? | | https://www.vice.com/en/article/bvxb94/is-this-beverly- | hills... | braingenious wrote: | That sounds hilarious! Do you have a link to any articles | about this practice? | trinovantes wrote: | Found these old articles but I'm not sure if this is a | widespread practice | | https://news-ltn-com- | tw.translate.goog/news/world/breakingne... (nsfw) | | https://www.rfa.org/english/news/china/japan- | piracy-09252022... | braingenious wrote: | Thanks! That's such a clever idea | numpad0 wrote: | Widespread in the sense that social media users have done | it for long time, and Chinese users are sometimes | counteracting by rewriting those into pro-regime phrases, | but not what considered safe for commercial entities to | exploit. That one is not a professionally produced film. | | 1: https://news-infoseek-co- | jp.translate.goog/article/president... (og: | https://news.infoseek.co.jp/article/president_61325/ ) | | 2: https://i.imgur.com/5hjqu3L.jpg (label on bottle and | window sign) | [deleted] | bombcar wrote: | Yeah, you could get someone gulag'd pretty easily if you | wanted to and they were in the right location. | | Subtle things like flipping the image upside down or | reversing the colors or other "not quite harmful but quite | annoying" responses are probably better, or just serve a | 1x1 pixel image of nothing. | xwdv wrote: | My mind must be in a dark place because once you mentioned | politics I thought of how just sitting at home I could | easily come up with some kind of image that could literally | imprison or kill some one off from thousands of miles away, | without even getting up from the couch. 
I think I spent | most of my internet youth lusting for such power. | fangril67 wrote: | magic_hamster wrote: | ...or a nice steak. | o_m wrote: | Or something less malicious, like "Donate to Wikipedia", or | some other organization. | Someone wrote: | > Or something less malicious, like "Donate to Wikipedia", | or some other organization. | | https://en.wikipedia.org/wiki/Censorship_of_Wikipedia: | | _"Wikipedia has been blocked in China since 23 April | 2019"_ | | = putting ads for Wikipedia on sites likely isn't safe | everywhere. | | I think it will be very hard to find "some other | organization" that is universally 'approved' everywhere. | Raed667 wrote: | I was debugging a similar issue where a small marketplace run | by a friend was being scrapped and the listings were being | used to make a competing marketplace look more active than it | actually was. | | The thing is, they didn't host the scrapped images | themselves, they just hot-linked everything. | | So through a little nginx config, we turned their entire | homepage to an ad for my friend's platform :) | the8472 wrote: | I assume you mean scraped, not scrapped. | kevingadd wrote: | A startup I used to work for had a horror story from before I | started, where a small .png file had been accidentally | hotlinked from a third party server. The png showed up on a | significant % of users' custom homepages (think myspace, | etc). At some point the person operating the server decided | that instead of emailing someone or blocking the requests, | they'd serve goatse up to a bunch of teenagers and housemoms. | Mildly hilarious depending on your perspective, I guess? | [deleted] | alphabet9000 wrote: | https://www.vice.com/en/article/qjpmyx/why-is-this-flower-on... | [deleted] | [deleted] | drexlspivey wrote: | just add the easylist.to domain to easylist! | wnevets wrote: | A lazy/bad developer ruining something so many people depend on | is incredibly annoying. 
| shadowgovt wrote: | On the internet, popularity is sometimes indistinguishable from | being targeted by a low-orbit ion cannon. | louison11 wrote: | 1. Restricting access until developers fix it. 2. Consider | encouraging the use of webtorrent within the extension? = each | user hosts and serves the list. | the8472 wrote: | Webtorrent isn't distributed, so you'd just shift the problem | to the tracker/signalling server. | | And in this particular case even BitTorrent proper may not have | helped because steady-state BT is distributed but if a client | doesn't persist its state after a bootstrap - and lack of | persistence is the issue here - it'd hit the bootstrap server | every time. Granted, it'd only be about one UDP packet per | client, much less traffic than what is easylist is seeing, but | foolish code deployed at scale can still overload services | provided on a budget. ___________________________________________________________________ (page generated 2022-10-19 23:00 UTC)