[HN Gopher] We analyzed 425k favicons
       ___________________________________________________________________
        
       We analyzed 425k favicons
        
       Author : gurgeous
       Score  : 229 points
       Date   : 2021-10-20 17:29 UTC (5 hours ago)
        
 (HTM) web link (iconmap.io)
 (TXT) w3m dump (iconmap.io)
        
       | paulirish wrote:
       | Aside: This article is a decent use case for the esoteric
       | `image-rendering: pixelated;` CSS property.
        
         | nkriege wrote:
         | Great tip. I've never come across this before. I updated the
         | post and the scaled up icons look much sharper now.
        
         | dmitrygr wrote:
         | I used it to make this PWA work well on iPhones:
         | http://dmitry.gr/89
        
           | lostgame wrote:
           | Ha - that's a fantastically nerdy little project. I love it!
        
       | 1cvmask wrote:
       | The favicon visualization brought memories of the million dollar
       | homepage. I suppose it was a precursor of NFTs.
       | 
       | https://en.wikipedia.org/wiki/The_Million_Dollar_Homepage
       | 
       | http://www.milliondollarhomepage.com/
        
       | Groxx wrote:
       | Off in one of the more esoteric corners of favicons, you have
       | games played within the favicon:
       | https://www.youtube.com/watch?v=fpjM5myls7I
       | 
       | Sadly it doesn't quite work for me any more, but the YouTube
       | video does a decent job showing what it looked like when it
       | worked.
        
       | tinco wrote:
       | Not really relevant, but using Go to fetch the data, and then
       | Ruby to process the data is the best. I used this exact setup
       | for a project and it was amazing. Really the sweet spot of use
       | cases for both languages.
        
         | tweakimp wrote:
         | Can you please explain why they are the best languages for
         | these jobs?
        
           | tinco wrote:
           | Go's got an awesome feature set built in to the language for
           | building small networked services. I implemented a client to
           | a cryptocurrency network to extract information about its
           | status and clients. I can't really express why it's so good,
           | it just feels right.
           | 
           | Same for Ruby, the syntax is perfectly suited for
           | transforming, digging through and acting upon data. I didn't
           | even add a Gemfile, only used standard library functions,
           | transforming the data the Go program mined into usable
           | information serialized in JSON which was subsequently used as
           | a static database for a webpage.
           | 
           | You can find the source here:
           | https://github.com/tinco/stellar-core-go, the Go is in cmd
           | and the Ruby is in tools.
           | 
           | The site it powers is now defunct, apparently they changed
           | some stuff in the past 3 years and the crawler no longer
           | functions.
        
       | whalesalad wrote:
       | I have always wanted to do this _exact_ analysis - so awesome!
       | Every time I am building some kind of semi-intelligent parser to
       | fetch an arbitrary visual icon for a URL I think to myself there
       | has gotta be a better way to do this.
        
       | munk-a wrote:
       | Didn't they miss all the pre-sized icons in their scan as well?
       | For a while Apple encouraged multiple resolution sizes for
       | favicons for... reasons.
       | 
       | I know they additionally missed the directory-specific favicons
       | which have always had iffy support (i.e. /index.html =>
       | /favicon.ico and /munks-page/index.html =>
       | /munks-page/favicon.ico)
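
The directory-specific lookup described in the comment above can be sketched roughly as follows. This is only an illustration: the function name `favicon_candidates` is hypothetical, and walking up through every parent directory until the site root is an assumption, not a description of what any browser actually does.

```python
from __future__ import annotations

import posixpath
from urllib.parse import urlsplit, urlunsplit


def favicon_candidates(page_url: str) -> list[str]:
    """Return favicon.ico URLs to try, most specific directory first."""
    parts = urlsplit(page_url)
    # Directory that contains the page; "/" for a top-level page.
    directory = posixpath.dirname(parts.path) or "/"
    candidates = []
    while True:
        path = posixpath.join(directory, "favicon.ico")
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        if directory in ("/", ""):
            break  # reached the site root
        # Assumption: fall back through each parent directory in turn.
        directory = posixpath.dirname(directory)
    return candidates
```

For the example in the comment, `favicon_candidates("https://example.com/munks-page/index.html")` lists the `/munks-page/favicon.ico` URL first and the root `/favicon.ico` URL last.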
        
       | achillean wrote:
       | Nmap generated a similar version many years ago and it's still
       | available at:
       | 
       | https://nmap.org/favicon/
       | 
       | We also did something that looks at favicons by IP:
       | 
       | https://faviconmap.shodan.io/
        
       | [deleted]
        
       | arp242 wrote:
       | I got mine down to 160 bytes with some pixel tweaking and
       | converting it to a 16-color indexed PNG. It's not a lot of work
       | or very difficult (I'm an idiot at graphics editing), but you do
       | need to spend the (small amount of) effort. I embed it as a data
       | URI and it's just four lines of (col-80 wrapped) base64 text,
       | which seems reasonable to me.
       | 
       | Haven't managed to get my headshot down to less than 10k without
       | looking horrible no matter how much I tweaked the JPEG or WebP
       | settings, and thought that was just a tad too big to embed. Maybe
       | I need to find a different picture that compresses better.
       | 
       | I got that 280k Discord favicon down to just 24K simply by
       | opening it in GIMP and saving it again. I got it down to 12K by
       | indexing it to 255 colours rather than using RGB (I can't tell
       | the difference even at full size). You can probably make it even
       | smaller if you tried, but that's diminishing returns. Still, I
       | bet with 5 more minutes you can get it to ~5k or so.
       | 
       | It's very easy; you just need to care. Does it matter? Well, when
       | I used Slack I regularly spent a minute waiting for them to push
       | their >10M updates, so I'd say that 250k here and 250k there etc.
       | adds up and matters, giving real actual improvements to your
       | customers.
       | 
       | The Event Horizon Telescope having a huge favicon I can
       | understand; probably just some astronomer who uploaded it in
       | WordPress or something. Arguably a fault of the software for not
       | dealing with that more sensibly, but these sort of oversights
       | happen. A tech company that makes custom software for a living
       | doing the same is quite frankly just embarrassing to the entire
       | industry. It's a big fat "fuck you" to anyone from less developed
       | areas with less-than-ideal internet connections.
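
The data-URI embedding mentioned at the top of this comment can be sketched like this. It is a minimal illustration, not the commenter's actual tooling: the function names are hypothetical, and the 80-column wrap simply mirrors the "col-80 wrapped" remark.

```python
from __future__ import annotations

import base64
import textwrap


def png_data_uri(data: bytes) -> str:
    """Encode raw PNG bytes as a data: URI usable in <link rel="icon" href=...>."""
    return "data:image/png;base64," + base64.b64encode(data).decode("ascii")


def wrap_for_source(uri: str, width: int = 80) -> list[str]:
    """Split the URI into lines of at most `width` characters for readability."""
    # The URI contains no whitespace, so textwrap breaks it at fixed widths.
    return textwrap.wrap(uri, width)
```

Base64 grows the payload by about a third, so this only makes sense for icons that are already very small.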
        
         | TheJoeMan wrote:
         | " I got that 280k Discord favicon down to just 24K simply by
         | opening it in GIMP and saving it again. "
         | 
         | You made me laugh out loud.
         | 
         | I agree that stuff like YouTube.com saying 144x but really 145x
         | seems like it should be embarrassing.
        
           | arp242 wrote:
           | I wouldn't be surprised if that was for a specific reason,
           | like somehow showing up better somewhere for some reason, or
           | something like that. Or maybe not; who knows...
        
         | fbrchps wrote:
         | Oh hey, Discord must have seen this article -- their favicon is
         | down to 14k now.
        
           | gremloni wrote:
           | That's lit and a fantastic turnaround. Great work to whoever
           | is reading this!
        
         | nerfhammer wrote:
         | there are png optimizer programs, e.g. optipng
        
           | pseudosavant wrote:
           | The Squoosh (web) app is awesome for this too! All processing
           | is done locally with wasm.
           | 
           | https://squoosh.app
        
           | vadfa wrote:
           | `optipng -o9 -strip all` is a must
        
         | JohnTHaller wrote:
         | 256x256 PNG reduced to 256 colors with pixel transparency gets
         | it to 2.68K. I manually dropped the color depth to indexed and
         | saved it out in Photoshop and I used FileOptimizer to shrink
         | it. It includes 12 different image shrinkers and runs them all.
        
       | TazeTSchnitzel wrote:
       | The non-PNG Apple touch icons might be CgBI files? It's an
       | undocumented proprietary Apple extension to PNG which most PNG
       | tools won't accept, but which Xcode uses for iOS apps.
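
One way to test this guess is to look at the first chunk of each file. The sketch below assumes, based on reverse-engineering write-ups rather than any Apple documentation, that CgBI files keep the normal PNG signature but place a `CgBI` chunk before `IHDR`. The function names are hypothetical.

```python
from __future__ import annotations

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def first_chunk_type(data: bytes) -> str | None:
    """Return the type of the first chunk, or None if this is not a PNG stream."""
    if len(data) < 16 or not data.startswith(PNG_SIGNATURE):
        return None
    # After the 8-byte signature: 4-byte chunk length, then 4-byte chunk type.
    return data[12:16].decode("ascii", errors="replace")


def looks_like_cgbi(data: bytes) -> bool:
    """True if the first chunk is CgBI instead of the standard IHDR (assumed layout)."""
    return first_chunk_type(data) == "CgBI"
```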
        
       | ryan29 wrote:
       | > In fact, I recommend that browsers ignore these hints because
       | they are wrong much of the time.
       | 
       | I don't agree. That's the kind of coddling that encourages
       | incompetence. Instead of compensating for others' mistakes, just
       | let their stuff break.
       | 
       | I wonder if Safari on iOS ignores the hints. When I tested, I was
       | surprised to see that pressing the share icon, which holds the
       | option for `Add to Home Screen`, would cause a download of all of
       | the icons listed with `link rel="icon"`.
       | 
       | Favicons are a huge pain to deal with correctly.
        
         | malfist wrote:
         | People make mistakes all the time. Breaking because somebody
         | made a mistake that you can correct for just leads to
         | unnecessarily fragile code.
         | 
         | What's the point of failing and breaking stuff if someone tells
         | you their image is 144x144 but it's really 145x145? Who does
         | that benefit?
        
           | anyfoo wrote:
           | The opposite is the case. Overall, being too lenient in what
           | code accepts and applying heuristics will lead to way worse
           | problems down the line. For example, you want your compiler
           | to fail hard instead of saying: "Oh, this isn't a pointer,
           | but I'm sure you meant well, I'm just going to treat it as a
           | pointer!"
           | 
           | In _this_ particular case, it seems to me that the hints
           | serve no purpose and should be abolished, and in the meantime
           | fully ignored, altogether. All necessary metadata is
           | contained in the image file, and browsers should also be
           | (relatively) strict in what image files with what metadata
           | they accept, for security reasons alone.
           | 
           | And if they also went so far as limiting file size, the
           | perpetrators that clog up bandwidth by putting up multi-MB
           | favicons would catch on much earlier (or at all), too.
           | 
           | So what actually is the point of those hints, if browsers
           | have to fall back anyway?
        
         | iudqnolq wrote:
         | YouTube and Twitter both have wrong parameters. Presumably this
         | means all major browsers ignore them or someone would have
         | noticed their favicons not displaying right?
        
         | paxys wrote:
         | Browsers ignore the hints because they aren't needed. The image
         | file itself has everything you need for rendering it.
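
To illustrate the point that the file carries its own dimensions, here is a small sketch that reads the width and height from a PNG's IHDR chunk and compares them with a declared `sizes` value. It handles PNG only, and the function names are hypothetical.

```python
from __future__ import annotations

import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int] | None:
    """Read (width, height) from the IHDR chunk, or None if not a standard PNG."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    # IHDR data starts at byte 16: big-endian 4-byte width, then 4-byte height.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def hint_matches(sizes_attr: str, data: bytes) -> bool:
    """True if the real dimensions appear in a sizes value such as "144x144"."""
    actual = png_dimensions(data)
    if actual is None:
        return False
    declared = set()
    for token in sizes_attr.lower().split():
        w, sep, h = token.partition("x")
        if sep and w.isdigit() and h.isdigit():
            declared.add((int(w), int(h)))
    return actual in declared
```

A 145x145 file declared as `sizes="144x144"`, like the YouTube example discussed in this thread, would make `hint_matches` return `False`.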
        
           | ygra wrote:
           | The point for the hints is probably that the browser doesn't
           | need to fetch the 2000x2000 favicon if it only needs
           | something in 16x16 to render in the tab bar.
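
The selection this comment describes might look something like the sketch below. The policy (smallest declared icon that is at least as large as needed, otherwise the largest available) is an assumption for illustration, not how any particular browser is known to choose, and `pick_icon` is a hypothetical name.

```python
from __future__ import annotations


def pick_icon(icons: list[tuple[str, int]], wanted: int) -> str | None:
    """Choose an href from (href, declared_edge_px) pairs without fetching anything."""
    if not icons:
        return None
    big_enough = [icon for icon in icons if icon[1] >= wanted]
    if big_enough:
        # Smallest icon that still covers the requested size.
        return min(big_enough, key=lambda icon: icon[1])[0]
    # Nothing is large enough, so take the largest and let the renderer upscale.
    return max(icons, key=lambda icon: icon[1])[0]
```

The choice relies entirely on the declared sizes, which is exactly why wrong hints matter: a mislabeled file can be picked or skipped for the wrong reason.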
        
         | Conlectus wrote:
         | A problem with this is that when a website breaks in one
         | browser, but works in another, I imagine most people's reaction
         | would be to blame the browser. This leads to a kind of race-to-
         | the-bottom for browser compatibility. See for example the
         | history of User-Agent strings.
        
       | silvestrov wrote:
       | It's such a shame that Safari does not support SVG favicons. It's
       | the only major browser which doesn't:
       | https://caniuse.com/link-icon-svg
       | 
       | All current browsers support PNG.
        
         | amelius wrote:
         | Don't hold your breath. Safari is the new IE6.
        
       | ChrisArchitect wrote:
       | What is the Tranco dataset that this is based on? I mean come on
       | -- anything that claims to be based on 'Alexa' (or any of these
       | others: Cisco Umbrella/openDNS? Majestic? Quantcast?) is sooo
       | suspect. None of these sources are that good and especially Alexa
       | which harks back to a time 20 years ago of browser toolbars and
       | extensions which the large majority do not use anymore.
       | 
       | Just saying yes maybe it's easy to come up with a top 1000 list
       | of sites on the net, but other than that no one really knows
       | unless you're like Google/Bing/Apple/Cloudflare that have
       | redirection urls tracking clicks etc
        
       | gurgeous wrote:
       | Also, we turned up 2,000 domains that redirect to a very shady
       | site called happyfamilymedstore[dot]com. Stuff like
       | avanafill[dot]com, pfzviagra[dot]com, prednisoloneotc[dot]com.
       | These domains made it into the Tranco 100k somehow.
       | 
       | Full list here -
       | https://gist.github.com/gurgeous/bcb3e851087763efe4b2f4b992f...
        
         | unicornporn wrote:
         | Lately, happyfamilymedstore has mysteriously always been in the
         | top ~ten Google Images results for super niche bicycle parts
         | searches I do. They seem to have ripped an insane amount of
         | images that get reposted on their domain.
        
           | 0des wrote:
           | What kind of parts are you looking for?
        
         | noitpmeder wrote:
         | Does anyone know the story behind these? How do seemingly
         | obscure sites consistently get massive amounts of obscure
         | content placed highly in results?
        
           | jacurtis wrote:
           | What most of them do is they will use Wordpress exploits to
           | get into random wordpress websites run by people who know
           | nothing about managing a website and are running on a $3/mo
           | shared hosting account.
           | 
           | After they get into these random wordpress sites, they then
           | embed links back to their sketchy site in obscure places on
           | the wordpress site that they hacked, so that owners of the
           | site don't notice, but search bots do. They usually leave the
           | wordpress site alone, but will create a user account to get
           | back into it again later if Wordpress patches an exploit. All
           | of this exploit and link adding is automated, so it is just
           | done by crawlers and bots.
           | 
           | This is done tens of thousands or even millions of times
           | over. All of these sketchy backlinks eventually add up, even
           | if they are low quality, and provide higher ranking for the
           | site they all point to.
           | 
           | Think of websites like mommy blogs, diet diaries, family
           | sites, personal blogs, and random service companies
           | (plumbers, pest control, restaurants, etc) that had their
           | nephew throw up a wordpress site instead of hiring a
           | professional.
           | 
           | I don't mean to pick on wordpress, but it really is the most
           | common culprit of these attacks, because so many Wordpress
           | sites exist that are operated by people who aren't informed
           | about basic security. Plus, wordpress is open source, so
           | exploits get discovered by looking at source code and
           | attackers will sell those exploits instead of reporting them.
           | So Wordpress is in an infinite cycle of chasing exploits and
           | patching them.
        
             | lazide wrote:
             | Pretty sure closed source wasn't very effective at stopping
             | 0days either (Windows). The most common platform gets the
             | attention generally.
        
             | shuntress wrote:
             | > "had their nephew throw up a wordpress site instead of
             | hiring a professional"
             | 
             | The web is _supposed_ to be accessible to everyone.
             | 
             | This type of "blame the victim" attitude is a poor way to
             | handle criminal activity.
        
           | IncRnd wrote:
           | It happens through search engine optimization, SEO, and a mix
           | of planting reviews and other tactics. Think of it like this
           | - what would you do to get people talking about your site?
           | You'd somehow put links, conversations, reviews, quotes, etc.
           | in front of them.
        
       | comeonseriously wrote:
       | Slightly OT, but what was that one that came around a few years
       | ago that would make everyone's CPU go to 100%?
        
       | nanis wrote:
       | I know of a company whose favicon was a hires true color PNG that
       | weighed in at more than 2 MB. The web site was the dominion of
       | marketing. Suggestions to improve the situation were detrimental
       | to one's career path. _sigh_
        
       | anyfoo wrote:
       | ... and wrote an interesting technical article about it, that
       | even someone like me, who doesn't do web development, enjoys
       | reading. Definitely why I come to HN (no sarcasm, it is).
        
       | toast0 wrote:
       | Favicons are slightly useful. You can serve your page at
       | http://www.example.com with a favicon from https://example.com
       | that has an HTTP Strict-Transport-Security header with
       | includeSubDomains, and then future page loads in that browser
       | will be https (across your whole domain). (This assumes you want
       | your domain to be https)
       | 
       | Other than that, I'm still pretty meh about them.
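
The HSTS trick described in this comment depends on how the header's directives are interpreted. The sketch below parses a `Strict-Transport-Security` value and checks whether a later request to a given host would be covered. It is a simplified model with hypothetical function names, and it ignores expiry tracking and preload lists.

```python
from __future__ import annotations


def parse_hsts(value: str) -> dict:
    """Parse a Strict-Transport-Security header value into a small policy dict."""
    policy = {"max_age": None, "include_subdomains": False}
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            arg = arg.strip().strip('"')
            if arg.isdigit():
                policy["max_age"] = int(arg)
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


def upgrades_host(policy_host: str, policy: dict, request_host: str) -> bool:
    """True if a policy received from policy_host would force HTTPS for request_host."""
    if not policy["max_age"]:
        return False  # missing or zero max-age means no active policy
    if request_host == policy_host:
        return True
    return policy["include_subdomains"] and request_host.endswith("." + policy_host)
```

With `includeSubDomains` set on the response from `example.com`, `upgrades_host` reports that `www.example.com` is covered, which is the effect the comment relies on.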
        
       | gurgeous wrote:
       | Also see the gigantic map - https://iconmap.io
       | 
       | The blog post is the analysis of the data set, the map is the
       | visualization.
        
         | isoprophlex wrote:
         | Is the dataset available for download? I couldn't immediately
         | find a download to the dataset in the linked article.
         | 
         | My hands itch to do some dimension reduction on that data and
         | make some nice plots
        
           | nkriege wrote:
           | We'd be happy to share the data. Reach us at help at
           | gurge.com if you're interested.
        
         | oehpr wrote:
         | I wonder if there might be a way to map all these using t-SNE
         | to discrete grid locations? Maybe even an autoencoder. I'd love
         | to see what features it could pick out.
         | 
         | I don't see their data set though. hmmm.
         | 
         | maybe I'll just have to crawl it on my own if I want to do it.
        
           | yboris wrote:
           | side note: instead of t-SNE consider UMAP - provides better
           | results (and it's _much_ faster)
           | https://github.com/lmcinnes/umap
        
         | svdr wrote:
         | I see a lot of repetitions in the map?
        
           | gurgeous wrote:
           | It's one icon per domain. Try hovering (on desktop) and
           | you'll see that many domains have the same favicon.
        
             | true_religion wrote:
             | It also works on mobile if you tap the favicon.
        
       | bellyfullofbac wrote:
       | Huh, there's a row of identical icons of 3 blue circles (search
       | for cashadvancewow[dot]com) and all the domains using them are
       | loan-related. Interesting way to do forensics on clone sites
       | (although trying a few of them, they're not showing any icons
       | right now, and the URL /favicon.ico 404's)
       | 
       | And I checked a few of the sites, I just got lorem-ipsum style
       | landing pages. I wonder what's the point, or are the scammers
       | using the domains mostly for emails?
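
The clone-site forensics idea in this comment can be sketched by hashing each favicon and grouping domains that share a hash. This assumes the favicon bytes have already been fetched into a dictionary. The function name is hypothetical, and exact-byte hashing will miss icons that were re-encoded or resized.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def group_by_favicon(favicons: dict[str, bytes]) -> list[list[str]]:
    """Return clusters of domains whose favicon bytes are identical."""
    groups = defaultdict(list)
    for domain, data in favicons.items():
        groups[hashlib.sha256(data).hexdigest()].append(domain)
    # Keep only icons shared by more than one domain.
    clusters = [sorted(domains) for domains in groups.values() if len(domains) > 1]
    # Largest clusters first, ties broken alphabetically for stable output.
    return sorted(clusters, key=lambda cluster: (-len(cluster), cluster))
```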
        
       | quitit wrote:
       | The difference between the Apple "precomposed" and standard icons
       | had to do with the gloss effect on icons on pre iOS 7 home
       | screens.
       | 
       | When adding a website/webapp to these earlier home screens, the
       | OS would apply a gloss effect over the icon in order to match the
       | aesthetic of the standard apps. The precomposed icon was a way
       | for the developer to stop the OS from applying this effect, such
       | as if their logo already had a different gloss effect
       | applied (i.e. "precomposed") or other design where adding the
       | glossy shine wouldn't look right. The standard icon allowed the
       | OS to apply the gloss effect - which was a timesaver as Apple did
       | tweak the gloss contour over the years: hence using a standard
       | icon ensured that the website/webapp always matched the user's OS
       | version.
        
       ___________________________________________________________________
       (page generated 2021-10-20 23:00 UTC)