[HN Gopher] We analyzed 425k favicons ___________________________________________________________________ We analyzed 425k favicons Author : gurgeous Score : 229 points Date : 2021-10-20 17:29 UTC (5 hours ago) (HTM) web link (iconmap.io) (TXT) w3m dump (iconmap.io) | paulirish wrote: | Aside: This article is a decent use case for the esoteric | `image-rendering: pixelated;` CSS property. | nkriege wrote: | Great tip. I've never come across this before. I updated the | post and the scaled-up icons look much sharper now. | dmitrygr wrote: | I used it to make this PWA work well on iPhones: | http://dmitry.gr/89 | lostgame wrote: | Ha - that's a fantastically nerdy little project. I love it! | 1cvmask wrote: | The favicon visualization brought back memories of the Million | Dollar Homepage. I suppose it was a precursor of NFTs. | | https://en.wikipedia.org/wiki/The_Million_Dollar_Homepage | | http://www.milliondollarhomepage.com/ | Groxx wrote: | Off in one of the more esoteric corners of favicons, you have | games played within the favicon: | https://www.youtube.com/watch?v=fpjM5myls7I | | Sadly it doesn't quite work for me any more, but the YouTube | video does a decent job showing what it looked like when it | worked. | tinco wrote: | Not really relevant, but using Go to fetch the data and then | Ruby to process it is the best. I used this exact setup | for a project and it was amazing. Really the sweet spot of use | cases for both languages. | tweakimp wrote: | Can you please explain why they are the best languages for | these jobs? | tinco wrote: | Go's got an awesome feature set built into the language for | building small networked services. I implemented a client to | a cryptocurrency network to extract information about its | status and clients. I can't really express why it's so good; | it just feels right. | | Same for Ruby: the syntax is perfectly suited for | transforming, digging through and acting upon data.
I didn't | even add a Gemfile, only used standard library functions, | transforming the data the Go program mined into usable | information serialized in JSON, which was subsequently used as | a static database for a webpage. | | You can find the source here: | https://github.com/tinco/stellar-core-go; the Go is in cmd | and the Ruby is in tools. | | The site it powers is now defunct; apparently they changed | some stuff in the past 3 years and the crawler no longer | functions. | whalesalad wrote: | I have always wanted to do this _exact_ analysis - so awesome! | Every time I am building some kind of semi-intelligent parser to | fetch an arbitrary visual icon for a URL I think to myself there | has gotta be a better way to do this. | munk-a wrote: | Didn't they miss all the pre-sized icons in their scan as well? | For a while Apple encouraged multiple resolution sizes for | favicons for... reasons. | | I know they additionally missed the directory-specific favicons, | which have always had iffy support (e.g. /index.html => | /favicon.ico and /munks-page/index.html => | /munks-page/favicon.ico) | achillean wrote: | Nmap generated a similar version many years ago and it's still | available at: | | https://nmap.org/favicon/ | | We also did something that looks at favicons by IP: | | https://faviconmap.shodan.io/ | arp242 wrote: | I got mine down to 160 bytes with some pixel tweaking and | converting it to a 16-color indexed PNG. It's not a lot of work | or very difficult (I'm an idiot at graphics editing), but you do | need to spend the (small amount of) effort. I embed it as a data | URI and it's just four lines of (col-80 wrapped) base64 text, | which seems reasonable to me. | | I haven't managed to get my headshot down to less than 10k without | it looking horrible, no matter how much I tweaked the JPEG or WebP | settings, and I thought that was just a tad too big to embed. Maybe | I need to find a different picture that compresses better.
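arp242's trick of embedding a tiny favicon as an 80-column-wrapped base64 data URI can be sketched in a few lines of Python. This is a minimal illustration, not arp242's actual tooling: the helper name `png_data_uri` is hypothetical, and it assumes the input is an already optimized PNG.

```python
import base64


def png_data_uri(png_bytes: bytes, width: int = 80) -> str:
    """Build a data: URI for a PNG and split it into lines of at most `width` chars.

    Hypothetical helper for illustration; the wrapping only aids readability
    in the page source.
    """
    uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    # Slice the single long string into fixed-width chunks.
    lines = [uri[i:i + width] for i in range(0, len(uri), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    # A stand-in 160-byte payload; a real favicon would be read from disk,
    # e.g. open("favicon.png", "rb").read().
    print(png_data_uri(bytes(160)))
```

Base64 emits 4 characters for every 3 input bytes (rounded up), so a 160-byte PNG becomes 216 characters plus the 22-character `data:image/png;base64,` prefix. In an HTML attribute the line breaks are generally harmless because the WHATWG URL parser strips ASCII tab and newline characters; that is less likely to hold in other contexts such as CSS, so keeping the URI on one line is the safer default.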
| | I got that 280k Discord favicon down to just 24K simply by | opening it in GIMP and saving it again. I got it down to 12K by | indexing it to 255 colours rather than using RGB (I can't tell | the difference even at full size). You can probably make it even | smaller if you tried, but that's diminishing returns. Still, I | bet with 5 more minutes you can get it to ~5k or so. | | It's very easy; you just need to care. Does it matter? Well, when | I used Slack I regularly spent a minute waiting for them to push | their >10M updates, so I'd say that 250k here and 250k there etc. | adds up and matters, giving real, actual improvements to your | customers. | | The Event Horizon Telescope having a huge favicon I can | understand; probably just some astronomer who uploaded it in | WordPress or something. Arguably that's a fault of the software for not | dealing with it more sensibly, but these sorts of oversights | happen. A tech company that makes custom software for a living doing the same is | quite frankly just embarrassing to the entire industry. It's a | big fat "fuck you" to anyone from less developed areas with | less-than-ideal internet connections. | TheJoeMan wrote: | "I got that 280k Discord favicon down to just 24K simply by | opening it in GIMP and saving it again." | | You made me laugh out loud. | | I agree that stuff like YouTube.com declaring 144x144 when the icon is really 145x145 | seems like it should be embarrassing. | arp242 wrote: | I wouldn't be surprised if that was for a specific reason, | like somehow showing up better somewhere for some reason, or | something like that. Or maybe not; who knows... | fbrchps wrote: | Oh hey, Discord must have seen this article -- their favicon is | down to 14k now. | gremloni wrote: | That's lit and a fantastic turnaround. Great work to whoever | is reading this! | nerfhammer wrote: | There are PNG optimizer programs, e.g. optipng. | pseudosavant wrote: | The Squoosh (web) app is awesome for this too! All processing | is done locally with wasm.
| | https://squoosh.app | vadfa wrote: | `optipng -o9 -strip all` is a must | JohnTHaller wrote: | 256x256 PNG reduced to 256 colors with pixel transparency gets | it to 2.68K. I manually dropped the color depth to indexed, | saved it out in Photoshop, and used FileOptimizer to shrink | it. It includes 12 different image shrinkers and runs them all. | TazeTSchnitzel wrote: | The non-PNG Apple touch icons might be CgBI files? It's an | undocumented proprietary Apple extension to PNG which most PNG | tools won't accept, but which Xcode uses for iOS apps. | ryan29 wrote: | > In fact, I recommend that browsers ignore these hints because | they are wrong much of the time. | | I don't agree. That's the kind of coddling that encourages | incompetence. Instead of compensating for others' mistakes, just | let their stuff break. | | I wonder if Safari on iOS ignores the hints. When I tested, I was | surprised to see that pressing the share icon, which holds the | option for `Add to Home Screen`, would cause a download of all of | the icons listed with `link rel="icon"`. | | Favicons are a huge pain to deal with correctly. | malfist wrote: | People make mistakes all the time. Breaking because somebody | made a mistake that you can correct for just leads to | unnecessarily fragile code. | | What's the point of failing and breaking stuff if someone tells | you their image is 144x144 but it's really 145x145? Who does | that benefit? | anyfoo wrote: | The opposite is the case. Overall, being too lenient in what | code accepts and applying heuristics will lead to way worse | problems down the line. For example, you want your compiler | to fail hard instead of saying: "Oh, this isn't a pointer, | but I'm sure you meant well, I'm just going to treat it as a | pointer!" | | In _this_ particular case, it seems to me that the hints | serve no purpose and should be abolished and, in the meantime, | ignored altogether.
All necessary metadata is | contained in the image file, and browsers should also be | (relatively) strict in what image files with what metadata | they accept, for security reasons alone. | | And if they also went so far as limiting file size, the | perpetrators that clog up bandwidth by putting up multi-MB | favicons would catch on much earlier (or at all), too. | | So what actually is the point of those hints, if browsers | have to fall back anyway? | iudqnolq wrote: | YouTube and Twitter both have wrong parameters. Presumably this | means all major browsers ignore them, or someone would have | noticed their favicons not displaying right? | paxys wrote: | Browsers ignore the hints because they aren't needed. The image | file itself has everything you need for rendering it. | ygra wrote: | The point of the hints is probably that the browser doesn't | need to fetch the 2000x2000 favicon if it only needs | something at 16x16 to render in the tab bar. | Conlectus wrote: | A problem with this is that when a website breaks in one | browser but works in another, I imagine most people's reaction | would be to blame the browser. This leads to a kind of | race-to-the-bottom for browser compatibility. See, for example, the | history of User-Agent strings. | silvestrov wrote: | It's such a shame that Safari does not support SVG favicons. It's | the only major browser that doesn't: | https://caniuse.com/link-icon-svg | | All current browsers support PNG. | amelius wrote: | Don't hold your breath. Safari is the new IE6. | ChrisArchitect wrote: | What is the Tranco dataset that this is based on? I mean come on | -- anything that claims to be based on 'Alexa' (or any of these | others: Cisco Umbrella/OpenDNS? Majestic? Quantcast?) is sooo | suspect. None of these sources are that good, especially Alexa, | which harks back to a time 20 years ago of browser toolbars and | extensions that the large majority no longer use.
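Several replies above note that the image file itself carries its real dimensions, which is how a declared size such as 144x144 can be checked against an icon that is really 145x145. Below is a minimal Python sketch, assuming a standard PNG (Apple's CgBI variant mentioned above puts a different chunk first, and this sketch rejects it) and a `sizes` value in the usual space-separated `WxH` form. The function names are hypothetical. In a PNG the IHDR chunk comes first, so width and height sit at byte offsets 16 and 20 as big-endian 32-bit integers.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) from a PNG's leading IHDR chunk.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    then width and height as big-endian unsigned 32-bit integers.
    """
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG whose first chunk is IHDR")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def parse_sizes_hint(sizes: str) -> list[tuple[int, int]]:
    """Parse a sizes value such as "16x16 32x32"; the keyword "any" is skipped."""
    pairs = []
    for token in sizes.lower().split():
        if token == "any":
            continue
        w, sep, h = token.partition("x")
        if not sep:
            raise ValueError(f"malformed size token: {token!r}")
        pairs.append((int(w), int(h)))
    return pairs


def hint_matches(sizes: str, data: bytes) -> bool:
    """True if the file's real dimensions appear among the declared sizes."""
    return png_dimensions(data) in parse_sizes_hint(sizes)
```

Only the first 24 bytes of the file are inspected, so the check is cheap; for a 145x145 icon declared with `sizes="144x144"`, `hint_matches` returns False.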
| | Just saying: yes, maybe it's easy to come up with a top-1000 list | of sites on the net, but beyond that no one really knows | unless you're someone like Google/Bing/Apple/Cloudflare with | redirection URLs tracking clicks, etc. | gurgeous wrote: | Also, we turned up 2,000 domains that redirect to a very shady | site called happyfamilymedstore[dot]com. Stuff like | avanafill[dot]com, pfzviagra[dot]com, prednisoloneotc[dot]com. | These domains made it into the Tranco 100k somehow. | | Full list here - | https://gist.github.com/gurgeous/bcb3e851087763efe4b2f4b992f... | unicornporn wrote: | Lately, happyfamilymedstore has mysteriously always been in the | top ~ten Google Images results for super-niche bicycle parts | searches I do. They seem to have ripped an insane number of | images that get reposted on their domain. | 0des wrote: | What kind of parts are you looking for? | noitpmeder wrote: | Does anyone know the story behind these? How do seemingly | obscure sites consistently get massive amounts of obscure | content placed highly in results? | jacurtis wrote: | What most of them do is use WordPress exploits to | get into random WordPress websites run by people who know | nothing about managing a website and are running on a $3/mo | shared hosting account. | | After they get into these random WordPress sites, they then | embed links back to their sketchy site in obscure places on | the hacked site, so that the site's owners don't notice, but | search bots do. They usually leave the WordPress site alone, | but will create a user account to get back into it later if | WordPress patches an exploit. All of this exploiting and link | adding is automated, so it is done by crawlers and bots. | | This is done tens of thousands or even millions of times | over. All of these sketchy backlinks eventually add up, even | if they are low quality, and provide higher ranking for the | site they all point to.
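gurgeous doesn't say how the 2,000 redirecting domains were found. One plausible way, assuming the crawler already recorded each domain's final URL after following redirects, is to group domains by the host they land on. The helper names and the `.example` sample data in the check below are hypothetical:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_final_host(final_urls: dict[str, str]) -> dict[str, list[str]]:
    """Map each final host (leading "www." dropped) to the domains that land there."""
    groups: dict[str, list[str]] = defaultdict(list)
    for domain, url in final_urls.items():
        host = (urlsplit(url).hostname or "").removeprefix("www.")
        groups[host].append(domain)
    return dict(groups)


def redirect_clusters(final_urls: dict[str, str], min_size: int = 2) -> dict[str, list[str]]:
    """Keep only hosts that at least `min_size` *other* domains redirect to."""
    clusters = {}
    for host, domains in group_by_final_host(final_urls).items():
        outsiders = [d for d in domains if d.removeprefix("www.") != host]
        if len(outsiders) >= min_size:
            clusters[host] = sorted(outsiders)
    return clusters
```

With real crawl data, a cluster like the happyfamilymedstore one would appear as a single host with a long list of domains. Stripping only a leading `www.` is a simplification; comparing registrable domains properly needs the Public Suffix List.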
| | Think of websites like mommy blogs, diet diaries, family | sites, personal blogs, and random service companies | (plumbers, pest control, restaurants, etc.) that had their | nephew throw up a WordPress site instead of hiring a | professional. | | I don't mean to pick on WordPress, but it really is the most | common target of these attacks, because so many WordPress | sites are operated by people who aren't informed | about basic security. Plus, WordPress is open source, so | exploits get discovered by reading the source code, and | attackers sell those exploits instead of reporting them. | So WordPress is in an infinite cycle of chasing exploits and | patching them. | lazide wrote: | Pretty sure closed source wasn't very effective at stopping | 0days either (Windows). The most common platform generally | gets the attention. | shuntress wrote: | > "had their nephew throw up a WordPress site instead of | hiring a professional" | | The web is _supposed_ to be accessible to everyone. | | This type of "blame the victim" attitude is a poor way to | handle criminal activity. | IncRnd wrote: | It happens through search engine optimization (SEO) and a mix | of planting reviews and other tactics. Think of it like this | - what would you do to get people talking about your site? | You'd somehow put links, conversations, reviews, quotes, etc. | in front of them. | comeonseriously wrote: | Slightly OT, but what was that one that came around a few years | ago that would make everyone's CPU go to 100%? | nanis wrote: | I know of a company whose favicon was a hi-res true-color PNG that | weighed in at more than 2 MB. The website was the dominion of | marketing. Suggestions to improve the situation were detrimental | to one's career path. _sigh_ | anyfoo wrote: | ... and wrote an interesting technical article about it that | even someone like me, who doesn't do web development, enjoys | reading. Definitely why I come to HN (no sarcasm, it is).
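For a sense of why a hi-res true-color favicon like the 2 MB one above gets so large, and why palette indexing helps: before compression, a true-color PNG with alpha needs 4 bytes per pixel, while an 8-bit indexed PNG needs 1 byte per pixel plus a palette (3 bytes per entry in PLTE, and optionally 1 alpha byte per entry in tRNS). A small Python sketch of that arithmetic follows; the function names are hypothetical, and it ignores DEFLATE compression and PNG's per-row filter byte, so real file sizes will differ.

```python
def truecolor_alpha_bytes(width: int, height: int) -> int:
    """Uncompressed RGBA pixel data: 4 bytes (R, G, B, A) per pixel."""
    return width * height * 4


def indexed_bytes(width: int, height: int, colors: int, with_alpha: bool = True) -> int:
    """Uncompressed 8-bit indexed pixel data plus its palette.

    Each pixel is a 1-byte palette index; PLTE stores 3 bytes per color and,
    if transparency is used, tRNS adds up to 1 alpha byte per color.
    """
    if not 1 <= colors <= 256:
        raise ValueError("8-bit palettes hold 1 to 256 colors")
    palette = colors * 3 + (colors if with_alpha else 0)
    return width * height + palette


if __name__ == "__main__":
    for side in (16, 256, 2048):
        print(side, truecolor_alpha_bytes(side, side), indexed_bytes(side, side, 256))
```

A 2048x2048 RGBA bitmap is 16 MiB before compression, against 1 KiB for a 16x16 one. A 16-color icon like arp242's could also use 4-bit indices, which PNG supports and which halves the pixel data again; the sketch assumes 8-bit.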
| toast0 wrote: | Favicons are slightly useful. You can serve your page at | http://www.example.com with a favicon from https://example.com | that has an HTTP Strict-Transport-Security header with | includeSubDomains, and then future page loads in that browser | will be HTTPS (across your whole domain). (This assumes you want | your domain to be HTTPS.) | | Other than that, I'm still pretty meh about them. | gurgeous wrote: | Also see the gigantic map - https://iconmap.io | | The blog post is the analysis of the data set; the map is the | visualization. | isoprophlex wrote: | Is the dataset available for download? I couldn't immediately | find a download link for the dataset in the linked article. | | My hands itch to do some dimension reduction on that data and | make some nice plots. | nkriege wrote: | We'd be happy to share the data. Reach us at help at | gurge.com if you're interested. | oehpr wrote: | I wonder if there might be a way to map all these using t-SNE | to discrete grid locations? Maybe even an autoencoder. I'd love | to see what features it could pick out. | | I don't see their data set though. Hmm. | | Maybe I'll just have to crawl it on my own if I want to do it. | yboris wrote: | Side note: instead of t-SNE, consider UMAP; it provides better | results (and it's _much_ faster): | https://github.com/lmcinnes/umap | svdr wrote: | I see a lot of repetitions in the map? | gurgeous wrote: | It's one icon per domain. Try hovering (on desktop) and | you'll see that many domains have the same favicon. | true_religion wrote: | It also works on mobile if you tap the favicon. | bellyfullofbac wrote: | Huh, there's a row of identical icons of 3 blue circles (search | for cashadvancewow[dot]com) and all the domains using them are | loan-related.
Interesting way to do forensics on clone sites | (although trying a few of them, they're not showing any icons | right now, and the URL /favicon.ico 404s). | | I checked a few of the sites and just got lorem-ipsum-style | landing pages. I wonder what the point is. Are the scammers | using the domains mostly for email? | quitit wrote: | The difference between the Apple "precomposed" and standard icons | had to do with the gloss effect on icons on pre-iOS 7 home | screens. | | When adding a website/webapp to these earlier home screens, the | OS would apply a gloss effect over the icon in order to match the | aesthetic of the standard apps. The precomposed icon was a way | for the developer to stop the OS from applying this effect, for | example if their logo already had a different gloss effect | applied (i.e. "precomposed") or another design where adding the | glossy shine wouldn't look right. The standard icon allowed the | OS to apply the gloss effect, which was a timesaver because Apple | tweaked the gloss contour over the years: using a standard | icon ensured that the website/webapp always matched the user's OS | version. ___________________________________________________________________ (page generated 2021-10-20 23:00 UTC)