[HN Gopher] Analyzing Analytics (Featuring: The FBI)
       ___________________________________________________________________
        
       Analyzing Analytics (Featuring: The FBI)
        
       Author : benryon
       Score  : 234 points
       Date   : 2020-04-24 16:46 UTC (6 hours ago)
        
 (HTM) web link (exploits.run)
 (TXT) w3m dump (exploits.run)
        
       | LyndsySimon wrote:
       | The big takeaway from this article for me is that I should
       | probably look for or write a browser extension that tracks
       | changes to analytics tools and IDs on sites. If a site is
       | silently taken over, the state actor would either need to
       | separately gain access to the analytics tool accounts, or would
       | need to modify the IDs to connect to a new account. I'd love to
       | see how often tracking IDs change on high-profile sites.
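A minimal sketch of the detection half of that idea, written in Python rather than as a browser extension: pull tracking IDs out of two snapshots of the same page and report what changed. The regex only covers the classic `UA-` and `GTM-` ID shapes, and the function names and sample HTML are made up for illustration.

```python
import re

# Classic Universal Analytics IDs (UA-<account>-<property>) and
# Google Tag Manager container IDs. Other trackers would need
# their own patterns.
ID_PATTERN = re.compile(r"\b(UA-\d{4,10}-\d{1,4}|GTM-[A-Z0-9]{4,8})\b")


def extract_ids(html):
    """Return the set of tracking IDs found in the page source."""
    return set(ID_PATTERN.findall(html))


def diff_ids(old_html, new_html):
    """Compare two snapshots and report added and removed IDs."""
    old_ids, new_ids = extract_ids(old_html), extract_ids(new_html)
    return {"added": new_ids - old_ids, "removed": old_ids - new_ids}


# Hypothetical snapshots of the same page at two points in time.
before = "<script>ga('create', 'UA-1234567-1', 'auto');</script>"
after = "<script>ga('create', 'UA-7654321-2', 'auto');</script>"
changes = diff_ids(before, after)
```

A non-empty `added` or `removed` set is only a signal to look closer; sites also change IDs for mundane reasons such as migrating accounts.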
        
       | jcrawfordor wrote:
       | Looking at the Siberian husky site... stdLauncher.js is part of
       | Verint ForeSee, one of those "would you like to take a survey
       | about our website" solutions. The AAM analytics code right above
       | the survey and urchin code lists as domain an IP associated with
       | Sungard AS, an outfit that holds a number of federal contracts
       | for IT services. This IP, 209.235.0.153, hosted the FBI website
       | at some point in time. It's oddly easy to figure this out, even
       | without something like a DomainTools subscription, because there
       | are a lot of people scraping and archiving the FBI most wanted
       | pages due to their cultural significance.
       | 
       | Some searching on code samples shows that the AAM section of
       | analytics code is an exact match for analytics code served up by
       | an older version of the FBI's most wanted website. Likely that it
       | was also used on older versions of other FBI websites as well.
       | 
       | In the end I find it unlikely that this website has anything to
       | do with the FBI, and more likely that the website owner copy-
       | pasted a large section of source code and accidentally ended up
       | with this result.
       | 
       | One bit of commonality I've noticed is that a lot of websites
       | with the FBI tracking code were all built with FrontPage. I'm not
       | sure if this is causal or coincidental, but perhaps it
       | contributes to this that FrontPage allows you to open a webpage
       | that you saved from IE and edit it... which might lead to some
       | websites being complete duplicates of FBI websites, except for
       | visible content, simply because websites like the FBI most wanted
       | were relatively prominent parts of the early internet.
       | 
       | Edit: I spent a little time riding the WayBackMachine to some of
       | the other webpages when they were apparently using FBI analytics
       | code. The results are odd but they're so inconsistent that it's
       | hard to think it was at all intentional. One interesting finding
       | is that both ohthx.com and ppc-guy.com, at the time they
       | supposedly had the FBI analytics ID, were apparently hosting an
       | analytics package called Prosper202 that redirected the
       | WayBackMachine crawler from the login page to fbi.gov. I have a
       | suspicion that this was a partially-joking way to deter crawling
       | of the admin interface of the software. The record that they used
       | the FBI analytics code is presumably just an artifact of the
       | crawler following the redirect. It seems that this exact
       | Prosper202 behavior results in the majority of the old hits.
        
       | A4ET8a8uTh0 wrote:
       | That is a fascinating read. It sounds like it is also prudent to
       | use separate analytics ID on your websites if you choose to go
       | that route.
        
       | maerF0x0 wrote:
       | In his book "Permanent Record" Edward Snowden[1] describes fake
       | websites used by government agencies to disguise internet traffic
       | that is actually used for spycraft.
       | 
       | eg: maybe a website about siberian huskies actually has a hidden
       | login or hosts another service when contacted on port 80/443 in
       | just the right way?
       | 
       | Now, that would make more sense for the CIA than the FBI, but I
       | think it illustrates another avenue of interpretation
       | 
       | [1]: https://www.goodreads.com/book/show/46223297-permanent-
       | recor...
        
         | poyu wrote:
         | That doesn't make sense, why would they even let people know
         | that there's a connection? The hidden login part may be true,
         | but just not on sites that are related so obviously. It could
         | be a smokescreen of some kind though.
        
           | maerF0x0 wrote:
           | I agree that having an FBI Google Analytics ID would be a gaffe.
        
         | LyndsySimon wrote:
         | Interesting. I've got his book on my reading list, but haven't
         | gotten to it yet.
         | 
         | I just made a tenuous mental connection between this concept
         | and a Reddit phenomenon called "Lake City quiet pills". I heard
         | about it on the podcast "Stuff They Don't Want You To Know" and
         | it held my interest for a few hours' worth of investigation.
         | 
         | The short version is that a Redditor died. He was a
         | stereotypical grumpy old dude, and someone hopped on Reddit and
         | posted that he'd passed. Someone got interested and tied that
         | poster to some websites, one of which had a bunch of stuff
         | hidden in the public source. It definitely seems like a
         | clandestine group of some kind communicating, but who it was
         | and to what end isn't clear. The Reddit conspiracist belief
         | seems to be that it was a group of assassins-for-hire.
         | 
         | Podcast: https://www.iheart.com/podcast/182-stuff-they-dont-
         | want-you-... Subreddit:
         | https://www.reddit.com/r/LakeCityQuietPills/
        
         | joering2 wrote:
         | "Pet scam" is a big business [1]
         | 
         | In this example it's quite possible the FBI set traps to get a
         | better understanding of what third parties are involved, who is
         | visiting the site, and probably some admin management page
         | behind it. Sort of like getting the contacts of a criminal and
         | going from there.
         | 
         | [1] https://www.ipata.org/current-pet-scams
        
       | galacticaactual wrote:
       | Pointing back to a government domain is not how nation state
       | monitoring infrastructure is set up.
        
         | save_ferris wrote:
         | Sure, this isn't a comprehensive strategy, but you'd be amazed
         | at how far behind some of those agencies are in terms of day-
         | to-day operations for investigations.
         | 
         | A relative of mine works at FBI and several years back he told
         | me a story about how an investigation into an organized crime
         | syndicate was blown up because an agent on the case was dumb
         | enough to check out the target's LinkedIn profile while he was
         | logged into his own real account. So the target got a
         | notification that Joe Blow from the FBI had just viewed his
         | profile. Over a year of work down the drain with a single GET
         | request, crazy.
        
           | galacticaactual wrote:
           | My issue is the confidence with which the author presupposes
           | that the existence of this code on sites indicates seizure or
           | utilization in an investigation. It is a lazy position that
           | leaves others (i.e. HN readers in this thread) with a little
           | more intellectual horsepower to evaluate the other - and
           | frankly more realistic - alternatives.
        
             | not_a_moth wrote:
             | What are the more realistic alternatives?
        
               | galacticaactual wrote:
               | Please refer to the (current) top comment.
        
               | mimi89999 wrote:
               | Please see my comment:
               | https://news.ycombinator.com/item?id=22970996
        
       | mimi89999 wrote:
       | Or maybe they just stole code from the FBI website to have a
       | feature and pulled way more code than required without even
       | knowing what it does.
        
         | mimi89999 wrote:
         | A coworker sysadmin once told me that when he was inspecting
         | the web server access logs (for an unrelated reason) he noticed
         | that many requests to a resource on our website had a strange
         | referer URL that was never present in requests to pages. He
         | inspected that site and found that they were using our
         | resource. We didn't really care about it, but that was really
         | interesting.
         | 
         | Maybe it's the same with these sites?
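The log check described above can be sketched roughly like this, assuming the server writes the common "combined" access log format. The sample lines, hostnames, and function name are invented for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined log format: ... "METHOD path PROTO" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def foreign_referers(lines, own_hosts):
    """Count (resource path, referring host) pairs whose referer is not ours."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        # A missing referer is logged as "-", which has no hostname.
        host = urlsplit(match.group("referer")).hostname
        if host and host not in own_hosts:
            counts[(match.group("path"), host)] += 1
    return counts


# Hypothetical log lines: one hotlinked request, one with no referer,
# and one referred by our own page.
sample = [
    '203.0.113.5 - - [24/Apr/2020:16:46:00 +0000] "GET /static/widget.js HTTP/1.1" 200 512 "http://other-site.example/page" "Mozilla/5.0"',
    '198.51.100.7 - - [24/Apr/2020:16:47:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [24/Apr/2020:16:47:01 +0000] "GET /static/widget.js HTTP/1.1" 200 512 "https://www.example.com/index.html" "Mozilla/5.0"',
]
hotlinks = foreign_referers(sample, {"www.example.com"})
```

Keep in mind the referer header is sent by the client, so it can be absent or forged; the counts are a hint, not proof.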
        
       | three_seagrass wrote:
       | This technique was recently used by some redditors to uncover
       | that the multi-state COVID reopen protest is being pushed by some
       | guy who uses an antique shop in FL as a front for his shell LLCs.
       | 
       | Those are the websites being used on the Facebook pages that
       | are primarily pushing 'reopen' content, and the GA accounts on
       | those pages link them to a bunch of pro-firearm shell corps as
       | well.
       | 
       | Here's the thread. It got deleted since it was deemed as doxxing
       | (a reddit no-no) even though Whois data is public:
       | 
       | http://removeddit.com/r/maryland/comments/g3niq3/i_simply_ca...
        
         | maxchehab wrote:
         | Krebs also mentioned this in his recent post
         | https://krebsonsecurity.com/2020/04/whos-behind-the-reopen-d...
         | 
         | A very interesting way to associate the same site owners!
        
           | LegitShady wrote:
           | Look at the updates on that post, nothing is so clear cut.
           | The problem with internet sleuthing is that everyone gets
           | very excited and innocent people can be injured in the
           | unnecessary witch-hunt.
           | 
           | >Update, April 21, 6:40 a.m. ET: Mother Jones has published a
           | compelling interview with Mr. Murphy, who says he registered
           | thousands of dollars worth of "reopen" and "liberate" domains
           | to keep them out of the hands of people trying to organize
           | protests. KrebsOnSecurity has not been able to validate this
           | report, but it's a fascinating twist to this tale: How an
           | 'Old Hippie' Got Accused of Astroturfing the Right-Wing
           | Campaign to Reopen the Economy
           | 
           | Update, April 22, 1:52 p.m. ET: Mr. Murphy told
           | Jacksonville.com he did not register reopenmn.com or
           | reopenpa.com, contrary to data in the spreadsheet linked
           | above. I looked up each of the records in that spreadsheet
           | manually, but did have some help from another source in
           | compiling and sorting the information. It is possible the
           | registration data for those domains got transposed with
           | reopenmd.com and reopenva.com, which included Mr. Murphy's
           | information prior to being redacted by the domain registrar.
        
             | panda-giddiness wrote:
             | Right, and this is exactly why reddit bans doxxing. The
             | original reddit poster was correct that there was a single
             | individual buying most of these domains; however, other
             | than purchasing the domains, there was no evidence that the
             | individual was using those domains to promote protests.
             | Let's not forget reddit's Boston bombing debacle[1].
             | 
             | [1] https://en.wikipedia.org/wiki/Sunil_Tripathi#Misidentif
             | icati...
        
         | whycombagator wrote:
         | Sadly I get "Could not connect to Reddit" when visiting that
         | link
        
       | gk1 wrote:
       | Worth noting that, since the Analytics ID is publicly visible,
       | anyone can load Google Analytics on their own site using that
       | ID. No FBI connection required.
       | 
       | This is called Analytics hijacking and it was once (still is) a
       | common spam technique: Create site buy-my-stuff.net, load a bunch
       | of hijacked analytics scripts there, and then the owners of those
       | accounts will see "buy-my-stuff.net" in their analytics reports.
       | 
       | Edit: As commenter lmkg reminded me, you don't even need to make
       | a site, just use the API to make pageview calls.
        
         | enlyth wrote:
         | Is it not possible to whitelist your own domains in Google
         | Analytics? Forgive my ignorance, I don't use it at all.
        
           | lmkg wrote:
           | You don't need to host a site. The data format to send data
           | into Google Analytics is an open API (called the Measurement
           | Protocol). You can just ping Google's servers directly with
           | the appropriate payload, which includes crafted URL
           | parameters.
        
           | allset_ wrote:
           | Any info about what domain is being visited would be client
           | side, which could be easily changed.
        
           | codezero wrote:
           | The actual Google Analytics account has a setting admins can
           | control to only allow data from specific domains, though this
           | can be faked.
           | 
           | Also, usually these IDs are copied when someone clones a
           | website they want to steal the design of but they don't
           | bother updating the style or JS.
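To make the replies above concrete, here is a small sketch of why a hostname allowlist is weak: the hostname is just another field in the hit that the sender fills in. The field names (`v`, `tid`, `cid`, `t`, `dh`, `dp`) are from the Universal Analytics Measurement Protocol; the allowlist, the helper functions, and the hostnames are hypothetical stand-ins, not Google's implementation. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical allowlist standing in for a "valid hostname" filter.
ALLOWED_HOSTS = {"www.example.com"}


def build_hit(tracking_id, hostname, path):
    """Encode a pageview hit using Measurement Protocol (v1) field names."""
    return urlencode({
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID, visible in any page's source
        "cid": "555",        # arbitrary client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname, chosen by the sender
        "dp": path,          # document path
    })


def passes_hostname_filter(payload):
    """Accept the hit only if its self-reported hostname is allowlisted."""
    fields = parse_qs(payload)
    return fields.get("dh", [""])[0] in ALLOWED_HOSTS


# A lazy spammer reports their own hostname and is filtered out...
spam_hit = build_hit("UA-XXXXX-Y", "buy-my-stuff.example", "/")
# ...but anyone can simply claim to be the real site.
forged_hit = build_hit("UA-XXXXX-Y", "www.example.com", "/")
```

Here `passes_hostname_filter(spam_hit)` is false and `passes_hostname_filter(forged_hit)` is true, which is the "this can be faked" caveat in a nutshell.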
        
       | vmception wrote:
       | Now this is the Hacker News I want to see. Just a mere
       | observation using known meta-analytics with entertaining
       | implications.
        
       | londons_explore wrote:
       | Google Analytics IDs are tied to the account that created them.
       | 
       | Presumably the FBI doesn't all share just one massive
       | "fbi@gmail.com" email address.
       | 
       | Even if a bunch of FBI employees decided foolishly to use google
       | analytics on their honeypot sites, one would expect them to all
       | separately sign up using different google accounts - either using
       | their real email addresses, or hopefully throwaway ones.
        
         | lmkg wrote:
         | I think you're confusing _Google_ accounts (email addresses)
         | with _Google Analytics_ accounts (tracking ID prefixes). A
         | single user can create dozens of GA accounts.
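For anyone unfamiliar with the ID layout: classic tracking IDs look like `UA-<account number>-<property index>`, so the prefix is what ties properties together. A rough sketch of grouping observed IDs by that prefix follows; the IDs and helper names are made up.

```python
from collections import defaultdict


def split_ua_id(tracking_id):
    """Split 'UA-<account>-<property>' into (account, property index)."""
    prefix, account, prop = tracking_id.split("-")
    if prefix != "UA" or not (account.isdigit() and prop.isdigit()):
        raise ValueError(f"not a UA tracking ID: {tracking_id!r}")
    return account, int(prop)


def group_by_account(tracking_ids):
    """Map each GA account number to the sorted property indexes seen under it."""
    groups = defaultdict(set)
    for tid in tracking_ids:
        account, prop = split_ua_id(tid)
        groups[account].add(prop)
    return {account: sorted(props) for account, props in groups.items()}


# Made-up IDs: two properties under one GA account, one under another.
seen = ["UA-1234567-1", "UA-1234567-4", "UA-7654321-1"]
accounts = group_by_account(seen)
```

Two sites sharing an account prefix share a GA account, which says nothing about which Google login (or how many) sits behind it.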
        
       | Ozzie_osman wrote:
       | Google analytics used to be called Urchin (they bought Urchin and
       | made it Analytics). So all the urchin.js code is probably just
       | really old Google analytics tracking code.
        
         | elbac wrote:
         | The original Urchin was used for log analysis
         | (https://en.wikipedia.org/wiki/Urchin_(software)), which might
         | explain a 'self-hosted' version of the software as well.
        
       | vmception wrote:
       | The article says all three fbi.js files were on waybackmachine. I
       | was only able to download urchin and the other ones are not
       | there. Anyone have a mirror? Besides the author? pastebin or mega
        
         | jcrawfordor wrote:
         | All three are from commodity commercial software, finding other
         | websites of the same period that used Urchin/GA and ForeSee
         | should get you more or less the same files.
        
       | danso wrote:
       | On a related note, I wonder if there are/were common patterns in
       | the sting sites set up by Dept. of Homeland Security, such as U
       | of Northern New Jersey [0] and U of Farmington [1]. Both of those
       | were initiated during the Obama administration and featured
       | fairly nice modern designs, similar in aesthetic to much of the
       | Obama-era digital overhauls (though a quick skim shows that they
       | don't share similar CSS naming semantics).
       | 
       | [0] https://www.nytimes.com/2016/05/06/nyregion/students-at-
       | fake...
       | 
       | https://web.archive.org/web/20160327093120/http://unnj.edu/
       | 
       | [1]
       | https://www.freep.com/story/news/local/michigan/2019/11/27/i...
       | 
       | https://web.archive.org/web/20180414235355/http://university...
       | 
       | http://archive.is/qLrUi
        
         | jcrawfordor wrote:
         | On brief review, one (UNNJ) is running WordPress while the
         | other (Farmington) doesn't show any evidence of a dynamic CMS.
         | That suggests to me totally separate provenance. My guess would
         | be that two different contracts were awarded to two different
         | companies to build the websites, which would be both consistent
         | with common federal contracting behavior and a good idea from
         | an OPSEC perspective since it would minimize any similarity in
         | these "sting" websites.
        
       | bhartzer wrote:
       | The Google Analytics 'trick' (to identify all the sites someone
       | owns) has been around for quite a while. All you have to do is
       | use a code search engine like publicwww to search for the snippet
       | of code or the analytics ID.
       | 
       | It's not just the Google Analytics ID or GTM ID; you can also use
       | the Adsense pub-id or just about anything else that you might
       | think sites have in common. When you start to also look at
       | backlinks and IP neighborhoods, things can get interesting, as
       | well.
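If you already have page sources saved locally (from a code search engine or your own crawl), the matching step is roughly an inverted index from identifier to sites. This is only a sketch: the regexes are approximations (the 16-digit AdSense `pub-` form in particular is an assumption about the ID shape), and the site names and markup are invented.

```python
import re
from collections import defaultdict

# Approximate identifier shapes; extend with whatever else sites
# might have in common.
PATTERNS = {
    "ga": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "gtm": re.compile(r"\bGTM-[A-Z0-9]{4,8}\b"),
    "adsense": re.compile(r"\bpub-\d{16}\b"),
}


def shared_identifiers(pages):
    """pages maps site name -> HTML source. Return identifiers seen on 2+ sites."""
    sites_by_id = defaultdict(set)
    for site, html in pages.items():
        for pattern in PATTERNS.values():
            for identifier in pattern.findall(html):
                sites_by_id[identifier].add(site)
    return {
        identifier: sorted(sites)
        for identifier, sites in sites_by_id.items()
        if len(sites) > 1
    }


# Invented page sources: a and b share an AdSense publisher ID.
pages = {
    "site-a.example": "ga('create','UA-1234567-1'); google_ad_client='ca-pub-1234567890123456';",
    "site-b.example": "ga('create','UA-1234567-2'); google_ad_client='ca-pub-1234567890123456';",
    "site-c.example": "ga('create','UA-7654321-1');",
}
links = shared_identifiers(pages)
```

As the rest of this thread shows, a shared ID can also come from copy-pasted source, so treat a match as a lead rather than proof of common ownership.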
        
       ___________________________________________________________________
       (page generated 2020-04-24 23:00 UTC)