Analyzing Analytics (Featuring: The FBI)
Recently, while conducting some research, I found myself going down the path of Google Analytics IDs and the IDs used by other analytics services. I was investigating ways not only to identify the analytics code on a site, but to correlate it with other sites that may be linked to the same owner.
WHAT ARE ANALYTICS ID’S?
Various analytics services give you a small chunk of code to embed in your site, which reports information back to them whenever somebody views a page. That data is then presented on a fancy dashboard so you can better understand the traffic coming to your site: visitor counts, visitor locations, referrers, etc. Depending on the service, you may get just one unique ID to use across all of your sites or, as in the image above, an ID with appended information to distinguish the sites you’re monitoring. If you look at the structure of the ID in the image above, you’ll see that I as a user have the ID 2319990. Then, for each site or “property” I want to track, they append more information to the end of it. In the case of this site, they’ve added -11, I suppose because I’ve previously monitored 10 other things that are now dead or gone.
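To make that structure concrete, here is a minimal sketch that splits an ID into its account and property parts. It assumes the classic Google Analytics `UA-<account>-<property>` layout; the function name `split_ga_id` and the digit limits in the pattern are my own choices for illustration, not anything Google specifies.

```python
import re

# Assumed layout: "UA-<account number>-<property index>".
# The digit ranges are a loose guess, not an official limit.
GA_ID = re.compile(r"UA-(\d{4,10})-(\d{1,4})")

def split_ga_id(tracking_id):
    """Return (account, property_index) for a UA-style ID, or None if it does not match."""
    match = GA_ID.fullmatch(tracking_id.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))

print(split_ga_id("UA-2319990-11"))
```

The account part is the piece that stays constant across every property, which is what makes the correlation described next possible.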
So why is this important? Because, particularly in the case of Google Analytics, the base ID stays the same across properties, so you can use services or search engines to identify other assets linked to that ID and, in turn, find other sites I own or at least have some kind of permission to monitor or edit code on. If the sites numbered up through -10 still existed, you’d be able to see them all and know that they were possibly related to me in some way.
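If you already have the HTML of a handful of pages saved, the same idea can be sketched locally: pull every UA-style ID out of each page and group the domains by the shared account number. Everything here is illustrative. The `.example` domains and the IDs in `pages` are made-up placeholders, and `group_by_account` is a hypothetical helper, not part of any service’s API.

```python
import re
from collections import defaultdict

# Assumed layout: "UA-<account number>-<property index>".
GA_ID = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")

def group_by_account(pages):
    """Map each account number to the sorted list of domains whose HTML mentions it."""
    groups = defaultdict(set)
    for domain, html in pages.items():
        for account, _property_index in GA_ID.findall(html):
            groups[account].add(domain)
    return {account: sorted(domains) for account, domains in groups.items()}

# Placeholder data: invented domains and IDs, for illustration only.
pages = {
    "site-a.example": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "site-b.example": "<script>ga('create', 'UA-1234567-4', 'auto');</script>",
    "site-c.example": "<script>ga('create', 'UA-7654321-2', 'auto');</script>",
}

# Keep only accounts that show up on more than one domain.
shared = {a: d for a, d in group_by_account(pages).items() if len(d) > 1}
print(shared)
```

A shared account number is only a lead, not proof of common ownership.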
There are two problems with this type of searching / enumeration. The first is that if an analytics ID is ever reassigned, you are going to get false positives. The second is that certain 3rd parties can distribute analytics IDs. You may find two sites that appear related based on the analytics ID, only to find out later (or perhaps never) that the only real connection is that they both received IDs from the same 3rd party and have very little material connection to each other. All that said, finding related sites through analytics IDs is not a guarantee, but it is definitely worth checking for the inquisitive investigator or researcher.
A CASE STUDY OF THE FBI
I didn’t actually think this would be a thing. I tried finding analytics IDs for both the CIA and NSA first, but nothing turned up. Personally, I think it’s probably a good idea that these federal agencies aren’t using them, if only in part for the reasons I’ve already mentioned. Then I put the FBI in and… well, let’s see how this analytics research can both work successfully and provide some unnecessary visibility into the people who own the account.
One site that will help you identify these links is called “BuiltWith”. If you enter a domain there and then click the “Relationships” tab, you’ll be greeted with a list of every ID this site has observed the domain using, and a fancy historical record with graphs for you graph people (shown above). You’ll notice that the same analytics ID linked to the FBI has appeared on a lot of other seemingly unrelated sites over the years. Of note, the unrelated sites seem to share the same ID during the same timeframe, which is curious. One site in particular from the list seems to have been up with that ID for over 6 years. Maybe it’s something they forgot about. Let’s take a look.
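If you copy the first-seen and last-seen dates out of a report like that by hand, checking whether two sites carried the ID at the same time is a simple date-range overlap. The sketch below uses invented `.example` domains and invented dates purely as placeholders; none of it is data from BuiltWith or from the FBI list, and `overlap_days` is a hypothetical helper.

```python
from datetime import date

def overlap_days(a, b):
    """Number of days two (first_seen, last_seen) ranges share, counting both endpoints."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, (end - start).days + 1)

# Placeholder sightings: invented domains and dates, for illustration only.
sightings = {
    "agency.example": (date(2012, 1, 1), date(2018, 6, 30)),
    "old-hobby-site.example": (date(2013, 3, 1), date(2013, 9, 30)),
    "unrelated.example": (date(2019, 1, 1), date(2019, 2, 1)),
}

reference = sightings["agency.example"]
for domain, window in sightings.items():
    if domain != "agency.example":
        print(domain, overlap_days(reference, window))
```

A non-zero overlap only says the ID was observed on both sites in the same period; it says nothing about why.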
This site is straight out of the internet boom and has clearly never been updated since. This obviously isn’t something that should be showing up in relation to the FBI, so how did it get in our list? Well, if we take a brief look deeper, there’s more going on here than we might realize. I use the Brave web browser because it has a lot of built-in privacy functions that you’d have to add extensions to replicate in other browsers. One of them is protection against cross-site trackers. On Google.com, Brave blocks 6 trackers, all belonging to Google. On this husky site from the 90s?
So what’s going on here? If I had to guess, I suspect that the FBI may be using fake (or real, seized) pages in their investigations and using their own FBI-linked code and analytics IDs to monitor them. It’s actually rather comical, considering they’re betting on poor tech practices from the people visiting the pages while making what I’d consider a pretty major oversight in their own. I counted 57 total pages linked to them through analytics. Setting aside the very obviously gov / FBI related pages (17, I believe), there are still 40 in this list that were possibly used at some point to collect analytics data for the FBI but made to not look like they belonged to, or were being monitored by, the FBI.
To their credit, many of the sites either no longer exist or no longer have the FBI’s code embedded in them. In fact, though I haven’t checked them all, this page has been the only one where I’ve seen present, current evidence of code related to the FBI. Some other interesting domains include an official edu domain for Chattahoochee Technical College, a torrent site, and a proxy site (with clear FBI labeling on the page itself).
Just because a site doesn’t exist anymore doesn’t necessarily mean we can’t see what it looked like. Using the dates provided from BuiltWith, we can use the “Wayback Machine” on Archive.org to see if any historical snapshots of the pages exist around the time the analytics ID showed up there.
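If you have more than a couple of domains to check, the lookup can be scripted against the Wayback Machine’s public availability endpoint, which returns the snapshot closest to a given date. This is a rough sketch rather than a polished client: the function names are my own, and the response handling assumes the JSON shape I have seen that endpoint return (`archived_snapshots` containing a `closest` entry).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"

def availability_url(site, when):
    """Build the query URL; `when` is a YYYYMMDD string near the date of interest."""
    return API + "?" + urlencode({"url": site, "timestamp": when})

def closest_snapshot(payload):
    """Return (timestamp, url) of the closest snapshot in a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]

def lookup(site, when):
    """Network call: ask the Wayback Machine for the snapshot closest to `when`."""
    with urlopen(availability_url(site, when), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup("example.com", "20130601")` would ask for the snapshot nearest 1 June 2013; plug in the first-seen date from BuiltWith for the domain you are checking.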
I found some similarities in a random sampling of pages I checked. First, they were all available in the Wayback Machine. This isn’t always guaranteed, because archive.org doesn’t capture every page on its own; small sites often only end up archived when someone submits them manually. Second, of the sampling I took, all the pages had a snapshot taken within the month preceding the date the analytics ID was reported as being added to the site. Was this the FBI? I have no idea, but it seems odd.
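That “snapshot within the preceding month” check is easy to repeat across a larger sample. The sketch below assumes you have Wayback-style 14-digit timestamps (YYYYMMDDhhmmss) for a page and the date the analytics ID was first reported on it. The 30-day window, the function name, and the sample values are all my own placeholders.

```python
from datetime import datetime, timedelta

def snapshots_just_before(timestamps, first_seen, window_days=30):
    """Return snapshot times that fall within `window_days` before `first_seen`."""
    earliest = first_seen - timedelta(days=window_days)
    parsed = (datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in timestamps)
    return [t for t in parsed if earliest <= t < first_seen]

# Placeholder values, for illustration only.
snapshots = ["20130105101500", "20130520083000", "20130612120000", "20140101000000"]
id_first_seen = datetime(2013, 6, 15)

for hit in snapshots_just_before(snapshots, id_first_seen):
    print(hit.isoformat())
```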
Sometimes the Wayback Machine has trouble indexing pages for whatever reason. One of these reasons, at least in my experience, is when a site is redirecting somewhere else. If it does have trouble following the redirect however, it displays a log of where it was going.
I don’t know. This is about as far as I took it. There’s probably a lot more information to be dug into here, but this post has already gone on long enough, and I’m sure some daring adventurers out there who see this will come out with research that completely overshadows it. That’s great and I can’t wait to read it! I, for one, have to push pause. This was a fun find for me and I hope for you too. If anything, I hope it highlights the importance of checking analytics IDs for associations when conducting an investigation. If you find anything else interesting or have any feedback, please feel free to reach out! You can find me on Twitter or by emailing me at my name (at the top of the post) at this domain.