Anonymous Source Tracker 2.0

The Anonymous Source Tracker is back.

Some things have changed since I took it offline roughly three years ago. The old tracker searched more than 10,000 English-language news sites crawled by Google. The new Anonymous Source Tracker tracks only the 50 or so most popular online news sites in the U.S. That’s because Google deprecated the feed API the old tracker relied on, forcing me to switch to Google Custom Search, which makes it more difficult to track every English-language news site in existence.
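For readers curious what querying Google Custom Search looks like, here is a minimal sketch that builds request URLs for the Custom Search JSON API endpoint. It assumes the list of roughly 50 news sites is configured in the search engine itself (identified by the `cx` engine ID), and the search phrase and the helper names `build_search_url` and `page_urls` are hypothetical illustrations, not the tracker’s actual code. It only builds URLs and makes no network calls.

```python
from urllib.parse import urlencode

# Google Custom Search JSON API endpoint.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, phrase, date_restrict="d1", start=1):
    """Build one search request URL (hypothetical helper).

    Assumption: the set of news sites to search is configured in the
    custom search engine identified by engine_id, not passed here.
    """
    params = {
        "key": api_key,
        "cx": engine_id,
        "q": f'"{phrase}"',            # quotes ask for an exact-phrase match
        "dateRestrict": date_restrict,  # e.g. "d1" = results from the past day
        "start": start,                 # 1-based index of the first result
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def page_urls(api_key, engine_id, phrase, pages=3):
    """Build URLs for consecutive result pages, 10 results per page."""
    return [
        build_search_url(api_key, engine_id, phrase, start=1 + 10 * i)
        for i in range(pages)
    ]


if __name__ == "__main__":
    # Hypothetical phrase; the real tracker's search terms may differ.
    for url in page_urls("YOUR_KEY", "YOUR_ENGINE_ID", "spoke on condition of anonymity"):
        print(url)
```

Fetching each URL with any HTTP client would return JSON search results; keeping URL construction separate makes it easy to test without an API key.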

I’ve eliminated the per-outlet counts and bar graphs of anonymous sources that were part of the old tracker. They were somewhat misleading: false matches and duplicates inflated them, and larger news organizations had higher counts solely because of their size.

The new tracker is still flawed. It, too, includes duplicates and false matches, and it suffers from other issues that I hope to minimize over time.

I rewrote the tracker in Python. I built the original in PHP because I was required to use PHP at work at the time, but I’d rather work in Python, which is used more often for the kind of data projects I hope to dabble in.

The new tracker is a static site instead of a database-driven one, which should make it easier to maintain. The old tracker occasionally fell over on the cheap Web hosting service I had been using.

The code is on GitHub. I welcome any suggestions or help improving it. I’ve also made the data itself easier to obtain: it’s available on GitHub as a CSV file and as a SQLite database.
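If you want to explore the data, Python’s standard library can load a CSV into SQLite and query it. The sketch below is illustrative only: the column names (`date`, `site`, `title`, `url`) and the sample rows are hypothetical and may not match the published files. It also shows one simple way to drop exact duplicate rows, one of the issues mentioned above.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the real CSV's columns and contents may differ.
SAMPLE_CSV = """date,site,title,url
2017-03-01,example.com,Officials weigh plan,https://example.com/a
2017-03-01,example.com,Officials weigh plan,https://example.com/a
2017-03-02,example.org,Agency review,https://example.org/b
"""


def load_csv_into_sqlite(csv_text):
    """Load rows from CSV text into an in-memory SQLite table named articles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (date TEXT, site TEXT, title TEXT, url TEXT)")
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row is a dict, so named placeholders map columns to values.
    conn.executemany(
        "INSERT INTO articles VALUES (:date, :site, :title, :url)", reader
    )
    conn.commit()
    return conn


def unique_articles(conn):
    """Return rows with exact duplicates removed, ordered by URL."""
    return conn.execute(
        "SELECT DISTINCT date, site, title, url FROM articles ORDER BY url"
    ).fetchall()


if __name__ == "__main__":
    conn = load_csv_into_sqlite(SAMPLE_CSV)
    for row in unique_articles(conn):
        print(row)
```

With the published SQLite database you could skip the loading step and open the file directly with `sqlite3.connect(path)`, adjusting the query to its actual table and column names.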

Long-term plans include:

  • Writing custom web scrapers so the tracker is no longer dependent on Google.
  • Incorporating text mining so it will say more useful things about who is using anonymous sources, why and in what context.

If you have thoughts on that, let me know.