Over the last few years I’ve published about a dozen random datasets. They’ve all been posted on Socrata’s OpenData portal, which recently came to the absurd conclusion that making profile pages private was the best way to combat spam (and didn’t bother to tell anyone beforehand).
Since Google can no longer index the profile page I previously linked to, and I’m actively looking for a new place to host future datasets, I decided to compile a list of them here:
- South Carolina Midlands Employee Salary Database
An export of all the employees of local governments in the Midlands area of South Carolina who make at least $50,000 per year. Scraped from the database made available thanks to The State newspaper. The government agencies included:
Counties: Kershaw, Lexington, Richland
Libraries: Richland County
Municipalities: Batesburg-Leesville, Blythewood, Camden, Cayce, Columbia, Forest Acres, Irmo, Lexington, West Columbia
School Districts: Kershaw, Lexington 1, Lexington 2, Lexington 3, Lexington 4, Lexington-Richland 5, Richland 1, Richland 2
Other Views: City of Columbia Employee Salaries
- South Carolina State Employee Salary Database
All employees of the State of South Carolina who make at least $50,000 per year. Imported from the South Carolina Budget and Control Board website.
Other Views: Average Compensation by Agency and Total Compensation by Agency
- New Jersey Traffic Violations
Traffic violations from the State of New Jersey. These were all scraped for analysis from the now-defunct peopleviolations.com. The fields are:
Row ID
Violator First Name
Violator Last Name
Violator’s Home City
Offense Committed
City Offense Committed In
Unix Timestamp when record was added to scraped database
Unique hash representing this record
The first 3 letters of the violator’s last name that were used for display
The Source URL this record was scraped from
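For anyone loading this dataset, here is a minimal Python sketch of how one row might be modeled. The class name, attribute names, and `parse_row` helper are all hypothetical, and the column order is an assumption based on the field list above (including my reading that "Offense Committed" and "City Offense Committed In" are two separate fields); check it against the actual export before relying on it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence


@dataclass
class Violation:
    # Attribute names are illustrative; they are not the export's column headers.
    row_id: int
    first_name: str
    last_name: str
    home_city: str
    offense: str
    offense_city: str
    added_at: datetime  # converted from the Unix timestamp field
    record_hash: str
    last_name_prefix: str  # first 3 letters of the last name, as displayed
    source_url: str


def parse_row(row: Sequence[str]) -> Violation:
    """Build a Violation from one row, assuming the column order listed above."""
    return Violation(
        row_id=int(row[0]),
        first_name=row[1],
        last_name=row[2],
        home_city=row[3],
        offense=row[4],
        offense_city=row[5],
        # Unix timestamps are seconds since the epoch; keep them in UTC.
        added_at=datetime.fromtimestamp(int(row[6]), tz=timezone.utc),
        record_hash=row[7],
        last_name_prefix=row[8],
        source_url=row[9],
    )
```

Rows read with Python's `csv.reader` could be passed straight to `parse_row` once any header row is skipped.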
Other Views: Most Popular Offense Cities and Most Popular Violator Home Cities
- PHP Repositories on GitHub
A list of all the PHP repositories I could find using the GitHub search API.
- US Zipcodes
A list of all the US ZIP Codes and the State, County, and City they are associated with. These were extracted from the 2012 TIGER/Line exports.
- Two Million LastFM User Profiles
Almost 2 million (1,840,647) user profiles extracted from the Last.fm API around Christmas 2012.
- All Starbucks Locations in the World
An export of all Starbucks locations in the world. This dataset was scraped from the Starbucks website and is regularly updated.
Other Views: Heat Map, Point Map, Number of Locations by Country, and Number of Locations by US State
- Greenville County School District Spending
Check, purchase card, and credit card transactions for the Greenville County School System.
Other Views: Chart of Spending by Month, 2013 - 2014 School Year, 2012 - 2013 School Year, Spent on Hotels, and Refunds / Credits / Voids
- Aiken County School District Spending
Check and purchase card transactions for the Aiken County School System.
Other Views: Chart of Spending by Month, Top Expenses, Top Vendors, 2013 - 2014 School Year, 2012 - 2013 School Year, and Refunds / Credits / Voids
- City of Columbia Spending
Checks written by the City of Columbia, South Carolina.
Other Views: Chart of Spending by Month
- South Carolina State Agency Spending Transparency
An export of South Carolina’s monthly expenditures that can actually be analyzed in a useful way.
Other Views: Donut Chart of Total by Agency and Chart of Spending by Month