- Which state has the highest percentage of working moms?
- Who’s more employed—people with bachelor’s degrees or doctorates?
- Who earns more income—people who get to work at 7 am, 8 am or 7 pm?
When we think about the information that the American people should have at their fingertips to make decisions about how they live and work, answers like these are exactly the kind that need to be accessible and available. And when Commerce, “America’s Data Agency,” issued a call to the private sector to help get our data to those who are not accessing it directly, these were exactly the kinds of answers we were looking for. You have seen our prior call for help for the public good, and Kaggle is one of the companies that has stepped up to the challenge.
With Commerce public datasets loaded onto the Kaggle platform, you can find the answers to the above questions. In fact, “Kagglers” (members of the Kaggle community) analyzed data from the Census Bureau’s American Community Survey, the nation’s premier source for information about America’s changing population, housing and workforce, to challenge conventional wisdom with these answers.
Kaggle has committed to putting valuable Commerce datasets in front of its global community of data scientists, developers and coders. Making public data more open and accessible in this way helps democratize our data, promote data equality, and show what’s possible when the private and nonprofit sectors collaborate to take public data and run with it to address public problems.
Kaggle’s Response to the Challenge of Data Inequality
As Anthony will tell you, Kaggle’s mission is to help the world learn from data, making it easier for researchers, data scientists, and hobbyists to work collaboratively on reproducible projects by allowing data, code, and discussion to live and grow in a single ecosystem.
Responding to the Department’s call to address data inequality, Kaggle has committed to taking a series of publicly available Commerce datasets from the US Patent and Trademark Office, the US Census Bureau, and others, and challenging the Kaggle community to solve public problems. Kagglers will be challenged to analyze innovation, creativity, and technological progress in the United States, and to dig deeply into the stories of how Americans live and work to uncover insights about our country.
So how does putting Commerce datasets on the Kaggle platform and before the Kaggle community help address data inequality?
First, by publishing datasets into an active data science community of around 700,000 Kagglers, where sharing insights, analytic approaches and methods, and learning from one another are the norm, there is a real opportunity to bring insights from this data to people, charities, nonprofits and small companies around the country. In addition to data, the Kaggle platform offers conversational threads, visual stories and a repository of documented code to accompany datasets prepared for analysis.
Second, Kaggle also runs machine-learning competitions in domains ranging from the diagnosis of diabetic retinopathy to the classification of galaxies, bringing together machine-learning veterans and students with varied academic and professional backgrounds. Datasets shared on Kaggle let data scientists, researchers, and others who work with data find and share anything from civic statistics to European soccer matches for open community collaboration. This combines consistent access to public data with reproducible analysis, visibility of results, and forum conversations with others interested in the data.
The ability to combine our Commerce data with other public datasets could yield insights that our data alone may not reveal.
Third, the in-browser analytics platform, Kaggle Kernels, will allow open analysis, visualization, and modeling of the Commerce datasets, as you’ll see illustrated below. Each Commerce dataset will be accompanied by a repository of code and insights, which enables quick learning and active contribution by the whole community.
The goal of all of this is to enable data scientists to find critical insights in our data and share them with the American people.
Kaggle will post more Commerce public datasets soon. We look forward to giving you an update—and of course, getting your thoughts, insights and comments.
– Justin and Anthony
PS: Here are the answers to the quiz at the top of this blog:
This Kaggle Kernel, involving over 11,000 data scientists, found that Americans who start their day around 8 am earn the most.
This Kaggle Kernel investigated whether it pays to pursue a PhD and the best states to find a job post-degree. The analysis has received over 30,000 views and nearly 90 other data scientists have created reproducible forks of the code.
One working mother and data scientist uses the rich data provided by the Census Bureau’s American Community Survey to explore the stories of American working moms in this Kaggle Kernel viewed by over 14,000 people.
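For readers curious about what an analysis like the arrival-time one might involve, here is a minimal sketch in Python. It is not the Kagglers' code: the records are invented purely for illustration, and the field names `arrival_hour` and `income` are hypothetical stand-ins rather than actual American Community Survey variable names. The idea is simply to group people by the hour they arrive at work and compare a typical (median) income for each group.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records, made up for illustration only (NOT real ACS data).
# "arrival_hour" and "income" are assumed field names, not ACS variables.
records = [
    {"arrival_hour": 7, "income": 48000},
    {"arrival_hour": 7, "income": 52000},
    {"arrival_hour": 8, "income": 61000},
    {"arrival_hour": 8, "income": 57000},
    {"arrival_hour": 19, "income": 31000},
    {"arrival_hour": 19, "income": 35000},
]


def median_income_by_hour(rows):
    """Group rows by arrival hour and return the median income per hour."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["arrival_hour"]].append(row["income"])
    # Sort by hour so the summary reads from morning to evening.
    return {hour: median(values) for hour, values in sorted(groups.items())}


if __name__ == "__main__":
    for hour, value in median_income_by_hour(records).items():
        print(f"{hour:02d}:00 arrivals, median income: {value}")
```

A real analysis would load the survey's public microdata instead of a hand-written list, and would likely apply the survey's person weights so that each respondent counts in proportion to the population they represent; both steps are left out here to keep the sketch short.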