“We can each define ambition and progress for ourselves. The goal is to work toward a world where expectations are not set by the stereotypes that hold us back, but by our personal passion, talents and interests.”—Sheryl Sandberg
As we bring private-sector innovators and technologies into our challenge to use public data to solve public problems, it’s striking how many are finding new ways to break through and apply their passions, talents and interests:
- We have one company that is making our data available free and open on a platform with 700,000 data scientists.
- We have another company that is wrangling, integrating, and presenting our data with information from a number of other public sources.
- A third is making Commerce data more accessible via interactive visualizations and filters.
In today’s announcement, we are sharing what Ephesoft will be doing, free and open for the public, to advance the goals of democratizing our data using their technology.
We here at Commerce make a lot of data available in many formats, including in bulk and through our application programming interfaces (APIs). However, some of the data we release to the public is locked inside images of documents, such as image-based files in Portable Document Format (PDF) or Tagged Image File Format (TIFF). It is nearly impossible to derive insights from documents in these formats because the data itself is unstructured and hard to analyze.
In response, Ephesoft has used its digitization and machine learning technology to start extracting meaningful data from these images. Ephesoft first performed a proof-of-concept exercise on data from the US Patent and Trademark Office, by running patent data in image-based PDF format through their platform and identifying fields such as patent date and number.
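To make the idea of "identifying fields" concrete, here is a minimal Python sketch of pulling a patent number and date out of text that has already been recognized from an image. It is not Ephesoft's method, which relies on machine learning. The field labels, the `extract_fields` helper, and the sample text (including the number and date) are all assumptions made up for illustration.

```python
import re
from datetime import date

# Hypothetical OCR output; the patent number and date are invented for illustration.
SAMPLE_TEXT = """
United States Patent
Patent No.: US 1,234,567 B2
Date of Patent: Jan. 5, 2016
"""

# Assumed label formats; real documents vary and would need more patterns.
PATENT_NO = re.compile(r"Patent No\.?:\s*US\s*([\d,]+)")
PATENT_DATE = re.compile(r"Date of Patent:\s*([A-Z][a-z]{2})\.?\s+(\d{1,2}),\s*(\d{4})")

MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def extract_fields(text):
    """Return whichever of patent_number / patent_date can be found in text."""
    fields = {}

    number_match = PATENT_NO.search(text)
    if number_match:
        # Drop thousands separators so the number is easy to compare and join on.
        fields["patent_number"] = number_match.group(1).replace(",", "")

    date_match = PATENT_DATE.search(text)
    if date_match:
        month, day, year = date_match.groups()
        if month in MONTHS:
            fields["patent_date"] = date(int(year), MONTHS[month], int(day)).isoformat()

    return fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEXT))
```

Fixed patterns like these break as soon as a document uses a different layout or label, which is one reason a learned approach is attractive for large, varied collections.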
Once these fields have been identified, Ephesoft’s algorithms extract pertinent data from the images and identify linkage across multiple documents. In its exercise on US patent data, Ephesoft’s resulting mind map visualization displays how one patent is connected to other patents, based on references, citations, and abstracts. This information can be used to analyze bright spots and clusters in US research, as well as identify gaps in patented technology or ‘lonely’ patents in spaces where little other patented art exists.
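As a rough illustration of how linked patents can reveal clusters and "lonely" patents, the Python sketch below treats patents as nodes and citations as links. It is an assumption-laden simplification: it uses citations only (not references or abstracts), the patent IDs are hypothetical placeholders, and the function names are ours, not part of any Ephesoft product.

```python
from collections import defaultdict

# Hypothetical input: each patent ID maps to the patent IDs it cites.
CITATIONS = {
    "P1": ["P2", "P3"],
    "P2": ["P3"],
    "P3": [],
    "P4": [],
    "P5": ["P6"],
}


def build_links(citations):
    """Build an undirected map of patent -> set of patents it is linked to."""
    neighbors = defaultdict(set)
    for patent, cited in citations.items():
        neighbors[patent]  # make sure patents with no citations still appear
        for other in cited:
            if other != patent:
                neighbors[patent].add(other)
                neighbors[other].add(patent)
    return dict(neighbors)


def lonely_patents(neighbors, max_links=0):
    """Patents with at most max_links connections to other patents."""
    return sorted(p for p, linked in neighbors.items() if len(linked) <= max_links)


def clusters(neighbors):
    """Group patents into connected components, largest first."""
    seen = set()
    result = []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        result.append(sorted(component))
    return sorted(result, key=len, reverse=True)


if __name__ == "__main__":
    links = build_links(CITATIONS)
    print("Lonely patents:", lonely_patents(links))
    print("Clusters:", clusters(links))
```

Connected components are the simplest possible notion of a cluster. A real analysis would likely weight links and look at density, but the same graph structure underlies both.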
In addition to its ability to extract data from images, Ephesoft can also house large volumes of public data to create knowledge bases that allow organizations to compare their unstructured data against a series of benchmarks. For free and open use, their team is now working to combine trade data from the US International Trade Administration, the US Census Bureau, and the Bureau of Economic Analysis at Commerce to develop a public knowledge base for American industry.
This tool will be able to help U.S. businesses answer questions such as:
- Will regulatory requirements affect my export strategy?
- Are my export practices compliant with all relevant trade regulations?
- Which markets are most similar to my current trade portfolio?
- How does my organization compare to other organizations in the same industry?
What continues to be exciting about these collaborations is that they highlight the many talents that readers of this blog might bring to help improve the lives of the American people. It’s not about having a specific talent; it’s about how you can use your unique talent to serve.
Data can be used for just about anything these days—ordering a ride, booking a hotel, or determining which political candidate to cast a vote for. When the Commerce Department announced its challenge for private companies to use their technology for public good, we had no idea such a wide variety of organizations would come forward, offering to leverage their unique capabilities for the good of the American people. But they did come forward, in droves, using their individual talents to build tools for our citizens.
We hope you join us.
Thanks for reading.
Justin and Ike