
Since 2014, the Food and Drug Administration (FDA) has committed to new levels of transparency and accountability through openFDA, aiming to “educate the public and save lives.” Since its launch, openFDA has consistently made new datasets and resources available to researchers, journalists, and the public.
One example is a collection of news releases and public health alerts dating to the agency’s founding in 1913. The information in these historical documents sheds light on the responsibilities and activities of FDA, which has had an outsized impact on the lives of American citizens for more than a century. When the agency’s historian wanted to make this collection easier for users to navigate, FDA approached ICF, which had partnered with the agency on other aspects of the openFDA project, to develop a solution.
Challenge
The documents, which detail the history of medications, adverse reactions, agency responses to disease outbreaks, and more, had already been digitized, but they weren’t available in a machine-readable format. The documents also spanned a period of technological change, from handwriting to typewriting to word processing. The tool ICF used to convert the images to text, therefore, needed to be both powerful and flexible enough to interpret letters and words despite a lot of “noise” in the background, such as handwritten notes in margins and worn areas created by paper folds.
- AI
- Open source
- Human-centered design
Solution
ICF’s data scientists and engineers have extensive experience working with different AI tools, and they leveraged that knowledge to choose the right one for the FDA historical documents project. Our team considered a variety of optical character recognition (OCR) tools to help interpret the database’s words before settling on Tesseract. This open-source engine aligned with openFDA’s commitment to sharing code, examples, and ideas. It also delivered higher accuracy than many expensive OCR tools currently available.
We also created visualizations based on recommendations by FDA stakeholders. These highlight details about the documents, such as the most frequently reported side effects by decade. The team used known best practices for user experience when designing the database’s interface and visualizations.
Finally, the team created APIs on the database’s back end so that users could grab the data and pull it into their own tools and systems for research, reporting, and other purposes.
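As a rough illustration of how a user might pull this data into their own tools, the Python sketch below builds a search URL and fetches matching records. It is not ICF's or FDA's code. The endpoint path and the field names `text` and `year` are assumptions based on openFDA's general query conventions and should be checked against the openFDA API documentation before use. The helper names `build_query_url` and `fetch_results` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the historical documents dataset; verify in the openFDA docs.
BASE_URL = "https://api.fda.gov/other/historicaldocument.json"


def build_query_url(search_text, year=None, limit=5):
    """Build a search URL for the historical documents endpoint.

    The field names `text` and `year` are assumptions, not confirmed by the source.
    """
    terms = [f'text:"{search_text}"']
    if year is not None:
        terms.append(f"year:{year}")
    params = {"search": " AND ".join(terms), "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_results(url):
    """Fetch a query URL and return the list under the `results` key (network call)."""
    with urlopen(url) as response:
        payload = json.load(response)
    return payload.get("results", [])


if __name__ == "__main__":
    # Example: documents mentioning "penicillin" from 1950 (requires network access).
    for record in fetch_results(build_query_url("penicillin", year=1950, limit=3)):
        print(record.get("year"), str(record.get("text", ""))[:80])
```

Keeping URL construction separate from the network call makes the query logic easy to check offline and to reuse in other tools.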
Where we are now
The historical documents database, which comprises more than 8,500 documents, went live in late March 2024. The FDA historian and other stakeholders were thrilled to have such a powerful tool to share this valuable information with the public. With the openFDA site averaging 11 million viewers per month, these resources are sure to reach a wide audience and support openFDA's goal of educating the public and saving lives.