When we began our 14-week health tech sprint in October 2018, we did not anticipate the profound lessons we would learn in just a few months. Together with federal agencies and private sector organizations, we demonstrated the power of applying artificial intelligence (AI) to open federal data. Through this collaborative process, we showed that federal data can be turned into products for real-world health applications with the potential to improve the lives of millions of Americans.
Joshua Di Frances, the executive director of the Presidential Innovation Fellows (PIF) program, says that this collaboration across agencies and private companies represents a new way of approaching AI and federal open data. “Through incentivizing links between government and industry via a bidirectional AI ecosystem, we can help promote usable, actionable data that benefits the American people,” Di Frances said.
Challenge #1 - Artificial Intelligence (AI) Teams
In this first part of our series, we asked federal agencies and patient advocates who were part of the AI teams to share their reactions and insights on their collaborations. Many shared that this sprint allowed for an open exchange of ideas and the opportunity for further collaboration between the U.S. government and private sector companies.
By January 2019, seven TOP Health teams had delivered digital tools, all built with federal data and leveraging emerging technologies such as AI. Teams transformed Data.gov resources into digital products, apps, and gamified inventions aimed at improving clinical trials, experimental therapies, and data-driven solutions to complex challenges ranging from cancer to Lyme and tick-borne diseases.
From the outset of the AI ecosystem sprint, Presidential Innovation Fellow Dr. Gil Alterovitz engaged with federal and industry thought leaders, who emphasized that quality data was the key to fostering an AI-centric approach. To create value, especially with AI, data must be accessible and in a usable format.
Dr. Alterovitz and Dr. Kristen Honey worked across agencies to leverage existing industry-based tools that incorporated federal data. TOP Health results included new insights into how sharing federal data with the private sector could be incentivized and measured. Government agencies were interested in understanding and customizing datasets for various use cases but had primarily focused on internal government use. The lightweight collaboration through TOP Health strengthened the intersection of government and industry.
On the industry side, private sector organizations were looking to minimize risk when choosing which datasets to use. Federal datasets vary widely in quality and usability, and organizing data for a particular business case carries real costs, chiefly the personnel time needed to process and map the data for that use. Through the TOP Health process, multiple high-value datasets were unlocked from more than a dozen collaborating agencies and departments.
For instance, to inspire prospective inventors, including those working on applications in the AI ecosystem and experimental therapeutics space, the U.S. Patent and Trademark Office (USPTO) assembled a large set of invention topics and released this data as part of the TOP Health sprint.
The Department of Veterans Affairs (VA) took a different approach, providing access to de-identified cancer patient data for use in matching patients to trials and therapeutics and in AI and machine learning analytical tools.
Patient Participation and Perspectives
Core to the TOP Health process is people, including patients. User-centered feedback from diverse individuals guided the development of all digital tools. In the AI ecosystem track, patient advocates such as Stephen Aldrich, the 63-year-old founder and former CEO of Bio Economic Research Associates, LLC (bio-era), provided valuable feedback and perspective. Aldrich was diagnosed with metastatic Stage IV adenocarcinoma of the esophagus in late March 2017. “My response to my fatal cancer diagnosis served as an inspirational case study for the sprint group,” Aldrich shares. Despite his diagnosis, he is hopeful about the future of experimental therapeutics via AI and open data.
“I am extremely grateful to live at a time when what used to be a terminal cancer diagnosis can be turned into something much less threatening due, in no small part, to our exploding ability to gather and analyze personal omic information. I envision a day when all cancer patients have had their cancers fully sequenced, and enjoy direct control over their fundamental genomic and health data, enabling them to quickly identify the best potential treatment options for their unique cancers. Amazing cures are possible, if we enable them to happen,” Aldrich says.
Rick Bangs, MBA/PMP, is a bladder and prostate cancer survivor and works as a patient advocate, primarily in research and clinical trials. He serves as patient advocate for the National Cancer Institute (NCI), SWOG Cancer Research Network, the National Comprehensive Cancer Network (NCCN), and ASCO and has leadership roles in both the NCI and SWOG. He has also supported the clinicaltrials.gov development team in its efforts to improve its user interface and provide more relevant search results.
Bangs says the AI ecosystem demos were promising, though he believes that no single supplier can make this work without partnering with other teams that bring different capabilities to the table. “The solution here will require vision, and that vision will cross capabilities that no one supplier will individually have. It might help if there was a vision that the suppliers were striving for, an aspirational North Star if you will. That might result in the original vision being expanded and extended,” Bangs said.
He notes that, applied responsibly, open data AI approaches “probably” can facilitate a new experimental therapy ecosystem that benefits patients, but he says data structures and hierarchies must be modernized and matching must account for location as well as disease: “Open data AI approaches can facilitate the ecosystem but only one which is as strong as its weakest link, which is data. The question is whether we can drive a result that is fit for purpose as defined by key stakeholders within the constraints and limitations.”
National Cancer Institute Deep Dive on Data Collaboration
The AI collaboration also created new datasets, which revealed a number of interesting patterns and lessons. For example, the National Cancer Institute (NCI), part of the National Institutes of Health (NIH), generated three new datasets: (1) structured eligibility criteria, (2) participant records based on sample calls to NCI’s contact center, and (3) participant/trial match ratings curated by medical professionals. NCI also contributed curation by medical professionals to build the datasets, along with guidance.
Dr. Gisele Sarosy is a medical oncologist who leads informatics projects in NCI’s Coordinating Center for Clinical Trials. She oversees the development of information about cancer clinical trials tailored to cancer patients, providers, and caregivers, along with the use of technology to deliver that information effectively and efficiently. She points out that it is hard for patients to find and match to clinical trials, even with ongoing efforts to improve the online search and retrieval of cancer clinical trials through NCI’s trials.cancer.gov. Challenges exist for a variety of reasons, including:
- Searches often retrieve too many trials for which a potential participant may not be eligible.
- Searches may miss relevant trials, particularly those based on molecular alterations rather than the disease site.
- Eligibility criteria are often long, detailed, inconsistent, and not listed in order of importance.
One of the goals of the TOP Health sprint was to identify whether application of artificial intelligence to clinical trials eligibility criteria as currently written could improve the precision of cancer clinical trial search and retrieval. The short answer is, yes! To support this work, NCI provided:
- Access to NCI’s Clinical Trials Search API (CTS-API) which includes cancer clinical trial abstracts, disease and intervention coding terms, and information about sites where participants can enroll on these studies. TOP Health Sprint participants were asked to provide feedback regarding the API.
- Three datasets: Data Set 1, a subset of eligibility criteria translated into machine-readable code; Data Set 2, 100 records based on callers to NCI’s Cancer Information Service, enhanced with synthetic data and translated into machine-readable code; and Data Set 3, 30 participant records matched against 50 clinical trials for which both the eligibility criteria and the participant data had previously been translated into machine-readable code. Data Set 3, produced by oncology professionals, served as the reference against which matches identified through artificial intelligence were compared.
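As a rough illustration of what eligibility criteria “translated into machine-readable code” can enable, here is a minimal Python (3.9+) sketch. It encodes a few criteria as data, matches participant records against them (checking location as well as disease, and allowing biomarker-driven trials that are not tied to a disease site), then scores the automated matches against an expert reference set in the spirit of Data Set 3. Everything here is hypothetical: the `Trial` and `Participant` schemas, field names, the simplification of match ratings to a binary match/no-match, and all records are invented for illustration and do not reflect NCI’s datasets, the CTS-API schema, or any sprint team’s method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # Hypothetical schema for structured eligibility criteria.
    trial_id: str
    diseases: frozenset[str]             # empty = any disease site (e.g., biomarker-driven trials)
    required_biomarkers: frozenset[str]  # molecular alterations a participant must have
    min_age: int
    max_age: int
    sites: frozenset[str]                # hypothetical: states where the trial enrolls


@dataclass(frozen=True)
class Participant:
    # Hypothetical schema for a structured participant record.
    participant_id: str
    disease: str
    biomarkers: frozenset[str]
    age: int
    state: str


def is_eligible(p: Participant, t: Trial) -> bool:
    """Check one participant against one trial's structured criteria."""
    disease_ok = not t.diseases or p.disease in t.diseases
    biomarker_ok = t.required_biomarkers <= p.biomarkers  # subset test
    age_ok = t.min_age <= p.age <= t.max_age
    location_ok = p.state in t.sites
    return disease_ok and biomarker_ok and age_ok and location_ok


def match_all(participants, trials) -> set[tuple[str, str]]:
    """Return every (participant_id, trial_id) pair that passes all criteria."""
    return {
        (p.participant_id, t.trial_id)
        for p in participants
        for t in trials
        if is_eligible(p, t)
    }


def precision_recall(predicted, reference) -> tuple[float, float]:
    """Score automated matches against an expert-curated reference set."""
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Synthetic toy records, for illustration only.
trials = [
    Trial("T1", frozenset({"esophageal adenocarcinoma"}), frozenset(), 18, 80, frozenset({"MD", "VA"})),
    Trial("T2", frozenset(), frozenset({"HER2+"}), 18, 99, frozenset({"MD"})),
    Trial("T3", frozenset({"bladder cancer"}), frozenset(), 18, 75, frozenset({"CA"})),
]
participants = [
    Participant("P1", "esophageal adenocarcinoma", frozenset({"HER2+"}), 63, "MD"),
    Participant("P2", "bladder cancer", frozenset(), 70, "CA"),
]

predicted = match_all(participants, trials)
reference = {("P1", "T1"), ("P2", "T3")}  # toy stand-in for expert match ratings
print(sorted(predicted))  # [('P1', 'T1'), ('P1', 'T2'), ('P2', 'T3')]
print(precision_recall(predicted, reference))  # (0.6666666666666666, 1.0)
```

Note that T2 matches P1 even though T2 lists no disease site, because the biomarker criterion alone qualifies P1; this is the kind of molecularly defined trial that disease-site searches can miss. Precision here falls below 1.0 because the toy reference set does not include that pair, which is exactly the kind of disagreement a professionally curated comparison set is meant to surface.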
Dr. Sarosy applauds the various organizations for “applying resources to tackle a very difficult problem, i.e., interpreting the complex and inconsistently written eligibility criteria and attempting to parse and structure them to facilitate search and matching in an automated fashion.”
Although some of the new digital tools and applications from TOP Health are still quite early in development, Sarosy encourages continued progress. “This direct interaction helped me think of ways to work with my team to refine the datasets to better meet patients’ needs. I am hopeful that the direct interaction helped the innovators to better understand the challenges of providers as they seek to find the right trial for a potential participant at the point of need,” Sarosy added.
United States Leadership in AI and Technology
This first-ever TOP Health tech sprint advanced priorities recently identified in the American AI Initiative, established by Executive Order in Feb. 2019. It also illustrated real-world value as government data was unlocked for public use as machine-readable “open data,” per the Foundations for Evidence-Based Policymaking Act, signed in Jan. 2019.
“At HHS, we recognize that the federal government alone cannot solve our most important and complex challenges,” said Ed Simcox, HHS CTO and acting chief information officer. He added that “the TOP Health sprint is a valuable step in leveraging skills from industry with public resources to promote better health outcomes.” This is just the beginning of a long-term government commitment to data-driven innovation. In early March, we’ll publish the next part of this blog series, featuring lessons learned and outcomes from leveraging open data and emerging technologies for Lyme and tick-borne diseases, the second theme explored in the TOP Health sprint.
On February 28, 2019, the PIFs and HHS Office of the CTO will co-host the TOP Health participants and cross-agency leadership in Washington, D.C. The goal of this event at the White House Eisenhower Executive Office Building is threefold:
- Welcome and recognize the TOP Health teams and collaborating federal partners for completing the health tech sprint, and recognize their work via the Data’s Choice and AI’s Choice metrics for sharing data and results.
- Provide an opportunity for all 10 TOP Health teams to pitch their 11 digital tools to federal leadership.
- Build momentum for the larger, 250-person event at Census Bureau – The Opportunity Project Demo Day – to be held on March 1.
In early March, we’ll share learnings from Challenge #2, which focused on the Lyme and Tick-Borne Disease Teams. Four teams developed capabilities to support data-driven decisions in prevention, education, and science aimed at improving public health outcomes related to tick-borne diseases.
Our TOP Health sprint, in collaboration with the U.S. Department of Health and Human Services Office of the Chief Technology Officer and Presidential Innovation Fellows, was modeled in part after The Opportunity Project (TOP) at the U.S. Department of Commerce. It was co-led by Presidential Innovation Fellow Dr. Gil Alterovitz and HHS Innovator-in-Residence Dr. Kristen Honey.
Are you interested in learning more about using or extending the AI ecosystem, Lyme Innovation, Open Data, Data’s Choice, AI’s Choice, or the TOP Health sprint? If so, please see the TOP Health site or contact us at PIF-Team@gsa.gov. On social media, follow @InnovFellows, @HHSCTO, and the hashtag #TOPHealth.