
Paper2Graph

Pharmacological Pilot Exploration of GPT-KG · Dan Sosa · June 2023 · IdeaFlow

Paper2Graph (P2G) uses GPT-derived knowledge graphs to extract, represent, and discover knowledge from scientific papers. This pilot analysis explored pharmacological use cases, comparing GPT-extracted drug pathways against gold-standard manually curated knowledge bases.

PDF  ·  Original (.docx)  ·  Zoom Recording  ·  papertograph.ai

Figure: Source paper (left) and GPT-extracted knowledge graph in Neo4j (right), shown with a Cypher query

Objectives

1. Recapitulate Gold-Standard Knowledge

Recreate existing manually curated drug pathways and qualitatively assess knowledge extraction quality, focusing on hallucinations and mapping to biomedical ontologies.

2. Explore Drug Mechanism Paths

Explore metapaths in Neo4j for recapitulating known drug mechanisms via Cypher queries.

Objective 1 – Recapitulating Gold-Standard Knowledge

Methods

PharmGKB, an authoritative, manually curated knowledge base of personalized drug response, served as the gold-standard source of drug pathways. Two drug pathways were studied:

Text from source papers was fed into Paper2Graph. GPT-4 was used in all cases, with duplicate runs to assess non-deterministic behavior.
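One way to put a number on the run-to-run variation mentioned above is to compare the sets of triples produced by two runs on the same text. The sketch below is an assumption about how such a check could look, not the method used in the pilot: the function names, the (subject, relation, object) triple format, and the example triples are all hypothetical placeholders.

```python
def normalize(triple):
    """Lowercase and strip each part so trivial formatting differences are ignored."""
    return tuple(part.strip().lower() for part in triple)


def run_overlap(run_a, run_b):
    """Compare two extraction runs given as lists of (subject, relation, object) triples.

    Returns the Jaccard overlap plus the triples unique to each run.
    """
    a = {normalize(t) for t in run_a}
    b = {normalize(t) for t in run_b}
    union = a | b
    # Two empty runs are treated as fully consistent.
    jaccard = len(a & b) / len(union) if union else 1.0
    return {"jaccard": jaccard, "only_in_a": a - b, "only_in_b": b - a}


# Placeholder triples for illustration only; these are not outputs from Paper2Graph.
run_1 = [("Drug A", "inhibits", "Enzyme X"), ("Drug A", "causes", "Rash")]
run_2 = [("drug a", "inhibits", "enzyme x"), ("Drug A", "metabolized_by", "Enzyme Y")]

result = run_overlap(run_1, run_2)
print(f"Jaccard overlap: {result['jaccard']:.2f}")
print("Only in run 1:", sorted(result["only_in_a"]))
print("Only in run 2:", sorted(result["only_in_b"]))
```

Exact string matching is a deliberately strict choice here. Two runs that phrase the same relation differently would count as disagreeing, so a real comparison would likely need entity and relation normalization first.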

Observations: Strengths

Observations: Challenges

Objective 2 – Exploring Drug Mechanism Paths

Methods

The KGs from Objective 1 were interrogated via Cypher queries to generate mechanistic understanding of drug function. Questions posed:

  1. By what mechanism does abacavir lead to a hypersensitivity adverse reaction?
  2. How does abacavir inhibit HIV?
  3. What is known about lansoprazole's effect on PPIs?
  4. What adverse reaction is this drug known to cause?
  5. At what dosage is this drug effective?
  6. Why might a new drug X be effective in this disease context? (repurposing)
  7. What normal function can be inhibited by a drug with no major harmful downstream consequences? (toxicology)
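Questions like the first two above amount to searching the graph for paths whose node types follow a fixed pattern (a metapath), for example Drug, then Gene, then AdverseReaction. The sketch below illustrates that idea on a tiny in-memory graph rather than in Neo4j. The node labels, relation names, and the `find_metapaths` helper are hypothetical and do not reflect the schema Paper2Graph actually produces; the three edges are simplified textbook facts about abacavir, not output from the pilot.

```python
from collections import defaultdict

# Toy graph. Labels and relation names are assumed for illustration only.
NODE_LABELS = {
    "abacavir": "Drug",
    "HLA-B*57:01": "Gene",
    "hypersensitivity reaction": "AdverseReaction",
    "HIV reverse transcriptase": "Protein",
}
EDGES = [
    ("abacavir", "BINDS", "HLA-B*57:01"),
    ("HLA-B*57:01", "ASSOCIATED_WITH", "hypersensitivity reaction"),
    ("abacavir", "INHIBITS", "HIV reverse transcriptase"),
]


def find_metapaths(start, label_sequence):
    """Return every directed path from `start` whose node labels match `label_sequence`.

    Each path is a flat list alternating node, relation, node, and so on.
    """
    adjacency = defaultdict(list)
    for source, relation, target in EDGES:
        adjacency[source].append((relation, target))

    if NODE_LABELS.get(start) != label_sequence[0]:
        return []

    paths = [[start]]
    # Extend every partial path by one hop per remaining label in the pattern.
    for wanted_label in label_sequence[1:]:
        extended = []
        for path in paths:
            for relation, target in adjacency[path[-1]]:
                if NODE_LABELS.get(target) == wanted_label:
                    extended.append(path + [relation, target])
        paths = extended
    return paths


# Question 1: by what mechanism does abacavir lead to a hypersensitivity reaction?
paths = find_metapaths("abacavir", ["Drug", "Gene", "AdverseReaction"])
for path in paths:
    print(" -> ".join(path))
```

In Neo4j the same pattern would be written as a Cypher `MATCH` over labeled nodes. A GPT-derived graph may not carry clean, consistent labels, so a fixed pattern like this one is likely to miss paths that a looser variable-length search would find.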

Observations

Conclusions

GPT-derived knowledge bases offer a way to extract knowledge and rich semantics from scientific papers at scale. In this pilot, GPT-4 performed well at sentence parsing, entity identification, and capturing quantitative information. Key remaining work includes ensuring high quality, reducing noise, preserving context, and tailoring output to specific end users.

Design considerations around noise tolerance, KG detail level, and pre-processing depend heavily on the end user – a toxicologist at Merck has very different needs from an academic computational pharmacologist. These tools will undoubtedly augment humans' ability to understand and contribute to science.