pyomop: OMOP Swiss Army Knife 🔧

✨ Overview

pyomop is your OMOP Swiss Army Knife 🔧 for working with OHDSI OMOP Common Data Model (CDM) v5.4 or v6 compliant databases, using SQLAlchemy as the ORM. It converts query results to pandas DataFrames for machine learning pipelines and provides utilities for working with OMOP vocabularies. Table definitions are based on the omop-cdm library. pyomop is designed to be a lightweight, easy-to-use library for researchers and developers experimenting and testing with OMOP CDM databases. It can be used both as a command-line tool and as an imported library in your code.

Supports SQLite, PostgreSQL, and MySQL. CDM and Vocab tables are created in the same schema. (See usage below for more details)
LLM-based natural language queries via langchain. Usage.
🔥 FHIR to OMOP conversion utilities. (See usage below for more details)
Execute QueryLibrary. (See usage below for more details)

Please ⭐️ if you find this project useful!

Installation

Stable release:

pip install pyomop

Development version:

git clone https://github.com/dermatologist/pyomop.git
cd pyomop
pip install -e .

LLM support:

pip install pyomop[llm]

✨ See this notebook or script for examples. 👇 MCP SERVER is recommended for advanced usage.

Docker

A docker-compose file is provided to quickly set up an environment with postgres, webapi, atlas and a SQL script that creates a source in webapi. The script can be run using the psql command-line tool or via the webapi UI. After running the script, refresh the sources by sending a request to /WebAPI/source/refresh.

🔧 Usage

import asyncio
import datetime

from sqlalchemy import select

from pyomop import CdmEngineFactory, CdmVector, CdmVocabulary

# cdm6 and cdm54 are supported
from pyomop.cdm54 import Base, Cohort, Person, Vocabulary


async def main():
    cdm = CdmEngineFactory()  # Creates SQLite database by default for fast testing
    # cdm = CdmEngineFactory(db='pgsql', host='', port=5432,
    #                        user='', pw='',
    #                        name='', schema='')
    # cdm = CdmEngineFactory(db='mysql', host='', port=3306,
    #                        user='', pw='',
    #                        name='')
    engine = cdm.engine
    # Comment the following line if using an existing database.
    # Both cdm6 and cdm54 are supported, see the import statements above
    await cdm.init_models(Base.metadata)  # Initializes the database with the OMOP CDM tables
    vocab = CdmVocabulary(cdm, version='cdm54')  # or 'cdm6' for v6
    # Uncomment the following line to create a new vocabulary from CSV files
    # vocab.create_vocab('/path/to/csv/files')

    async with cdm.session() as session:  # type: ignore
        # Add Persons
        async with session.begin():
            session.add(
                Person(
                    person_id=100,
                    gender_concept_id=8532,
                    gender_source_concept_id=8512,
                    year_of_birth=1980,
                    month_of_birth=1,
                    day_of_birth=1,
                    birth_datetime=datetime.datetime(1980, 1, 1),
                    race_concept_id=8552,
                    race_source_concept_id=8552,
                    ethnicity_concept_id=38003564,
                    ethnicity_source_concept_id=38003564,
                )
            )
            session.add(
                Person(
                    person_id=101,
                    gender_concept_id=8532,
                    gender_source_concept_id=8512,
                    year_of_birth=1980,
                    month_of_birth=1,
                    day_of_birth=1,
                    birth_datetime=datetime.datetime(1980, 1, 1),
                    race_concept_id=8552,
                    race_source_concept_id=8552,
                    ethnicity_concept_id=38003564,
                    ethnicity_source_concept_id=38003564,
                )
            )

        # Query the Person
        stmt = select(Person).where(Person.person_id == 100)
        result = await session.execute(stmt)
        for row in result.scalars():
            print(row)
            assert row.person_id == 100

        # Query the person pattern 2
        person = await session.get(Person, 100)
        print(person)
        assert person is not None
        assert person.person_id == 100

    # Convert result to a pandas dataframe
    vec = CdmVector()

    # https://github.com/OHDSI/QueryLibrary/blob/master/inst/shinyApps/QueryLibrary/queries/person/PE02.md
    result = await vec.query_library(cdm, resource='person', query_name='PE02')
    df = vec.result_to_df(result)
    print("DataFrame from result:")
    print(df.head())

    result = await vec.execute(cdm, query='SELECT * from person;')
    print("Executing custom query:")
    df = vec.result_to_df(result)
    print("DataFrame from result:")
    print(df.head())

    # Close engine
    await engine.dispose()  # type: ignore


# Run the main function
asyncio.run(main())

🔥 FHIR to OMOP mapping

pyomop can load FHIR Bulk Export (NDJSON) files into an OMOP CDM database.

Sample datasets: https://github.com/smart-on-fhir/sample-bulk-fhir-datasets
Remove any non-FHIR files (for example, log.ndjson) from the input folder.
Download OMOP vocabulary CSV files (for example from OHDSI Athena) and place them in a folder.
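The clean-up step in the list above (removing non-FHIR files such as log.ndjson) can be checked with a few lines of Python before running the import. This is a minimal sketch, not part of pyomop: the helper names are hypothetical, the log record in the demo is made up, and the check simply assumes that a FHIR NDJSON file holds one JSON object per line with a `resourceType` field.

```python
import json
import tempfile
from pathlib import Path


def looks_like_fhir_ndjson(path: Path) -> bool:
    """Return True if the first non-empty line is a JSON object with a resourceType."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                return False
            return isinstance(record, dict) and "resourceType" in record
    return False  # an empty file carries no FHIR resources


def split_input_folder(folder: Path) -> tuple[list[Path], list[Path]]:
    """Partition *.ndjson files into (FHIR files, files to remove before import)."""
    keep, drop = [], []
    for path in sorted(folder.glob("*.ndjson")):
        (keep if looks_like_fhir_ndjson(path) else drop).append(path)
    return keep, drop


# Demo on a throwaway folder holding one FHIR file and one log file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "Patient.ndjson").write_text(
        '{"resourceType": "Patient", "id": "p1"}\n', encoding="utf-8"
    )
    (folder / "log.ndjson").write_text(
        '{"exportId": "x", "eventId": "kickoff"}\n', encoding="utf-8"
    )
    keep, drop = split_input_folder(folder)
    kept_names = [p.name for p in keep]
    dropped_names = [p.name for p in drop]

print("keep:", kept_names)
print("remove:", dropped_names)
```

Only the first record of each file is inspected, which keeps the check fast on large exports; a stricter variant could validate every line.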

Run:

pyomop --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/

This creates an OMOP CDM in SQLite, loads the vocabulary files, imports the FHIR data from the input folder, and reconciles the vocabulary by mapping source_value to concept_id. The mapping is defined in the mapping.example.json file. The default mapping is here. Mapping happens in 5 steps as implemented here.
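To make the idea of reconciling a source_value to a concept_id concrete, here is a small illustration using only the standard-library sqlite3 module. It is a sketch of the general technique, not pyomop's actual implementation: the two tables are cut down to a handful of columns, the source values are simplified to the Gender vocabulary codes 'F' and 'M', and unmatched rows fall back to 0, the OMOP convention for "no matching concept".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Heavily reduced versions of the OMOP concept and person tables.
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_code TEXT,
        vocabulary_id TEXT
    );
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_source_value TEXT,
        gender_concept_id INTEGER
    );
    INSERT INTO concept VALUES (8507, 'M', 'Gender'), (8532, 'F', 'Gender');
    INSERT INTO person VALUES (1, 'F', NULL), (2, 'M', NULL), (3, 'unknown', NULL);
    """
)

# Fill gender_concept_id by matching the source value against concept_code.
# Rows without a match get concept_id 0.
conn.execute(
    """
    UPDATE person
    SET gender_concept_id = COALESCE(
        (SELECT c.concept_id
         FROM concept AS c
         WHERE c.vocabulary_id = 'Gender'
           AND c.concept_code = person.gender_source_value),
        0
    )
    """
)

rows = conn.execute(
    "SELECT person_id, gender_concept_id FROM person ORDER BY person_id"
).fetchall()
print(rows)  # [(1, 8532), (2, 8507), (3, 0)]
conn.close()
```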

Example using postgres (Docker)

pyomop --dbtype pgsql --host localhost --user postgres --pw mypass --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/

FHIR to data frame mapping is done with FHIRy
Most of the code for this functionality was written by an LLM agent. The prompts used are here

Command-line

  -c, --create                Create CDM tables (see --version).
  -t, --dbtype TEXT           Database Type for creating CDM (sqlite, mysql or pgsql)
  -h, --host TEXT             Database host
  -p, --port TEXT             Database port
  -u, --user TEXT             Database user
  -w, --pw TEXT               Database password
  -v, --version TEXT          CDM version (cdm54 (default) or cdm6)
  -n, --name TEXT             Database name
  -s, --schema TEXT           Database schema (for pgsql)
  -i, --vocab TEXT            Folder with vocabulary files (csv) to import
  -f, --input DIRECTORY       Input folder with FHIR bundles or ndjson files.
  -e, --eunomia-dataset TEXT  Download and load Eunomia dataset (e.g., 'GiBleed', 'Synthea')
  --eunomia-path TEXT         Path to store/find Eunomia datasets (uses EUNOMIA_DATA_FOLDER env var if not specified)
  --connection-info           Display connection information for the database (For R package compatibility)
  --mcp-server                Start MCP server for stdio interaction
  --pyhealth-path TEXT        Path to export PyHealth compatible CSV files
  --help                      Show this message and exit.
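When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only flags from the option list above; the helper `build_pyomop_command` is hypothetical and not shipped with pyomop, and the paths and credentials are placeholders.

```python
import shlex


def build_pyomop_command(create=False, dbtype=None, host=None, user=None,
                         pw=None, vocab=None, input_dir=None):
    """Assemble an argv list for the pyomop CLI from a subset of its documented flags."""
    argv = ["pyomop"]
    # Value-taking connection flags are only added when a value was supplied.
    for flag, value in (("--dbtype", dbtype), ("--host", host),
                        ("--user", user), ("--pw", pw)):
        if value is not None:
            argv += [flag, str(value)]
    if create:
        argv.append("--create")
    for flag, value in (("--vocab", vocab), ("--input", input_dir)):
        if value is not None:
            argv += [flag, str(value)]
    return argv


# Hypothetical paths and credentials, for illustration only.
command = build_pyomop_command(
    create=True, dbtype="pgsql", host="localhost", user="postgres",
    pw="mypass", vocab="/data/omop-vocab", input_dir="/data/fhir",
)
print(shlex.join(command))

# To actually run it (requires pyomop to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, but note that `~` in a path is not expanded without a shell; use `os.path.expanduser` first if needed.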

MCP Server

pyomop includes an MCP (Model Context Protocol) server that exposes tools for interacting with OMOP CDM databases. This allows MCP clients to create databases, load data, and execute SQL statements.

Starting the MCP Server

To start the MCP server for stdio interaction:

# Using the main CLI
pyomop --mcp-server

Usage with MCP Clients

The server communicates via stdio and can be used with any MCP-compatible client. Example configuration for vscode:

{
  "servers": {
    "pyomop": {
      "command": "uv",
      "args": ["run", "pyomop", "--mcp-server"]
    }
  }
}

If the vocabulary is not installed locally or advanced vocabulary support is required from Athena, it is recommended to combine omop_mcp with PyOMOP.

Available MCP Tools

create_cdm: Create an empty CDM database
create_eunomia: Add Eunomia sample dataset
get_table_columns: Get column names for a specific table
get_single_table_info: Get detailed table information, including foreign keys
get_usable_table_names: Get a list of all available table names
run_sql: Execute SQL statements with error handling

create_cdm and create_eunomia support only local sqlite databases to avoid inadvertent data loss in production databases.

Available Prompts

query_execution_steps: Provides step-by-step guidance for executing database queries based on free text instructions

Eunomia import and cohort creation

pyomop -e Synthea27Nj -v 5.4 --connection-info
pyomop -e GiBleed -v 5.3 --connection-info

PyHealth and PLP Compatibility (For Machine Learning pipelines)

pyomop supports exporting OMOP CDM data (to --pyhealth-path) in a format compatible with PyHealth, a machine learning library for healthcare data analysis (See Notebook and usage below). Additionally, you can export the connection information for use with the various R packages such as PatientLevelPrediction using the --connection-info option.

pyomop -e GiBleed -v 5.3 --connection-info --pyhealth-path ~/pyhealth

Additional Tools

Convert FHIR to pandas DataFrame: fhiry
.NET and Golang OMOP CDM: .NET, Golang

Supported Databases

PostgreSQL
MySQL
SQLite

Environment Variables for Database Connection

You can configure database connection parameters using environment variables. These will be used as defaults by pyomop and the MCP server:

PYOMOP_DB: Database type (sqlite, mysql, pgsql)
PYOMOP_HOST: Database host
PYOMOP_PORT: Database port
PYOMOP_USER: Database user
PYOMOP_PW: Database password
PYOMOP_SCHEMA: Database schema (for PostgreSQL)

Example usage:

export PYOMOP_DB=pgsql
export PYOMOP_HOST=localhost
export PYOMOP_PORT=5432
export PYOMOP_USER=postgres
export PYOMOP_PW=mypass
export PYOMOP_SCHEMA=omop

These environment variables will be checked before assigning default values for database connection in pyomop and MCP server tools.
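The lookup order described above (explicit value first, then environment variable, then built-in default) can be sketched as follows. This is an illustration of the pattern, not pyomop's code: `resolve_db_settings` is a hypothetical helper, and the fallback values are assumptions chosen for the example, apart from sqlite, which the usage section names as the default database.

```python
import os

# Environment variable names as listed above.
ENV_KEYS = {
    "db": "PYOMOP_DB",
    "host": "PYOMOP_HOST",
    "port": "PYOMOP_PORT",
    "user": "PYOMOP_USER",
    "pw": "PYOMOP_PW",
    "schema": "PYOMOP_SCHEMA",
}

# Hypothetical fallbacks for this sketch; pyomop's real defaults may differ.
FALLBACKS = {
    "db": "sqlite",
    "host": "localhost",
    "port": "5432",
    "user": "",
    "pw": "",
    "schema": "",
}


def resolve_db_settings(explicit=None, env=None):
    """Resolve each setting as: explicit argument, else environment, else fallback."""
    explicit = explicit or {}
    env = os.environ if env is None else env
    settings = {}
    for key, env_name in ENV_KEYS.items():
        if explicit.get(key) is not None:
            settings[key] = explicit[key]
        elif env.get(env_name):
            settings[key] = env[env_name]
        else:
            settings[key] = FALLBACKS[key]
    return settings


# A plain dict stands in for os.environ so the demo does not depend on the shell.
example_env = {"PYOMOP_DB": "pgsql", "PYOMOP_HOST": "localhost", "PYOMOP_PORT": "5432"}
settings = resolve_db_settings(explicit={"host": "db.internal"}, env=example_env)
print(settings)
```

Accepting the environment as a parameter keeps the function easy to test without mutating the real process environment.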

Contributing

Pull requests are welcome! See CONTRIBUTING.md.

Contributors

Bell Eapen
