SynthBioData (Synthetic Biological Data)

A Python package for generating synthetic drug discovery data that mimics real-world scenarios, with realistic molecular descriptors and target properties.

Important Notice

This package generates synthetic data for testing and educational purposes only.

The data produced does not represent real biological or chemical measurements and should not be used for clinical, regulatory, or production applications.

Quick Start

Get started with synthbiodata in just a few lines of code:

from synthbiodata import generate_sample_data

# Generate molecular descriptor data
df = generate_sample_data(data_type="molecular-descriptors")
print(f"Generated {len(df)} samples with {len(df.columns)} features")

# Generate ADME data
df_adme = generate_sample_data(data_type="adme")
print(f"Generated {len(df_adme)} samples with {len(df_adme.columns)} features")

Key Features

  • Molecular Descriptors

    Generate realistic molecular properties such as molecular weight (MW), LogP, topological polar surface area (TPSA), hydrogen bond donors (HBD) and acceptors (HBA), and more

  • ADME Data

    Simulate Absorption, Distribution, Metabolism, and Excretion properties

  • Target Families

    Support for GPCR, Kinase, Protease, and other protein families

  • Chemical Fingerprints

    Generate binary chemical fingerprints as features

  • Configurable

    Customize data generation parameters and distributions

  • Efficient

    Built on Polars for fast data manipulation and processing

Data Types

Molecular Descriptors

Generate synthetic molecular data with features like:

  • Molecular weight, LogP, TPSA
  • Hydrogen bond donors/acceptors
  • Rotatable bonds, aromatic rings
  • Chemical fingerprints
  • Target protein families (GPCR, Kinase, Protease, etc.)
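
To give a feel for what one such record might contain, here is a minimal standard-library sketch. It is not how synthbiodata generates its data: the field names, distributions, and parameter values below are assumptions chosen purely for illustration, and the package's actual column names may differ.

```python
import random

# Target families named in the list above; the package may support more.
TARGET_FAMILIES = ["GPCR", "Kinase", "Protease"]


def sample_molecule(rng: random.Random, n_bits: int = 8) -> dict:
    """Draw one illustrative record. All distributions are assumptions."""
    return {
        "molecular_weight": max(100.0, rng.gauss(350.0, 75.0)),  # Da, floored at 100
        "logp": rng.gauss(2.5, 1.0),
        "tpsa": max(0.0, rng.gauss(80.0, 25.0)),  # square angstroms, clipped at 0
        "hbd": rng.randint(0, 5),  # hydrogen bond donors
        "hba": rng.randint(0, 10),  # hydrogen bond acceptors
        "rotatable_bonds": rng.randint(0, 10),
        "aromatic_rings": rng.randint(0, 4),
        "fingerprint": [rng.randint(0, 1) for _ in range(n_bits)],  # binary bits
        "target_family": rng.choice(TARGET_FAMILIES),
    }


rng = random.Random(42)  # fixed seed so repeated runs give the same records
molecules = [sample_molecule(rng) for _ in range(5)]
for mol in molecules:
    print(mol["target_family"], round(mol["molecular_weight"], 1), mol["fingerprint"])
```

Each record mixes continuous descriptors, integer counts, a binary fingerprint, and a categorical target family, which is the general shape of feature table the list above describes.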

ADME Data

Generate ADME (Absorption, Distribution, Metabolism, Excretion) data with:

  • Absorption percentages
  • Plasma protein binding
  • Clearance rates and half-life
  • Bioavailability predictions
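
The properties listed above are not independent of one another, and a small sketch can show how synthetic values might be kept mutually consistent. The example below uses the standard one-compartment relationship between half-life, clearance, and volume of distribution, plus a deliberately simplified first-pass model for bioavailability. The field names, value ranges, and the choice of model are assumptions for illustration, not the package's method.

```python
import math
import random


def half_life_hours(clearance_l_per_h: float, volume_l: float) -> float:
    """One-compartment relationship: t_half = ln(2) * Vd / CL."""
    return math.log(2) * volume_l / clearance_l_per_h


def sample_adme(rng: random.Random) -> dict:
    """Draw one illustrative ADME record. All ranges are assumptions."""
    clearance = rng.uniform(1.0, 50.0)  # L/h
    volume = rng.uniform(10.0, 300.0)  # L, volume of distribution
    absorption = rng.uniform(0.0, 100.0)  # percent of dose absorbed
    first_pass_loss = rng.uniform(0.0, 0.9)  # fraction lost before circulation
    return {
        "absorption_pct": absorption,
        "plasma_protein_binding_pct": rng.uniform(0.0, 100.0),
        "clearance_l_per_h": clearance,
        "half_life_h": half_life_hours(clearance, volume),
        # Simplified: bioavailability cannot exceed the absorbed fraction.
        "bioavailability_pct": absorption * (1.0 - first_pass_loss),
    }


rng = random.Random(0)  # fixed seed for reproducibility
record = sample_adme(rng)
for name, value in record.items():
    print(f"{name}: {value:.2f}")
```

Deriving half-life from clearance, rather than sampling both independently, avoids physically implausible combinations such as very high clearance paired with a very long half-life.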

⬇ Installation

Install synthbiodata using your preferred package manager:

# with uv
uv pip install synthbiodata

# with pip
pip install synthbiodata
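
After installing, one way to confirm that the package is visible to the active environment is to query its installed version with the standard library. This is a generic check that works for any distribution; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("synthbiodata")
print(found or "synthbiodata is not installed in this environment")
```

If the check reports that the package is missing, the install command most likely ran against a different interpreter or virtual environment than the one running the script.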

📖 Documentation

Explore the docs: