🧬 DeepFoldChange

A Deep Learning Framework that Emulates Statistical Models for Differential Gene Expression Analysis

🌍 Background and Motivation

Traditional RNA-seq differential expression (DE) analysis relies on statistical tools such as DESeq2, edgeR, and limma-voom.
While these methods are robust, they can be computationally intensive, require repeated normalization, and are not easily generalizable across datasets.

DeepFoldChange offers a new perspective — a deep neural network (DNN) framework that learns to imitate the statistical inference process used in DESeq2.
By training on DESeq2-derived log2FoldChange (LFC) values and expression count matrices, DeepFoldChange learns to predict fold-changes for genes held out of training, serving as an AI-driven surrogate model for DE analysis.

⚡ Key Features

🧠 Deep Neural Emulation — Learns DESeq2-style fold-change behavior directly from count matrices.
🧩 Lightweight & Scalable — Uses PCA-compressed expression features for faster training.
📊 Model Evaluation Suite — Automatically generates regression, error, and agreement plots.
💻 Reproducible Pipeline — Compatible with any RNA-seq dataset that includes count and LFC tables.

📁 Repository Structure

DeepFold/
│
├── train_predict_lfc_dnn.py           # Main training + prediction script
├── plot_deepfold_performance.py       # Performance plotting utility
├── requirements.txt                   # Package dependencies
├── lfc_dnn_outputs/                   # Folder containing model outputs
│   ├── holdout_predictions.csv
│   ├── all_predictions_fullfit.csv
│   ├── pca_dnn_lfc_predictor.pkl
│   ├── metrics.txt
│   ├── plot_true_vs_pred_scatter.png
│   ├── plot_error_distribution.png
│   ├── plot_bland_altman.png
│   ├── plot_cumulative_accuracy.png
│   ├── plot_accuracy_vs_threshold.png
│   └── plot_accuracy_bar.png
└── README.md

🧩 Model Architecture

DeepFoldChange uses a PCA → DNN regression pipeline:

Counts → CPM normalization → log1p → PCA(≤100) → 
DNN(hidden: 256→128→64, ReLU, Adam optimizer) → Predicted LFC
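
As a rough illustration of the pipeline above, the sketch below assumes a scikit-learn implementation. The repository lists scikit-learn as a dependency and ships a joblib-serialized model, but the actual training script may be organized differently. The names `log_cpm` and `build_lfc_pipeline`, the `max_iter` value, and the rule used to cap the number of PCA components are assumptions made for this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def log_cpm(counts, library_sizes):
    """Counts-per-million followed by log1p.

    counts: genes x samples array; library_sizes: total counts per sample.
    """
    counts = np.asarray(counts, dtype=float)
    return np.log1p(counts / np.asarray(library_sizes, dtype=float) * 1e6)


def build_lfc_pipeline(n_genes, n_samples, max_components=100, seed=0):
    """Scaler -> PCA -> MLP regressor (hypothetical helper, not the repo's API)."""
    # PCA cannot keep more components than min(rows, columns) of its input,
    # so the "<= 100" cap is combined with the data dimensions here.
    n_components = min(max_components, n_genes, n_samples)
    return Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components, random_state=seed)),
        ("dnn", MLPRegressor(hidden_layer_sizes=(256, 128, 64),
                             activation="relu", solver="adam",
                             max_iter=500, random_state=seed)),
    ])
```

With a genes x samples count matrix `counts`, one would call `X = log_cpm(counts, counts.sum(axis=0))` and then `build_lfc_pipeline(*X.shape).fit(X, lfc)`, where `lfc` holds the DESeq2 log2FoldChange value for each gene.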

📊 Example Results

1️⃣ True vs Predicted LFC

Pearson r = 0.99, Spearman ρ = 1.00, MAE = 0.10.
Predicted fold-changes align almost perfectly with DESeq2 results, demonstrating that DeepFoldChange reproduces both the magnitude and direction of differential expression.
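
The three numbers above can be recomputed from any table of true and predicted LFC values. The following is a minimal sketch using SciPy and scikit-learn; the function name `agreement_metrics` and the dictionary keys are assumptions for illustration and may not match what the repository writes to `metrics.txt`.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error


def agreement_metrics(true_lfc, pred_lfc):
    """Pearson r, Spearman rho, and MAE between two LFC vectors."""
    true_lfc = np.asarray(true_lfc, dtype=float)
    pred_lfc = np.asarray(pred_lfc, dtype=float)
    r, _ = pearsonr(true_lfc, pred_lfc)       # linear correlation
    rho, _ = spearmanr(true_lfc, pred_lfc)    # rank-order agreement
    mae = mean_absolute_error(true_lfc, pred_lfc)
    return {"pearson_r": float(r), "spearman_rho": float(rho), "mae": float(mae)}
```

Note that a constant offset between predictions and truth leaves both correlations at 1 while raising the MAE, which is why the correlation and error metrics are reported together.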

2️⃣ Prediction Error Distribution

Errors are sharply centered at 0, indicating no bias between predicted and true LFC values.

More than 95 % of genes fall within ±0.5 log₂ fold-change error.

3️⃣ Bland–Altman Agreement Plot

Differences between predicted and true LFCs remain consistent across the entire range of expression changes — confirming uniform agreement and no proportional bias.
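
For readers who want the numbers behind such a plot, here is a small sketch of the usual Bland–Altman summary: the mean difference (bias), the conventional limits of agreement at bias ± 1.96 standard deviations, and the slope of difference against mean as a simple check for proportional bias. The function name and returned keys are hypothetical, and the plotting script in this repository may compute these quantities differently.

```python
import numpy as np


def bland_altman_stats(true_lfc, pred_lfc):
    """Bias, 95% limits of agreement, and a proportional-bias slope."""
    true_lfc = np.asarray(true_lfc, dtype=float)
    pred_lfc = np.asarray(pred_lfc, dtype=float)
    diff = pred_lfc - true_lfc               # y-axis of the plot
    mean_pair = (pred_lfc + true_lfc) / 2.0  # x-axis of the plot
    bias = diff.mean()
    sd = diff.std(ddof=1)
    # A slope near zero means the difference does not grow with LFC magnitude.
    slope = np.polyfit(mean_pair, diff, 1)[0]
    return {
        "bias": float(bias),
        "loa_lower": float(bias - 1.96 * sd),
        "loa_upper": float(bias + 1.96 * sd),
        "slope": float(slope),
    }
```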

4️⃣ Cumulative Accuracy Curve

Over 90 % of genes are predicted within |ΔLFC| ≤ 0.5,
showing high fidelity between DeepFoldChange and the DESeq2 estimates.

5️⃣ Accuracy vs. Error Threshold

This curve quantifies model accuracy as a function of tolerance.
For example, at a tolerance of ±0.25 LFC, ~80 % of genes are predicted within the threshold.
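
The quantity behind this curve is simply the fraction of genes whose absolute error does not exceed each tolerance. A minimal sketch follows; the function name `accuracy_within` and the default thresholds are chosen for illustration only.

```python
import numpy as np


def accuracy_within(true_lfc, pred_lfc, thresholds=(0.1, 0.25, 0.5, 1.0)):
    """Fraction of genes with |pred - true| <= t, for each tolerance t."""
    abs_err = np.abs(np.asarray(pred_lfc, dtype=float)
                     - np.asarray(true_lfc, dtype=float))
    return {float(t): float(np.mean(abs_err <= t)) for t in thresholds}
```

Evaluating this over a fine grid of thresholds gives the cumulative accuracy curve, and evaluating it at a few fixed thresholds gives the values for a bar chart.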

🧪 Evaluation Summary

Metric	Value	Interpretation
Pearson r	0.99	Linear correlation between predicted and true LFC
Spearman ρ	1.00	Rank-order agreement
MAE	0.10	Average absolute LFC difference
Genes within |ΔLFC| ≤ 0.5	> 90 %	Share of genes predicted within half a log₂ unit

In these example results, DeepFoldChange closely reproduces the DESeq2 fold-changes, supporting its use as a deep learning surrogate for DE analysis.

⚙️ Installation & Environment Setup

Option 1: Conda (recommended)

conda create -n deepfold_env python=3.10
conda activate deepfold_env
pip install -r requirements.txt

Option 2: Manual install

pip install pandas numpy scikit-learn scipy joblib matplotlib seaborn

🚀 Running the Pipeline

1️⃣ Prepare input files

filtered_counts_DEGs.csv – normalized or raw counts (rows = genes, columns = samples)
resLFC_p_cut.csv – DESeq2 output containing at least a log2FoldChange column.

2️⃣ Train and Predict

python train_predict_lfc_dnn.py \
  --counts filtered_counts_DEGs.csv \
  --lfc resLFC_p_cut.csv \
  --outdir lfc_dnn_outputs

This will:

train the PCA + DNN model,
evaluate on a hold-out set,
fit on all data,
and generate predictions + metrics.
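
The four steps above follow a common hold-out-then-refit pattern. The sketch below shows that pattern for any scikit-learn style regressor. It is not the repository's script: the function name, the 80/20 split, the column names `true_lfc` and `pred_lfc`, and the choice of metrics are assumptions, while the output file names are taken from the repository structure.

```python
from pathlib import Path

import joblib
import pandas as pd
from sklearn.base import clone
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def holdout_then_fullfit(model, X, y, outdir, test_size=0.2, seed=0):
    """X: DataFrame of per-gene features; y: Series of LFC with the same index."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    # 1) Fit on a training split and score genes the model has not seen.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    holdout_model = clone(model).fit(X_tr, y_tr)
    pred_te = holdout_model.predict(X_te)
    pd.DataFrame({"true_lfc": y_te, "pred_lfc": pred_te},
                 index=X_te.index).to_csv(outdir / "holdout_predictions.csv")
    metrics = {"r2": float(r2_score(y_te, pred_te)),
               "mae": float(mean_absolute_error(y_te, pred_te))}

    # 2) Refit on every gene, predict for all of them, and save the model.
    full_model = clone(model).fit(X, y)
    pd.DataFrame({"pred_lfc": full_model.predict(X)},
                 index=X.index).to_csv(outdir / "all_predictions_fullfit.csv")
    joblib.dump(full_model, outdir / "pca_dnn_lfc_predictor.pkl")
    return metrics, full_model
```

Only the hold-out metrics estimate how the model behaves on unseen genes. Predictions from the full fit are made on data the model was trained on, so they should not be read as an accuracy estimate.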

3️⃣ Plot performance

python plot_deepfold_performance.py

All figures are saved inside lfc_dnn_outputs/.

📘 Output Files Explained

File	Description
holdout_predictions.csv	True vs predicted LFC for test genes
all_predictions_fullfit.csv	Predicted LFC for all DEGs
pca_dnn_lfc_predictor.pkl	Serialized model (scaler + PCA + DNN)
metrics.txt	R², MAE, Spearman metrics summary
plot_*.png	Performance and accuracy figures

🧩 How to Use Trained Model for New Data

Once trained, you can load the model and predict on new normalized counts:

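import joblib
import pandas as pd

# Load the serialized pipeline (scaler + PCA + DNN) written by the training script.
model = joblib.load("lfc_dnn_outputs/pca_dnn_lfc_predictor.pkl")

# Rows = genes, columns = samples, laid out like the training counts file.
new_counts = pd.read_csv("your_new_counts.csv", index_col=0)
predicted_lfc = model.predict(new_counts)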
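
Because each gene is described by its values across samples, a fitted scikit-learn style pipeline will expect new data to carry the same feature columns, in the same order, as the training matrix. The helper below is a hedged sketch of that check. `align_to_training_columns` and `training_columns` are hypothetical names, and whether the saved model also applies CPM and log1p internally is not stated here, so any preprocessing should mirror what the training script did.

```python
import pandas as pd


def align_to_training_columns(new_counts, training_columns):
    """Reorder new data to the training column order; fail if any are missing."""
    training_columns = list(training_columns)
    missing = [c for c in training_columns if c not in new_counts.columns]
    if missing:
        raise ValueError(f"new data lacks training columns: {missing}")
    # Extra columns are dropped; kept columns follow the training order.
    return new_counts.loc[:, training_columns]
```

A typical call would be `model.predict(align_to_training_columns(new_counts, train_counts.columns))`, where `train_counts` is the counts table used for training.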

📈 Example Interpretation

DeepFoldChange closely approximates DESeq2-derived fold-changes, with more than 90 % of genes predicted within |ΔLFC| ≤ 0.5.
The framework offers a reproducible, fast, and model-agnostic solution for high-throughput DE prediction using deep learning.

✨ Citation (suggested)

If you use or adapt this repository, please cite:

Debnath, J. P., et al. (2025). DeepFoldChange: A Deep Learning Framework that Emulates Statistical Models for Differential Gene Expression Analysis. GitHub repository: https://github.com/Prokash21/DeepFoldChange

🧠 Author

Joy Prokash Debnath
Department of Biochemistry and Molecular Biology
Shahjalal University of Science and Technology, Sylhet, Bangladesh

🧾 License

This project is distributed under the MIT License, which permits free use, modification, and redistribution, including for academic and research purposes.
