A Deep Learning Framework that Emulates Statistical Models for Differential Gene Expression Analysis
Traditional RNA-seq differential expression (DE) analysis relies on statistical tools such as DESeq2, edgeR, and limma-voom.
While these methods are robust, they can be computationally intensive, require repeated normalization, and are not easily generalizable across datasets.
DeepFoldChange offers a new perspective: a deep neural network (DNN) framework that learns to imitate the statistical inference process used in DESeq2.
By training on DESeq2-derived log2FoldChange (LFC) values and expression count matrices, DeepFold accurately predicts fold-changes for unseen genes, effectively serving as an AI-driven surrogate model for DE analysis.
- Deep Neural Emulation: learns DESeq2-style fold-change behavior directly from count matrices.
- Lightweight & Scalable: uses PCA-compressed expression features for faster training.
- Model Evaluation Suite: automatically generates regression, error, and agreement plots.
- Reproducible Pipeline: compatible with any RNA-seq dataset that includes count and LFC tables.
```
DeepFold/
│
├── train_predict_lfc_dnn.py        # Main training + prediction script
├── plot_deepfold_performance.py    # Performance plotting utility
├── requirements.txt                # Package dependencies
├── lfc_dnn_outputs/                # Folder containing model outputs
│   ├── holdout_predictions.csv
│   ├── all_predictions_fullfit.csv
│   ├── pca_dnn_lfc_predictor.pkl
│   ├── metrics.txt
│   ├── plot_true_vs_pred_scatter.png
│   ├── plot_error_distribution.png
│   ├── plot_bland_altman.png
│   ├── plot_cumulative_accuracy.png
│   ├── plot_accuracy_vs_threshold.png
│   └── plot_accuracy_bar.png
└── README.md
```
DeepFoldChange uses a PCA → DNN regression pipeline:

```
Counts → CPM normalization → log1p → PCA(≤100) →
DNN(hidden: 256→128→64, ReLU, Adam optimizer) → Predicted LFC
```
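The pipeline above can be sketched in scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: it assumes each gene is one training row whose features are its log-CPM values across samples (target: its DESeq2 `log2FoldChange`), that the DNN is scikit-learn's `MLPRegressor`, and that a `StandardScaler` sits before PCA (the saved model is described as "scaler + PCA + DNN", but the order is a guess). The helpers `log_cpm` and `build_lfc_model` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def log_cpm(counts: pd.DataFrame) -> pd.DataFrame:
    """Counts -> CPM -> log1p for a genes x samples matrix."""
    library_sizes = counts.sum(axis=0)              # total reads per sample (column)
    cpm = counts.div(library_sizes, axis=1) * 1e6   # counts per million
    return np.log1p(cpm)


def build_lfc_model(n_genes: int, n_samples: int, max_components: int = 100):
    """Hypothetical scaler -> PCA(<=100) -> MLP(256-128-64) regressor.

    Assumption: each gene is one row; features are its log-CPM values
    across samples, and the target is its DESeq2 log2FoldChange.
    """
    # PCA cannot keep more components than min(rows, columns)
    n_components = min(max_components, n_genes, n_samples)
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        MLPRegressor(hidden_layer_sizes=(256, 128, 64), activation="relu",
                     solver="adam", max_iter=500, random_state=0),
    )


# Usage on a small synthetic count matrix (200 genes x 6 samples)
rng = np.random.default_rng(0)
counts = pd.DataFrame(rng.poisson(50, size=(200, 6)),
                      index=[f"gene{i}" for i in range(200)],
                      columns=[f"sample{j}" for j in range(6)])
X = log_cpm(counts)
model = build_lfc_model(n_genes=X.shape[0], n_samples=X.shape[1])
```

Under this gene-as-row assumption, the `≤100` cap rarely binds: with six RNA-seq samples, PCA can return at most six components, so capping `n_components` at the sample count keeps the pipeline valid for small experiments.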
Pearson r = 0.99, Spearman ρ = 1.00, MAE = 0.10.
Predicted fold-changes align almost perfectly with DESeq2 results, demonstrating that DeepFold reproduces both the magnitude and direction of differential expression.
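These three statistics can be recomputed from any table of paired true and predicted LFCs, such as `holdout_predictions.csv`. A minimal sketch using SciPy's `pearsonr` and `spearmanr`; the helper `agreement_metrics` is a hypothetical name and the inputs are toy values, not the reported results.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def agreement_metrics(true_lfc, pred_lfc):
    """Pearson r, Spearman rho and MAE between DESeq2 and predicted LFCs."""
    true_lfc = np.asarray(true_lfc, dtype=float)
    pred_lfc = np.asarray(pred_lfc, dtype=float)
    r, _ = pearsonr(true_lfc, pred_lfc)          # linear agreement
    rho, _ = spearmanr(true_lfc, pred_lfc)       # rank-order agreement
    mae = np.mean(np.abs(pred_lfc - true_lfc))   # average absolute LFC error
    return {"pearson_r": float(r), "spearman_rho": float(rho), "mae": float(mae)}


# Toy values for illustration only (not the reported results)
metrics = agreement_metrics([-2.0, -0.5, 0.3, 1.1, 2.4],
                            [-1.9, -0.6, 0.4, 1.0, 2.5])
print(metrics)
```

Because Spearman ρ depends only on ranks, it reaches 1.00 whenever the predicted ordering of genes matches DESeq2 exactly, even if Pearson r stays slightly below 1, as in this toy example.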
Errors are sharply centered at 0, indicating no systematic bias between predicted and true LFC values.
More than 95 % of genes fall within ±0.5 log₂ fold-change error.
Differences between predicted and true LFCs remain consistent across the entire range of expression changes, confirming uniform agreement and no proportional bias.
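This paragraph appears to describe the Bland-Altman analysis behind `plot_bland_altman.png`. The standard quantities behind such a plot can be sketched in NumPy as below. The 1.96·SD bounds are the conventional 95 % limits of agreement; fitting a slope of differences against means as a proportional-bias check is an addition of this sketch, not necessarily what the plotting script does. `bland_altman` is a hypothetical helper.

```python
import numpy as np


def bland_altman(true_lfc, pred_lfc):
    """Mean bias, 95% limits of agreement and a proportional-bias slope."""
    true_lfc = np.asarray(true_lfc, dtype=float)
    pred_lfc = np.asarray(pred_lfc, dtype=float)
    means = (true_lfc + pred_lfc) / 2.0  # x-axis: average of the two estimates
    diffs = pred_lfc - true_lfc          # y-axis: predicted minus DESeq2 LFC
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    # A non-zero slope of diffs vs. means would indicate proportional bias
    slope = np.polyfit(means, diffs, deg=1)[0]
    return {
        "bias": float(bias),
        "loa_low": float(bias - 1.96 * sd),
        "loa_high": float(bias + 1.96 * sd),
        "slope": float(slope),
    }


# Toy example: a constant +0.1 offset gives a bias of about 0.1 and a slope of about 0
true = np.linspace(-3.0, 3.0, 7)
summary = bland_altman(true, true + 0.1)
print(summary)
```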
Over 90 % of genes are predicted within |ΔLFC| ≤ 0.5, showing high fidelity between DeepFold and statistical estimates.
This curve quantifies model accuracy as a function of tolerance.
For example, at a threshold of ±0.25 LFC, about 80 % of genes are predicted within tolerance.
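Both the cumulative accuracy curve and the accuracy-versus-threshold curve reduce to one quantity: the fraction of genes whose absolute error |ΔLFC| is at most a tolerance t. A minimal NumPy sketch, assuming ΔLFC is predicted minus DESeq2 LFC; `fraction_within` is a hypothetical helper.

```python
import numpy as np


def fraction_within(true_lfc, pred_lfc, thresholds):
    """Share of genes whose absolute LFC error is <= each threshold."""
    abs_err = np.abs(np.asarray(pred_lfc, dtype=float) - np.asarray(true_lfc, dtype=float))
    return {float(t): float(np.mean(abs_err <= t)) for t in thresholds}


# Toy errors of 0.05, 0.2, 0.4 and 0.8 LFC units
true = np.zeros(4)
pred = np.array([0.05, -0.2, 0.4, -0.8])
curve = fraction_within(true, pred, thresholds=[0.1, 0.25, 0.5, 1.0])
print(curve)
```

Passing a fine grid such as `np.linspace(0, 2, 201)` as `thresholds` yields the points of a cumulative accuracy curve.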
| Metric | Value | Interpretation |
|---|---|---|
| Pearson r | 0.99 | Linear correlation between predicted and true LFC |
| Spearman ρ | 1.00 | Rank-order agreement |
| MAE | 0.10 | Average absolute LFC difference |
| Genes within ±0.5 LFC | > 90 % | Share of genes with absolute LFC error ≤ 0.5 |
DeepFoldChange reproduces DESeq2 fold-change estimates with a mean absolute error of 0.10 log₂ units, supporting its use as a deep learning surrogate for DE analysis.
```bash
conda create -n DeepFold_env python=3.10
conda activate DeepFold_env
pip install -r requirements.txt
```

Or install the dependencies manually:

```bash
pip install pandas numpy scikit-learn scipy joblib matplotlib seaborn
```

Input files:

- `filtered_counts_DEGs.csv`: normalized or raw counts (rows = genes, columns = samples)
- `resLFC_p_cut.csv`: DESeq2 output containing at least a `log2FoldChange` column
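Before training, the two tables have to be matched gene by gene. A minimal pandas sketch, assuming the first column of each CSV holds the gene IDs and that genes with a missing `log2FoldChange` should be dropped; `load_training_table` is a hypothetical helper, not part of the repository.

```python
from io import StringIO

import pandas as pd


def load_training_table(counts_csv, lfc_csv):
    """Return (counts, lfc) restricted to genes present in both tables."""
    counts = pd.read_csv(counts_csv, index_col=0)   # rows = genes, columns = samples
    lfc = pd.read_csv(lfc_csv, index_col=0)["log2FoldChange"].dropna()
    shared = counts.index.intersection(lfc.index)   # keeps the counts-file order
    return counts.loc[shared], lfc.loc[shared]


# Toy inputs standing in for filtered_counts_DEGs.csv and resLFC_p_cut.csv
counts_csv = StringIO("gene,s1,s2,s3,s4\nA,10,12,30,28\nB,0,3,1,2\nC,5,7,20,22\n")
lfc_csv = StringIO("gene,baseMean,log2FoldChange,padj\nA,20,1.3,0.01\nC,13,1.7,0.02\nD,4,2.0,0.03\n")
X_counts, y_lfc = load_training_table(counts_csv, lfc_csv)
print(X_counts.shape, y_lfc.to_dict())
```

Gene B (counts only) and gene D (DESeq2 only) are dropped, so each remaining count row lines up with exactly one fold-change target.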
```bash
python train_predict_lfc_dnn.py --counts filtered_counts_DEGs.csv --lfc resLFC_p_cut.csv --outdir lfc_dnn_outputs
```

This will:
- train the PCA + DNN model,
- evaluate on a hold-out set,
- fit on all data,
- and generate predictions + metrics.
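The four steps above can be sketched with scikit-learn as follows. This is not the script's code: it assumes the saved "scaler + PCA + DNN" model is a scikit-learn pipeline of `StandardScaler`, `PCA`, and `MLPRegressor`, and the 80/20 gene-level split, `random_state=0`, and the helper names `make_model` and `holdout_then_refit` are hypothetical choices.

```python
import joblib
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_model(n_components):
    """Scaler -> PCA -> MLP, mirroring the architecture described above."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        MLPRegressor(hidden_layer_sizes=(256, 128, 64), max_iter=500, random_state=0),
    )


def holdout_then_refit(X, y, test_size=0.2, model_path=None):
    """Score on held-out genes, then refit on all genes and optionally save."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=0)
    n_comp = min(100, X.shape[1], X_tr.shape[0])  # PCA(<=100) cap
    holdout_pred = make_model(n_comp).fit(X_tr, y_tr).predict(X_te)
    holdout_mae = mean_absolute_error(y_te, holdout_pred)

    full_model = make_model(n_comp).fit(X, y)  # final model sees every gene
    if model_path is not None:
        joblib.dump(full_model, model_path)
    return holdout_mae, y_te, holdout_pred, full_model


# Synthetic stand-in: 300 genes x 8 samples with a noisy linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=300)
mae, y_holdout, y_pred, fitted = holdout_then_refit(X, y)
print(f"hold-out MAE: {mae:.3f}")
```

Scoring on held-out genes first and refitting afterwards keeps the reported error honest while still letting the final saved model learn from every gene.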
```bash
python plot_deepfold_performance.py
```

All figures are saved inside `lfc_dnn_outputs/`.
| File | Description |
|---|---|
| holdout_predictions.csv | True vs predicted LFC for test genes |
| all_predictions_fullfit.csv | Predicted LFC for all DEGs |
| pca_dnn_lfc_predictor.pkl | Serialized model (scaler + PCA + DNN) |
| metrics.txt | Summary of R², MAE, and Spearman metrics |
| plot_*.png | Performance and accuracy figures |
Once trained, you can load the model and predict on new normalized counts:
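```python
import joblib
import pandas as pd

# Serialized scaler + PCA + DNN model written during training
model = joblib.load("lfc_dnn_outputs/pca_dnn_lfc_predictor.pkl")
# Rows = genes, columns = samples (same sample columns as the training counts)
new_counts = pd.read_csv("your_new_counts.csv", index_col=0)
predicted_lfc = model.predict(new_counts)  # one predicted log2FoldChange per gene
```

DeepFoldChange closely approximates DESeq2-derived fold-changes, with more than 90 % of genes predicted within |ΔLFC| ≤ 0.5.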
The framework offers a reproducible, fast, and model-agnostic solution for high-throughput DE prediction using deep learning.
If you use or adapt this repository, please cite:
Debnath, J. P., et al. (2025). DeepFoldChange: A Deep Learning Framework that Emulates Statistical Models for Differential Gene Expression Analysis. GitHub repository: https://github.com/Prokash21/DeepFoldChange
Joy Prokash Debnath
Department of Biochemistry and Molecular Biology
Shahjalal University of Science and Technology, Sylhet, Bangladesh
This project is distributed under the MIT License, free for academic and research use.




