AKI Predict
DEU HOSPITAL · RESEARCH
Research Project

AKI Mortality Prediction Research

A comprehensive retrospective cohort study applying four machine learning models to predict in-hospital mortality risk in ICU patients with Acute Kidney Injury.

Total ICU Patients: 2,230
Train / Test Split: 80% / 20%
Raw Features: 30+
Model Features: 15
Survived: ~72%
Deceased: ~28%
MICE Iterations: 5
Random Seed: 42

What is Acute Kidney Injury?
Definition

AKI is a sudden reduction in kidney function that develops over hours to days. It is one of the most frequent and severe complications in ICU patients.

Why It Matters

AKI significantly increases ICU mortality. Early risk stratification enables timely intervention that can be life-saving.

AI Approach

Machine learning models combine laboratory and demographic data to estimate individual mortality risk, supporting clinical decision-making.

Methodology Pipeline
01. Data Loading: PostgreSQL · deu_retro_clean table

Raw retrospective ICU data are loaded directly from the deu_retro_clean table of a local PostgreSQL instance. The table contains 2,230 patient records with 30+ raw features, including demographics, vital signs, and laboratory measurements.

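The loading step can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the helper name load_table and the age column are hypothetical, and an in-memory SQLite table with made-up rows stands in for PostgreSQL so the snippet runs anywhere. With PostgreSQL, a SQLAlchemy engine would be passed in place of the SQLite connection.

```python
import sqlite3

import pandas as pd


def load_table(conn, table="deu_retro_clean"):
    """Read every row of `table` into a DataFrame (hypothetical helper)."""
    # The table name is interpolated directly, so pass only trusted names.
    return pd.read_sql_query(f"SELECT * FROM {table}", conn)


# Stand-in database: in-memory SQLite with made-up rows (assumption).
# Only the table name and the patient_id / deathflag columns come from the text.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE deu_retro_clean (patient_id INTEGER, age REAL, deathflag INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO deu_retro_clean VALUES (?, ?, ?)",
    [(1, 64.0, 0), (2, 71.0, 1), (3, None, 0)],
)
demo_conn.commit()

df = load_table(demo_conn)
print(df.shape)
```
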
02. EDA & Validation: Target distribution · missing value analysis

Exploratory data analysis verifies the target balance (~72% survived, ~28% deceased), identifies missing-value patterns, and detects anomalies. ID columns (row_id, patient_id, protocol_no) are removed automatically.

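A minimal sketch of these checks with pandas, assuming a small made-up DataFrame. The ID column names and deathflag come from the description above; the creatinine and age columns and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data (assumption); not the study cohort.
df = pd.DataFrame({
    "row_id": [1, 2, 3, 4, 5],
    "patient_id": [11, 12, 13, 14, 15],
    "protocol_no": [101, 102, 103, 104, 105],
    "creatinine": [1.2, np.nan, 3.4, 2.1, np.nan],  # hypothetical feature
    "age": [64, 71, 58, np.nan, 80],                # hypothetical feature
    "deathflag": [0, 1, 0, 0, 1],
})

ID_COLUMNS = ["row_id", "patient_id", "protocol_no"]

# Target balance as class proportions.
target_share = df["deathflag"].value_counts(normalize=True)

# Fraction of missing values per column, largest first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Drop identifier columns; errors="ignore" tolerates columns that are absent.
df_clean = df.drop(columns=ID_COLUMNS, errors="ignore")

print(target_share)
print(missing_share)
```
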
03. Preprocessing: Categorical encoding · feature selection

Categorical variables are one-hot encoded and non-predictive identifier columns are removed. The deathflag target variable is extracted separately from the feature matrix.

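One way to express this step is with pandas.get_dummies, shown here as a sketch on made-up data. The sex and age columns are hypothetical; only deathflag is named in the description.

```python
import pandas as pd

# Made-up stand-in data (assumption).
df = pd.DataFrame({
    "sex": ["F", "M", "M", "F"],  # hypothetical categorical feature
    "age": [64, 71, 58, 80],      # hypothetical numeric feature
    "deathflag": [0, 1, 0, 1],
})

# Extract the target first so it never enters the feature matrix.
y = df["deathflag"]

# One-hot encode the remaining object/categorical columns as 0/1 integers.
X = pd.get_dummies(df.drop(columns=["deathflag"]), dtype=int)

print(X.columns.tolist())
```
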
04. Stratified Split: 80% train · 20% test · seed = 42

A stratified train-test split preserves the class ratio in both sets. A fixed random seed (42) makes the split reproducible. The imputer and scaler in the following steps are fit without any test-set data.

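The split can be sketched with scikit-learn's train_test_split. The data below are synthetic (100 rows with a 72/28 class ratio, an assumption chosen to mirror the reported balance); the 20% test size, stratification, and seed 42 follow the description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 100 rows, 72 survivors, 28 deaths.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([0] * 72 + [1] * 28)

# stratify=y keeps the class ratio approximately equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

print(len(X_train), len(X_test), y_train.mean(), y_test.mean())
```
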
05. MICE Imputation: miceforest · 5 iterations · fit on train

Multiple Imputation by Chained Equations (MICE) via the miceforest library. The imputer is fit exclusively on training data (5 iterations) and then applied to both the train and test sets, which prevents test-set information from leaking into the imputation model.

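The fit-on-train, apply-to-both pattern can be illustrated as below. Note the substitution: this sketch uses scikit-learn's IterativeImputer, a chained-equations imputer, as a stand-in and does not use miceforest, whose API and underlying models differ. The data are synthetic and the 10% missingness rate is an assumption.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer

# Synthetic stand-in data with roughly 10% missing cells (assumption).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(80, 4))
X_test = rng.normal(size=(20, 4))
X_train[rng.random(X_train.shape) < 0.1] = np.nan
X_test[rng.random(X_test.shape) < 0.1] = np.nan

# Stand-in for miceforest: chained-equations imputation, 5 rounds.
imputer = IterativeImputer(max_iter=5, random_state=42)

# Fit on the training set only, then reuse the fitted imputer on the test set.
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

print(np.isnan(X_train_imp).sum(), np.isnan(X_test_imp).sum())
```

The key design point is that transform on the test set reuses models learned from training rows only, so no test-set values influence the imputation model.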
06. Feature Scaling: StandardScaler · fit on train set only

StandardScaler standardizes each feature to zero mean and unit variance. It is fit on the train set only and then applied to the test set. Scaling is required for Logistic Regression and the ANN; the tree-based models use unscaled features.

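A small worked sketch of the scaling step, using made-up numbers. Each feature is transformed as z = (x - mean) / std, where the mean and standard deviation are estimated from the training rows only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up stand-in data (assumption).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[2.5, 25.0], [5.0, 50.0]])

scaler = StandardScaler()

# Learn per-feature mean and standard deviation from the training set only.
X_train_scaled = scaler.fit_transform(X_train)

# Apply the training-set statistics to the test set; no refitting.
X_test_scaled = scaler.transform(X_test)

print(scaler.mean_)
print(X_test_scaled)
```

Because the test set is scaled with training statistics, its columns will generally not have exactly zero mean or unit variance, which is expected.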
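07. Model Training & Evaluation: LR · RF · GB · ANN · Metrics

Four models are trained in parallel: Logistic Regression (LR), Random Forest (RF), Gradient Boosting (GB), and an artificial neural network (ANN). Each is evaluated on the held-out test set with AUC, accuracy, F1, precision, recall, and the Matthews correlation coefficient (MCC). The model with the highest AUC is selected for the prediction interface.
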
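The training and selection loop might look roughly like the sketch below. Several things here are assumptions rather than details from the study: the data are synthetic, the hyperparameters are illustrative defaults, and scikit-learn's MLPClassifier stands in for the ANN, whose actual implementation is not specified.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 15 features, roughly 72/28 class balance.
X, y = make_classification(n_samples=400, n_features=15,
                           weights=[0.72, 0.28], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Scaler fit on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# (model, needs_scaling): LR and ANN get scaled inputs, tree models unscaled.
models = {
    "LR": (LogisticRegression(max_iter=1000), True),
    "RF": (RandomForestClassifier(n_estimators=100, random_state=42), False),
    "GB": (GradientBoostingClassifier(random_state=42), False),
    "ANN": (MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                          random_state=42), True),  # stand-in for the ANN
}

results = {}
for name, (model, needs_scaling) in models.items():
    Xtr, Xte = (X_train_s, X_test_s) if needs_scaling else (X_train, X_test)
    model.fit(Xtr, y_train)
    pred = model.predict(Xte)
    proba = model.predict_proba(Xte)[:, 1]  # probability of the positive class
    results[name] = {
        "AUC": roc_auc_score(y_test, proba),
        "Accuracy": accuracy_score(y_test, pred),
        "F1": f1_score(y_test, pred, zero_division=0),
        "Precision": precision_score(y_test, pred, zero_division=0),
        "Recall": recall_score(y_test, pred, zero_division=0),
        "MCC": matthews_corrcoef(y_test, pred),
    }

# Select the model with the highest test-set AUC.
best = max(results, key=lambda n: results[n]["AUC"])
print(best, results[best])
```
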
Technical Stack
Python 3.9+: Runtime
scikit-learn: ML Models
miceforest: MICE Imputation
PostgreSQL: Data Source
pandas / NumPy: Data Processing
matplotlib: Visualizations
joblib: Model Serialization
Streamlit: Original Web UI

Important Notice

This system was developed exclusively for academic research purposes. It must not be used for clinical decision-making, patient triage, or any medical diagnosis. All model outputs are statistical estimates based on retrospective data and must be reviewed by a qualified clinician before any action is taken. The authors accept no liability for clinical outcomes.

Research use only
Not CE/FDA cleared
Requires clinical validation