Research Project

AKI Mortality Prediction Research

A retrospective cohort study applying four machine learning models (Logistic Regression, Random Forest, Gradient Boosting, and an artificial neural network) to predict in-hospital mortality risk in ICU patients with Acute Kidney Injury (AKI).

DEU Hospital Cohort

Total ICU Patients: 2,230
Train / Test Split: 80% / 20%
Raw Features: 30+
Outcome: ~72% survived, ~28% deceased
MICE Iterations: 5
Random Seed: 42

MIMIC-III External Validation Cohort

Total ICU Patients: 1,489
Train / Test Split: 80% / 20%
Outcome: ~52% survived, ~48% deceased
Imputation: Median

What is Acute Kidney Injury?

Definition

AKI is a sudden reduction in kidney function that develops over hours to days. It is one of the most frequent and severe complications in ICU patients.

Why It Matters

AKI significantly increases ICU mortality. Early risk stratification enables timely intervention that can be life-saving.

AI Approach

Machine learning models combine laboratory and demographic data to estimate individual mortality risk, supporting clinical decision-making.

Methodology Pipeline

Data Loading (PostgreSQL · deu_retro_clean table)

Raw retrospective ICU data are loaded directly from a local PostgreSQL instance. The table contains 2,230 patient records with 30+ raw features, including demographics, vital signs, and laboratory measurements.
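Below is a minimal Python sketch of the loading step. It assumes the whole table is read into memory with pandas. The helper name `load_cohort` is hypothetical. An in-memory SQLite database stands in for PostgreSQL so the example runs without a server. `pandas.read_sql_query` accepts either a `sqlite3` connection or a SQLAlchemy engine.

```python
import sqlite3

import pandas as pd


def load_cohort(con, table: str = "deu_retro_clean") -> pd.DataFrame:
    """Read the full retrospective cohort table into a DataFrame.

    `con` may be a SQLAlchemy engine (e.g. for PostgreSQL) or a sqlite3 connection.
    """
    # The table name is interpolated into the SQL, so only pass trusted identifiers.
    return pd.read_sql_query(f"SELECT * FROM {table}", con)


if __name__ == "__main__":
    # Stand-in database for demonstration only; the project itself uses PostgreSQL.
    con = sqlite3.connect(":memory:")
    pd.DataFrame({"patient_id": [1, 2], "age": [70, 64], "deathflag": [0, 1]}).to_sql(
        "deu_retro_clean", con, index=False
    )
    df = load_cohort(con)
    print(df.shape)  # (2, 3)
```

To use the real database, pass a SQLAlchemy engine instead. For example: `create_engine("postgresql+psycopg2://USER:PASSWORD@localhost:5432/DBNAME")`. The user, password and database name here are placeholders.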

EDA & Validation (target distribution · missing value analysis)

Exploratory data analysis verifies the target balance (~72% survived, ~28% deceased), identifies missing-value patterns, and detects anomalies. ID columns (row_id, patient_id, protocol_no) are removed automatically.
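The two checks named above can be sketched as follows: class proportions of the target and the missing fraction per column. ID columns are dropped first. The helper names `drop_id_columns` and `summarize` are hypothetical. Anomaly detection is not covered here because the text does not say how it is done.

```python
import pandas as pd

# Identifier columns named in the pipeline description; they carry no predictive signal.
ID_COLUMNS = ["row_id", "patient_id", "protocol_no"]


def drop_id_columns(df: pd.DataFrame) -> pd.DataFrame:
    # errors="ignore" silently skips any ID column that is not present.
    return df.drop(columns=ID_COLUMNS, errors="ignore")


def summarize(df: pd.DataFrame, target: str = "deathflag") -> dict:
    """Class proportions of the target and the fraction of missing values per column."""
    return {
        "class_balance": df[target].value_counts(normalize=True).sort_index(),
        "missing_fraction": df.isna().mean().sort_values(ascending=False),
    }


if __name__ == "__main__":
    df = pd.DataFrame({
        "row_id": [1, 2, 3, 4],
        "patient_id": [10, 11, 12, 13],
        "creatinine": [1.2, None, 3.4, 2.0],
        "deathflag": [0, 0, 1, 0],
    })
    report = summarize(drop_id_columns(df))
    print(report["class_balance"])
    print(report["missing_fraction"])
```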

Preprocessing (categorical encoding · feature selection)

Categorical variables are one-hot encoded and non-predictive identifier columns are removed. The deathflag target variable is extracted separately from the feature matrix.
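The text does not say which encoder is used. This sketch assumes `pandas.get_dummies`, and `build_feature_matrix` is a hypothetical helper. It separates `deathflag` from the features and expands each categorical column into 0/1 indicator columns.

```python
import pandas as pd


def build_feature_matrix(df: pd.DataFrame, target: str = "deathflag"):
    """Separate the target from the features and one-hot encode categorical columns."""
    y = df[target].astype(int)
    X = df.drop(columns=[target])
    # get_dummies expands object/category columns into 0/1 indicator columns and
    # leaves numeric columns unchanged. A missing category yields all-zero indicators.
    X = pd.get_dummies(X, dtype=float)
    return X, y


if __name__ == "__main__":
    df = pd.DataFrame({"age": [70, 64, 58], "sex": ["M", "F", "M"], "deathflag": [0, 1, 0]})
    X, y = build_feature_matrix(df)
    print(X.columns.tolist())  # ['age', 'sex_F', 'sex_M']
```

`get_dummies` learns nothing beyond the set of categories, so running it before the split is low-risk. An encoder with fitted state, such as scikit-learn's `OneHotEncoder`, would normally be fit on the training split only.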

Stratified Split (80% train · 20% test · seed = 42)

A stratified train-test split preserves the survived/deceased ratio in both sets, and the fixed random seed (42) makes the split reproducible. No test-set data is used when fitting the imputer or the scaler.
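A minimal sketch of this split using scikit-learn's `train_test_split` with `stratify=y` and `random_state=42`. The helper name `stratified_split` is hypothetical. The demo uses a synthetic 72/28 label vector, and both parts keep close to that ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, y, test_size=0.2, seed=42):
    """80/20 split that keeps the class ratio of y in both parts."""
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


if __name__ == "__main__":
    y = np.array([0] * 72 + [1] * 28)  # ~72% survived, ~28% deceased
    X = np.arange(len(y)).reshape(-1, 1)
    X_train, X_test, y_train, y_test = stratified_split(X, y)
    print(len(y_train), len(y_test), y_train.mean(), y_test.mean())
```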

MICE Imputation (miceforest · 5 iterations · fit on train)

Missing values are filled by Multiple Imputation by Chained Equations (MICE) using the miceforest library. The imputer is fit exclusively on the training data (5 iterations) and then applied to both the train and test sets, which prevents data leakage.
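The project uses miceforest, whose API has changed across releases. This sketch therefore swaps in scikit-learn's `IterativeImputer`, a related chained-equations imputer. The two differ: `IterativeImputer` models each feature with `BayesianRidge` by default and returns a single imputed dataset. miceforest uses LightGBM models and can produce several datasets. The sketch does show the leakage rule faithfully: fit on the training split, then only transform the test split. `impute_train_test` is a hypothetical helper.

```python
import numpy as np
# IterativeImputer is flagged experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def impute_train_test(X_train, X_test, n_iter=5, seed=42):
    """Chained-equations imputation, fit on the training split only."""
    imputer = IterativeImputer(max_iter=n_iter, random_state=seed)
    X_train_imp = imputer.fit_transform(X_train)  # learns per-feature regressors from train
    X_test_imp = imputer.transform(X_test)        # applies them to test without refitting
    return X_train_imp, X_test_imp, imputer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X[:, 2] = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)  # correlated column to impute from
    X[rng.random(X.shape) < 0.1] = np.nan                     # ~10% missing at random
    X_train_imp, X_test_imp, _ = impute_train_test(X[:80], X[80:])
    print(np.isnan(X_train_imp).sum(), np.isnan(X_test_imp).sum())  # 0 0
```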

Feature Scaling (StandardScaler · fit on train set only)

StandardScaler standardizes each feature to zero mean and unit variance. It is fit on the training set only and then applied to the test set. Scaling is required for Logistic Regression and the ANN; the tree-based models (Random Forest, Gradient Boosting) use unscaled features.
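Each value is mapped to z = (x − μ_train) / σ_train, where the mean and standard deviation come from the training split only. The sketch below shows this with a single feature. A test value of 4 is expressed in training units: (4 − 2) / √(2/3) ≈ 2.449. `scale_train_test` is a hypothetical helper.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def scale_train_test(X_train, X_test):
    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)  # computes mean_ and scale_ on train only
    X_test_s = scaler.transform(X_test)        # z = (x - train mean) / train std
    return X_train_s, X_test_s, scaler


if __name__ == "__main__":
    X_train = np.array([[1.0], [2.0], [3.0]])
    X_test = np.array([[4.0]])
    X_train_s, X_test_s, scaler = scale_train_test(X_train, X_test)
    print(scaler.mean_, X_test_s)
```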

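Model Training & Evaluation (LR · RF · GB · ANN · metrics)

Four models (Logistic Regression, Random Forest, Gradient Boosting, and an artificial neural network) are trained in parallel and evaluated on the held-out test set using AUC, accuracy, F1, precision, recall, and the Matthews correlation coefficient (MCC). The model with the highest AUC is selected for the prediction interface.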
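The following is a minimal sketch of training the four models, computing the six metrics, and picking the best model by AUC. It assumes the ANN is scikit-learn's `MLPClassifier`. All hyperparameters, the 0.5 decision threshold, and the helper names are illustrative choices, not the project's settings. Synthetic data from `make_classification` stands in for the cohort.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

SEED = 42


def build_models(seed=SEED):
    # (model, needs_scaling): LR and the ANN get scaled inputs, tree ensembles do not.
    return {
        "LR": (LogisticRegression(max_iter=1000), True),
        "RF": (RandomForestClassifier(n_estimators=200, random_state=seed), False),
        "GB": (GradientBoostingClassifier(random_state=seed), False),
        "ANN": (MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed), True),
    }


def evaluate(model, X_test, y_test, threshold=0.5):
    proba = model.predict_proba(X_test)[:, 1]  # predicted probability of death
    pred = (proba >= threshold).astype(int)
    return {
        "AUC": roc_auc_score(y_test, proba),
        "Accuracy": accuracy_score(y_test, pred),
        "F1": f1_score(y_test, pred),
        "Precision": precision_score(y_test, pred),
        "Recall": recall_score(y_test, pred),
        "MCC": matthews_corrcoef(y_test, pred),
    }


def train_and_select(X_train, X_test, y_train, y_test, seed=SEED):
    scaler = StandardScaler().fit(X_train)  # fit on train only
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
    results, fitted = {}, {}
    for name, (model, needs_scaling) in build_models(seed).items():
        Xtr, Xte = (X_train_s, X_test_s) if needs_scaling else (X_train, X_test)
        fitted[name] = model.fit(Xtr, y_train)
        results[name] = evaluate(model, Xte, y_test)
    best = max(results, key=lambda n: results[n]["AUC"])  # selection by test AUC
    return results, best, fitted[best]


if __name__ == "__main__":
    # Synthetic stand-in data with roughly a 72/28 class ratio.
    X, y = make_classification(n_samples=600, n_features=12, weights=[0.72], random_state=SEED)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=SEED)
    results, best, _ = train_and_select(X_tr, X_te, y_tr, y_te)
    for name, m in results.items():
        print(name, {k: round(v, 3) for k, v in m.items()})
    print("best by AUC:", best)
```

AUC is computed from predicted probabilities. The other five metrics depend on the threshold applied to those probabilities.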

Technical Stack

Runtime: Python 3.9+
ML Models: scikit-learn
MICE Imputation: miceforest
Data Source: PostgreSQL
Data Processing: pandas / NumPy
Visualizations: matplotlib
Model Serialization: joblib
Web Application: Next.js 16
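joblib handles model serialization in this stack. The sketch below persists a fitted model, assuming it is stored in one dictionary together with the StandardScaler it was trained with. A model trained on scaled inputs cannot be used without the training statistics. The bundle layout, helper names and file name are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def save_bundle(model, scaler, path):
    # Keep the scaler next to the model so inference uses the same train statistics.
    joblib.dump({"model": model, "scaler": scaler}, path)


def load_bundle(path):
    # joblib files are pickle-based; only load files from a trusted source.
    bundle = joblib.load(path)
    return bundle["model"], bundle["scaler"]


if __name__ == "__main__":
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0, 0, 1, 1])
    scaler = StandardScaler().fit(X)
    model = LogisticRegression().fit(scaler.transform(X), y)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "aki_best_model.joblib")  # hypothetical file name
        save_bundle(model, scaler, path)
        model2, scaler2 = load_bundle(path)
        print(model2.predict_proba(scaler2.transform([[1.5]])))
```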

Important Notice

This system was developed exclusively for academic research purposes. It must not be used for clinical decision-making, patient triage, or any medical diagnosis. All model outputs are statistical estimates based on retrospective data and must be reviewed by a qualified clinician before any action is taken. The authors accept no liability for clinical outcomes.

Research use only · Not CE/FDA cleared · Requires clinical validation