Complete Technical Documentation: Dataset, Models, Training, Results & Comparison
To develop an explainable hybrid deep learning framework that accurately detects and stages Ankylosing Spondylitis from sacroiliac joint MRI scans, providing automated segmentation, multi-stage classification, and interpretable visual explanations to support clinical decision-making.
Deep dive into CNNs, Vision Transformers, Hybrid architectures, and XAI techniques for AS diagnosis
Source MRI images from Kaggle Lumbar Coordinate Pretraining dataset, preprocess and organize
Resize, normalize, generate masks, apply feature-based labeling, split into train/test sets
Build Attention U-Net, Simple CNN, and Hybrid CNN-Transformer architectures
Measure accuracy, precision, recall, F1-score, IoU, generate confusion matrices
Implement Grad-CAM and focused Grad-CAM for sacroiliac joint region visualization
Build Flask application with authentication, upload, prediction, and history features
Compile findings, prepare technical report, finalize Dockerfile for deployment
This project implements an end-to-end AI-powered diagnostic framework for detecting and staging Ankylosing Spondylitis (AS) from sacroiliac joint MRI scans. It addresses three critical gaps identified in existing literature (see Section 9):
Source: Kaggle: Lumbar Coordinate Pretraining Dataset
The original raw dataset contains lumbar spine MRI scans in NumPy (.npy) and JPEG formats from 4 medical imaging sources:
| Source Folder | Files (.npy) | Files (.jpg) | Description |
|---|---|---|---|
| processed_lsd | 516 | 516 | Lumbar Spine Degeneration dataset |
| processed_spider | 211 | 210 | SPIDER spine segmentation dataset |
| processed_osf | 35 | 35 | Open Science Framework spine images |
| processed_tseg | 479 | 479 | T-SEG thoracolumbar segmentation dataset |
| Total | 1,241 | 1,240 | |
Additional CSV files provide spine coordinate annotations:
- coords_pretrain.csv (277 KB): maps filename → source → x, y coordinates → spine level (L1/L2 to L5/S1)
- coords_rsna_improved.csv (5.2 MB): RSNA improved coordinates with conditions (Neural Foraminal Narrowing, etc.)

From the original 1,241 raw MRI files, 900 images were selected, converted to 256×256 grayscale PNGs, and paired with generated binary segmentation masks.
An annotation file (dataset.csv) was created with image paths, mask paths, AS status labels, stage labels, and bounding box coordinates. Resulting directory layout:

```
Dataset/
├── images/                 # 900 PNG files (img_0000.png to img_0899.png)
│   ├── img_0000.png        # 256×256 grayscale MRI
│   ├── img_0001.png
│   └── ... (900 files)
├── masks/                  # 900 PNG files (mask_0000.png to mask_0899.png)
│   ├── mask_0000.png       # 256×256 binary segmentation mask (~815 bytes each)
│   ├── mask_0001.png
│   └── ... (900 files)
├── annotations/
│   └── dataset.csv         # 900 rows with labels and bounding boxes
└── dataset_info.txt        # Dataset metadata summary
```
| Column | Type | Description | Example |
|---|---|---|---|
| image_id | String | Unique image identifier | img_0000.png |
| image_path | String | Relative path to image | images/img_0000.png |
| mask_path | String | Relative path to segmentation mask | masks/mask_0000.png |
| AS_status | Integer | 0 = Negative, 1 = Positive | 1 |
| stage | Integer | 0 = Normal, 1 = Early, 2 = Moderate, 3 = Advanced | 1 |
| stage_name | String | Human-readable stage label | Early |
| bbox_x1, bbox_y1 | Integer | Top-left bounding box corner | 76, 102 |
| bbox_x2, bbox_y2 | Integer | Bottom-right bounding box corner | 179, 204 |
| Class | Count | % |
|---|---|---|
| AS Positive | 469 | 52.1% |
| AS Negative | 431 | 47.9% |
| Stage | Count | % |
|---|---|---|
| Normal (0) | 557 | 61.9% |
| Early (1) | 111 | 12.3% |
| Moderate (2) | 129 | 14.3% |
| Advanced (3) | 103 | 11.4% |
Since the original Kaggle dataset did not include AS-specific labels, a two-iteration feature-based labeling approach was developed to assign clinically motivated labels based on actual image characteristics.
Iteration 1: three features were extracted from each image:
| Feature | Method | Thresholds |
|---|---|---|
| Brightness | np.mean(img) | 45th percentile |
| Texture | np.std(img) | 55th percentile |
| Structure (Edge Density) | cv2.Canny(img, 50, 150) | 30th/50th/70th percentiles for staging |
Rule: AS+ if brightness < 45th percentile AND texture > 55th percentile
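The following is a minimal NumPy sketch of the iteration-1 binary rule. It assumes the images are already loaded as an `(N, H, W)` array, and `iteration1_labels` is a hypothetical helper name. An image must fall below the 45th brightness percentile to be flagged, so this rule can label at most about 45% of images as AS+ regardless of texture.

```python
import numpy as np

def iteration1_labels(images, bright_pct=45, texture_pct=55):
    """Hypothetical helper: iteration-1 labels for an (N, H, W) image stack.

    brightness = per-image mean intensity, texture = per-image standard deviation.
    """
    flat = images.reshape(len(images), -1).astype(np.float64)
    brightness = flat.mean(axis=1)
    texture = flat.std(axis=1)
    # Thresholds are percentiles over the whole dataset, not per image
    b_thr = np.percentile(brightness, bright_pct)
    t_thr = np.percentile(texture, texture_pct)
    # AS+ only when an image is both darker and more textured than the cut-offs
    return ((brightness < b_thr) & (texture > t_thr)).astype(int)

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 32, 32))  # random stand-in images
labels = iteration1_labels(images)
print(labels.sum(), "of", len(labels), "flagged AS+")
```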
Iteration 2: four features were extracted and combined into a composite score:
| Feature | Method | Weight in Score | Clinical Rationale |
|---|---|---|---|
| Mean Intensity | np.mean(img) | 40% of brightness (12% of total) | Overall tissue density |
| Lower-Half Intensity | Mean of bottom 50% of image | 60% of brightness (18% of total) | Sacroiliac region is in lower spine |
| Texture (Std Dev) | np.std(img) | 40% of total | Inflammatory changes vary texture |
| Edge Density | cv2.Canny(img, 50, 150) | 30% of total | Structural damage increases edges |
```python
import numpy as np

# Combined score formula (per-image feature arrays)
brightness_scores = mean_intensity * 0.4 + lower_half_intensity * 0.6
combined_score = brightness_scores * 0.3 + texture_scores * 0.4 + edge_density * 0.3

# Binary: AS+ if score > 52nd percentile
threshold_binary = np.percentile(combined_score, 52)

# Staging (for AS+ images only):
#   Advanced (3): score > 85th percentile
#   Moderate (2): score > 70th percentile
#   Early (1):    score > 60th percentile
#   Normal (0):   below 60th percentile (AS+ but no clear stage)
```
Resulting balanced distribution:
| Label | Count | Result |
|---|---|---|
| AS Positive | 432 | Well balanced ✅ |
| AS Negative | 468 | Well balanced ✅ |
| Stage 0 (Normal) | 540 | Includes all AS- (468) + some AS+ (72) |
| Stage 1 (Early) | 90 | AS+ with moderate scores |
| Stage 2 (Moderate) | 135 | AS+ with high scores |
| Stage 3 (Advanced) | 135 | AS+ with highest scores |
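The counts above follow directly from the percentile cut-offs. Taking scores above the 52nd percentile leaves 48% of 900, which is 432 AS+ images. The 60th, 70th and 85th percentiles then split these into 72 + 90 + 135 + 135.

The sketch below checks this arithmetic under two assumptions:
- Synthetic continuous scores stand in for `combined_score`. Any 900 distinct values give the same counts, because only the ranks matter.
- The staging percentiles are computed over all 900 scores, which is the reading that reproduces the table.

`assign_labels` is a hypothetical name.

```python
import numpy as np

def assign_labels(combined_score):
    """Binary and stage labels from percentile cut-offs over all images."""
    t_binary = np.percentile(combined_score, 52)
    t_early, t_moderate, t_advanced = np.percentile(combined_score, [60, 70, 85])
    as_status = (combined_score > t_binary).astype(int)
    # np.select returns the value for the first condition that holds
    stage = np.select(
        [combined_score > t_advanced, combined_score > t_moderate, combined_score > t_early],
        [3, 2, 1],
        default=0,
    )
    # Stages apply to AS+ images only; AS- images stay at stage 0 (Normal)
    stage = np.where(as_status == 1, stage, 0)
    return as_status, stage

scores = np.random.default_rng(42).random(900)  # synthetic stand-in scores
as_status, stage = assign_labels(scores)
print(as_status.sum(), np.bincount(stage, minlength=4))  # 432 AS+; stages 540, 90, 135, 135
```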
| Split | Count | Ratio |
|---|---|---|
| Training | 720 images | 80% |
| Testing / Validation | 180 images | 20% |
Method: train_test_split(test_size=0.2, random_state=42, stratify=y_binary), stratified on the binary labels to maintain class proportions in both sets.
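The sketch below shows what stratification does with these numbers. It assumes the 468/432 binary labels from the iteration-2 table, and index arrays stand in for the images. With 900 samples and `test_size=0.2`, scikit-learn places 94 AS-negative and 86 AS-positive images in the test set, which matches the support column of the binary classification report later in this document.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Binary labels matching the iteration-2 distribution: 468 AS- (0), 432 AS+ (1)
y_binary = np.array([0] * 468 + [1] * 432)
indices = np.arange(len(y_binary))  # stand-ins for the 900 image arrays

train_idx, test_idx = train_test_split(
    indices, test_size=0.2, random_state=42, stratify=y_binary
)
print(len(train_idx), len(test_idx))       # 720 180
print(np.bincount(y_binary[test_idx]))     # per-class counts in the test split
```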
Attention U-Net (Segmentation)
Purpose: Automatic segmentation of sacroiliac joint regions, eliminating manual ROI extraction.
| Property | Value |
|---|---|
| Parameters | 7,869,572 |
| Input Shape | (256, 256, 1), grayscale |
| Output | (256, 256, 1), sigmoid binary mask |
| File | attention_unet_model.keras (~94.6 MB) |
| Encoder Filters | 64 → 128 → 256 → 512 (bottleneck) |
| Decoder | 3 levels with UpSampling2D + Conv2D (2×2 kernel) + Attention Gate + Concatenate |
| Activation | ReLU (hidden), Sigmoid (output) |
| Notebook Cell | Cell 3 (build) + Cell 4 (train) |
```
Encoder path (each level = 2× Conv2D + MaxPool2D):
  Level 1:     Conv(64)  → Conv(64)  → Pool     # 256 → 128
  Level 2:     Conv(128) → Conv(128) → Pool     # 128 → 64
  Level 3:     Conv(256) → Conv(256) → Pool     # 64 → 32
  Bottleneck:  Conv(512) → Conv(512)            # 32 × 32

Decoder path (each level = UpSample + AttGate + Concat + 2× Conv2D):
  Up level 3:  UpSample(2) → Conv(256, 2) → AttentionGate(conv3, up, 256) → Concat → Conv(256) ×2
  Up level 2:  UpSample(2) → Conv(128, 2) → AttentionGate(conv2, up, 128) → Concat → Conv(128) ×2
  Up level 1:  UpSample(2) → Conv(64, 2)  → AttentionGate(conv1, up, 64)  → Concat → Conv(64) ×2

Output: Conv2D(1, kernel=1, activation='sigmoid')
```
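The following plain-Python tally is a back-of-the-envelope check on the 7,869,572 figure. It rests on three assumptions:
- Every Conv2D has a bias.
- There is no batch normalization.
- Each attention gate uses an intermediate channel count (`inter_ch`) equal to that level's filter count.

Under those assumptions the total matches the reported count exactly, which supports reading the diagram this way. The helper names are hypothetical.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus one bias per output channel for a Conv2D layer."""
    return kernel * kernel * c_in * c_out + c_out

def attention_unet_params(filters=(64, 128, 256), bottleneck=512, in_ch=1):
    total, c_in = 0, in_ch
    # Encoder: two 3x3 convs per level (max pooling has no parameters)
    for f in filters:
        total += conv2d_params(3, c_in, f) + conv2d_params(3, f, f)
        c_in = f
    total += conv2d_params(3, c_in, bottleneck) + conv2d_params(3, bottleneck, bottleneck)
    c_in = bottleneck
    # Decoder: UpSample -> Conv(f, 2) -> attention gate -> concat -> two 3x3 convs
    for f in reversed(filters):
        total += conv2d_params(2, c_in, f)        # conv after upsampling
        total += 2 * conv2d_params(1, f, f)       # theta (skip) and phi (gating) 1x1 convs
        total += conv2d_params(1, f, 1)           # psi: 1x1 conv to one attention map
        total += conv2d_params(3, 2 * f, f) + conv2d_params(3, f, f)  # after concat
        c_in = f
    total += conv2d_params(1, c_in, 1)            # sigmoid output layer
    return total

print(f"{attention_unet_params():,}")  # 7,869,572
```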
Attention Gate Mechanism:
```python
from keras.layers import Conv2D, ReLU

theta = Conv2D(inter_ch, 1)(skip_connection)                   # θ(x): transform skip features
phi = Conv2D(inter_ch, 1)(gating_signal)                       # φ(g): transform decoder features
psi = Conv2D(1, 1, activation="sigmoid")(ReLU()(theta + phi))  # ψ: attention coefficients in (0, 1)
output = skip_connection * psi                                 # apply learned attention weighting
```
Why Attention U-Net? Standard U-Net passes all skip connection features equally. The attention gates learn to suppress irrelevant background regions (muscle, fat) and highlight the small sacroiliac joint structures, which is critical since the joint occupies only ~15-20% of the full MRI field of view.
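The gate is easy to inspect outside Keras because a 1×1 convolution is a per-pixel matrix multiply over the channel axis. The NumPy sketch below is a framework-free stand-in with random weights, not the notebook's layer. `attention_gate` and the `params` keys are hypothetical names, and both inputs are assumed to share the same spatial size, as they do after the decoder's upsampling step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(skip, gating, params):
    """Additive attention gate on (H, W, C) feature maps of equal spatial size.

    Each Conv2D(..., 1) is written as `x @ W + b` over the channel axis.
    """
    theta = skip @ params["w_theta"] + params["b_theta"]      # θ(x): (H, W, inter_ch)
    phi = gating @ params["w_phi"] + params["b_phi"]          # φ(g): (H, W, inter_ch)
    act = np.maximum(theta + phi, 0.0)                        # ReLU
    alpha = sigmoid(act @ params["w_psi"] + params["b_psi"])  # (H, W, 1), values in (0, 1)
    return skip * alpha, alpha                                # broadcast over channels

rng = np.random.default_rng(0)
h, w, c, inter = 8, 8, 16, 16
params = {
    "w_theta": rng.normal(size=(c, inter)) * 0.1, "b_theta": np.zeros(inter),
    "w_phi": rng.normal(size=(c, inter)) * 0.1, "b_phi": np.zeros(inter),
    "w_psi": rng.normal(size=(inter, 1)) * 0.1, "b_psi": np.zeros(1),
}
skip = rng.normal(size=(h, w, c))
gating = rng.normal(size=(h, w, c))
out, alpha = attention_gate(skip, gating, params)
print(out.shape, alpha.shape)
```

In training, the gate can push `alpha` toward 0 for background pixels (muscle, fat) and toward 1 over the joint, which is the suppression behaviour described above.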
Simple CNN Classifier ★ Best Performing Model
Purpose: Dual-output classification, performing AS detection and disease stage classification simultaneously.
| Property | Value |
|---|---|
| Parameters | 16,870,790 |
| Input Shape | (256, 256, 1) |
| Outputs | 2 heads: binary_output (2 classes, softmax) + stage_output (4 classes, softmax) |
| File | cnn_classifier_model.keras (~202.5 MB) |
| Notebook Cell | Cell 7 (build_simple_cnn) |
```
Input(256, 256, 1)
→ Conv2D(32, 3, relu, same)  → MaxPooling2D(2)    # 256 → 128
→ Conv2D(64, 3, relu, same)  → MaxPooling2D(2)    # 128 → 64
→ Conv2D(128, 3, relu, same) → MaxPooling2D(2)    # 64 → 32
→ Flatten()                                       # 32 × 32 × 128 = 131,072
→ Dense(128, relu) → Dropout(0.5)
   ├── Dense(2, softmax) → binary_output  [AS Negative / AS Positive]
   └── Dense(4, softmax) → stage_output   [Normal / Early / Moderate / Advanced]
```
Design choice: despite its simplicity, this 3-block CNN significantly outperformed the more complex Hybrid models. The large Flatten → Dense layer (131,072 → 128) holds about 16.8M of the model's 16.9M parameters, giving it high capacity. The dual-output design allows simultaneous binary detection and staging from a single forward pass. Loss weights of 1.0 for the binary head and 0.5 for the stage head prioritize correct AS detection.
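The plain-Python tally below assumes every layer has a bias, which is the Keras default. It reproduces the reported 16,870,790 parameters and shows how lopsided the model is: roughly 99.4% of the parameters sit in the single Flatten → Dense(128) connection. The helper names are illustrative.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weight matrix plus bias vector."""
    return n_in * n_out + n_out

layers = {
    "conv1": conv2d_params(3, 1, 32),
    "conv2": conv2d_params(3, 32, 64),
    "conv3": conv2d_params(3, 64, 128),
    "dense": dense_params(32 * 32 * 128, 128),  # Flatten (131,072) -> Dense(128)
    "binary_head": dense_params(128, 2),
    "stage_head": dense_params(128, 4),
}
total = sum(layers.values())  # pooling, dropout and flatten add no parameters
print(f"total={total:,}  dense share={layers['dense'] / total:.1%}")
# total=16,870,790  dense share=99.4%
```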
Hybrid v1 (Transformer)
Purpose: Combine EfficientNetB0 CNN features with Transformer self-attention blocks for global context.
| Property | Value |
|---|---|
| Parameters | ~5.2M (approximate) |
| Backbone | EfficientNetB0 (ImageNet, frozen) |
| Transformer | 2 × MultiHeadAttention blocks (4 heads, key_dim=256) |
| File | classifier_best_model.keras (~162.9 MB) |
| Notebook Cell | Cell 5 (build_hybrid_cnn_transformer) |
```
Input(256, 256, 1)
→ Conv2D(3, 1, same)                       # grayscale → RGB adapter
→ EfficientNetB0(frozen, imagenet)         # feature extraction
→ GlobalAveragePooling2D()                 # (batch, 1280)
→ Reshape(1, 1280)                         # sequence for the Transformer
→ TransformerBlock(4 heads, mlp_dim=256)   # self-attention + MLP + residual
→ TransformerBlock(4 heads, mlp_dim=256)   # second Transformer layer
→ Flatten() → Dense(256, relu) → Dropout(0.3)
   ├── Dense(2, softmax) → binary_output
   └── Dense(4, softmax) → stage_output
```
Transformer Block internals:
```
LayerNorm → MultiHeadAttention(4 heads) → Residual Add
LayerNorm → Dense(mlp_dim, relu) → Dense(original_dim) → Residual Add
```
Performance: Binary: 54.44%, Stage: 49.44% (early stopping ended training after 41 epochs; best epoch 21)
Hybrid v2 (Dense)
Purpose: Simplified version of v1 that replaces the Transformer blocks with dense layers.
| Property | Value |
|---|---|
| Parameters | 4,838,319 |
| Backbone | EfficientNetB0 (ImageNet, frozen) |
| Post-CNN | Dense layers only (no Transformer blocks) |
| File | hybrid_cnn_transformer_model.keras (~26.5 MB) |
| Notebook Cell | Cell 15 (build_hybrid_cnn_transformer_v2) |
```
Input(256, 256, 1)
→ Conv2D(3, 1, same)                 # grayscale → RGB adapter
→ EfficientNetB0(frozen, imagenet)   # feature extraction
→ GlobalAveragePooling2D()           # (batch, 1280)
→ Dense(512, relu) → Dropout(0.4)
→ Dense(256, relu) → Dropout(0.3)
   ├── Dense(2, softmax) → binary_output
   └── Dense(4, softmax) → stage_output
```
Performance: Binary: 52.22%, Stage: 57.78% (early stopped at epoch 16, best epoch 1)
Why the Hybrid models underperform: the EfficientNetB0 backbone was frozen (base_model.trainable = False), so it could not adapt its ImageNet features (learned from natural images such as cats, dogs, and cars) to the very different domain of MRI scans. In addition, global average pooling leaves the Transformer a single-token sequence (1 × 1280), which gives self-attention nothing to attend across, so the attention blocks add little. Fine-tuning the last few EfficientNet layers would likely improve results significantly.
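The single-token point can be checked directly. With a sequence of length 1, the softmax is taken over a single attention score, so the weight is always exactly 1. Self-attention then collapses to a fixed linear projection of the token, independent of the query and key weights.

The NumPy sketch below demonstrates this with one head and random weights, using `key_dim=256` from the table. It is a simplified stand-in for `keras.layers.MultiHeadAttention`, which also applies an output projection. The helper names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def single_head_self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a (seq_len, d_model) sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (seq_len, seq_len)
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model, key_dim = 1280, 256
token = rng.normal(size=(1, d_model))  # the single token left after GAP
w_q, w_k, w_v = (rng.normal(size=(d_model, key_dim)) * 0.02 for _ in range(3))
out, weights = single_head_self_attention(token, w_q, w_k, w_v)
print(weights)  # [[1.]]: the only token attends fully to itself
```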
| Model | File | Size | Type | Cell | Status |
|---|---|---|---|---|---|
| Attention U-Net | attention_unet_model.keras | 94.6 MB | Segmentation | 3+4 | Deployed |
| U-Net Best Checkpoint | unet_best_model.keras | 94.6 MB | Segmentation | 4 | Deployed |
| Simple CNN Classifier ★ | cnn_classifier_model.keras | 202.5 MB | Classifier | 7+10 | Primary |
| Hybrid v1 (Transformer) | classifier_best_model.keras | 162.9 MB | Classifier | 5+6 | Needs tuning |
| Hybrid v2 (Dense) | hybrid_cnn_transformer_model.keras | 26.5 MB | Classifier | 15 | Needs tuning |
Total model storage: ~581.1 MB
Data loading: annotations were read with pd.read_csv(), and images were read via cv2.imread(path, cv2.IMREAD_GRAYSCALE), normalized to [0, 1], and reshaped to (N, 256, 256, 1), followed by the stratified train/test split.

| Parameter | Attention U-Net | Simple CNN ★ | Hybrid v1 | Hybrid v2 |
|---|---|---|---|---|
| Optimizer | Adam (default lr) | Adam (lr=0.001) | Adam (default) | Adam (lr=0.001) |
| Loss | Binary CE | Sparse Cat. CE (×2) | Sparse Cat. CE (×2) | Sparse Cat. CE (×2) |
| Loss Weights | N/A | binary:1.0, stage:0.5 | binary:1.0, stage:0.5 | binary:1.0, stage:0.5 |
| Max Epochs | 50 | 100 | 50 | 50 |
| Batch Size | 16 | 32 | 16 | 16 |
| EarlyStopping Monitor | val_loss | val_binary_output_accuracy | val_binary_output_accuracy | val_binary_output_accuracy |
| EarlyStopping Patience | 10 | 20 | 15 | 15 |
| ReduceLROnPlateau | Yes (factor=0.5, patience=5) | Yes (factor=0.5, patience=7) | Yes (factor=0.5, patience=5) | Yes (factor=0.5, patience=5) |
| ModelCheckpoint | Yes (unet_best_model.keras) | No | No | No |
| Actual Epochs | ~50 (full run) | 26 (best @ epoch 6) | 41 (best @ epoch 21) | 16 (best @ epoch 1) |
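The "Actual Epochs" row reflects how Keras `EarlyStopping` counts. After the best epoch, training continues for `patience` more epochs without improvement, so the last epoch is the best epoch plus the patience: 6 + 20 = 26 for the Simple CNN and 1 + 15 = 16 for Hybrid v2.

The function below is a simplified pure-Python stand-in for that counting rule (max-mode monitor, `min_delta=0`), not Keras's own code. `early_stopping_epoch` is a hypothetical name.

```python
def early_stopping_epoch(history, patience):
    """Return the 1-based epoch at which training stops for a max-mode monitor.

    The wait counter resets on a strict improvement; training stops once
    `patience` consecutive epochs pass without one.
    """
    best, wait = float("-inf"), 0
    for epoch, value in enumerate(history, start=1):
        if value > best:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(history)  # max epochs reached without triggering

# Validation accuracy peaking at epoch 6, then no further improvement
history = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95] + [0.94] * 94
print(early_stopping_epoch(history, patience=20))  # 26
```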
| Library | Version |
|---|---|
| Python | 3.x (Kaggle) |
| TensorFlow | 2.19.0 |
| Keras | 3.10.0 |
| NumPy | 2.0.2 |
| Pandas | 2.2.2 |
| OpenCV | 4.12.0 |
| Scikit-learn | 1.6.1 |
| Matplotlib | 3.10.0 |
| Seaborn | 0.13.2 |
| Component | Spec |
|---|---|
| Platform | Kaggle Notebooks |
| GPU | 1× GPU (T4 / P100) |
| RAM | ~16 GB |
| CUDA | Enabled |
| Storage | Kaggle workspace |
| Model | Type | Parameters | Binary Acc | Stage Acc | IoU | Status |
|---|---|---|---|---|---|---|
| Attention U-Net | Segmentation | 7,869,572 | 89.35% (val) | N/A | 0.5652 | ✅ Deployed |
| Simple CNN ★ | Classification | 16,870,790 | 96.67% | 82.22% | N/A | ✅ Primary |
| Hybrid v1 | Classification | ~5.2M | 54.44% | 49.44% | N/A | ⚠️ Needs tuning |
| Hybrid v2 | Classification | 4,838,319 | 52.22% | 57.78% | N/A | ⚠️ Needs tuning |
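For reference, here is a minimal NumPy sketch of the two overlap metrics. It assumes binary masks, or sigmoid outputs thresholded at 0.5. The function names are illustrative; the notebook's exact IoU implementation is not shown in this document.

The overlap metric matters when reading the U-Net row. Suppose the 89.35% figure is a per-pixel accuracy (an assumption). With the joint covering only ~15-20% of the field of view, an all-background mask would already score roughly 80-85% pixel accuracy while its IoU would be 0.

```python
import numpy as np

def iou_score(y_true, y_pred, threshold=0.5, eps=1e-7):
    """Intersection over union of two masks after thresholding."""
    t, p = y_true > threshold, y_pred > threshold
    intersection = np.logical_and(t, p).sum()
    union = np.logical_or(t, p).sum()
    return (intersection + eps) / (union + eps)

def dice_score(y_true, y_pred, threshold=0.5, eps=1e-7):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    t, p = y_true > threshold, y_pred > threshold
    intersection = np.logical_and(t, p).sum()
    return (2 * intersection + eps) / (t.sum() + p.sum() + eps)

truth = np.zeros((4, 4))
truth[:2, :] = 1   # rows 0-1 are foreground
pred = np.zeros((4, 4))
pred[1:3, :] = 1   # rows 1-2 predicted; only row 1 overlaps
print(round(iou_score(truth, pred), 4), round(dice_score(truth, pred), 4))  # 0.3333 0.5
```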
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| AS Negative | 0.99 | 0.95 | 0.97 | 94 |
| AS Positive | 0.94 | 0.99 | 0.97 | 86 |
| Overall Accuracy | | | 0.97 (96.67%) | 180 |
| Macro Avg | 0.97 | 0.97 | 0.97 | 180 |
| Weighted Avg | 0.97 | 0.97 | 0.97 | 180 |
| Stage | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Normal (0) | 0.87 | 1.00 | 0.93 | 104 |
| Early (1) | 0.00 | 0.00 | 0.00 | 18 |
| Moderate (2) | 0.61 | 0.73 | 0.67 | 26 |
| Advanced (3) | 0.93 | 0.78 | 0.85 | 32 |
| Accuracy | | | 82.22% | 180 |
| Macro Avg | 0.60 | 0.63 | 0.61 | 180 |
| Weighted Avg | 0.75 | 0.82 | 0.78 | 180 |
Two Grad-CAM implementations provide visual explanations for the model's classification decisions:
Standard Grad-CAM:

1. Create a sub-model: [input] → [last_conv_layer_output, model_predictions]
2. Run the forward pass inside a GradientTape
3. Compute gradients: d(predicted_class_score) / d(last_conv_layer_output)
4. Global average pooling of the gradients → per-channel importance weights
5. Weighted combination: heatmap = conv_output @ pooled_grads
6. ReLU (keep only positive influence), then normalize to [0, 1]
7. Resize to the input dimensions, apply the JET colormap, and overlay with alpha = 0.4
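Steps 4 to 7 reduce to a few array operations once the forward pass and the GradientTape have produced two arrays: the last conv layer's activations and their gradients.

The sketch below assumes both `(H, W, C)` arrays are already available; random stand-ins are used here. It makes a few simplifications:
- Nearest-neighbour upsampling via `np.kron` replaces `cv2.resize`.
- The colormap overlay is omitted.
- `grad_cam_from_arrays` is a hypothetical helper name.

For the Simple CNN, the last Conv2D runs after two 2× poolings, so its output is 64×64×128.

```python
import numpy as np

def grad_cam_from_arrays(conv_output, grads, out_size=256):
    """Grad-CAM heatmap from conv activations and their gradients, both (H, W, C)."""
    weights = grads.mean(axis=(0, 1))           # step 4: GAP of gradients -> (C,)
    cam = conv_output @ weights                 # step 5: weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0.0)                  # step 6: keep positive influence only
    cam = cam / (cam.max() + 1e-8)              # step 6: normalize to [0, 1]
    scale = out_size // cam.shape[0]            # step 7: nearest-neighbour upsampling
    return np.kron(cam, np.ones((scale, scale)))

rng = np.random.default_rng(0)
conv_output = rng.random((64, 64, 128))         # stand-in activations
grads = rng.normal(size=(64, 64, 128))          # stand-in gradients
heatmap = grad_cam_from_arrays(conv_output, grads)
print(heatmap.shape, heatmap.min() >= 0.0, heatmap.max() <= 1.0)
```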
Focused Grad-CAM restricts the standard heatmap to the sacroiliac joint region:

```python
import numpy as np

# After standard Grad-CAM computation (`heatmap` is an (H, W) array in [0, 1]),
# apply the sacroiliac joint ROI mask:
height, width = heatmap.shape[:2]
roi_y_start = int(height * 0.5)   # focus on the lower 50% of the image
roi_y_end = int(height * 0.9)     # down to 90% (avoiding the bottom edge)
roi_x_start = int(width * 0.3)    # central 40% horizontally
roi_x_end = int(width * 0.7)
mask = np.zeros_like(heatmap)
mask[roi_y_start:roi_y_end, roi_x_start:roi_x_end] = 1.0
heatmap_focused = heatmap * mask
heatmap_focused = heatmap_focused / (np.max(heatmap_focused) + 1e-8)
```
Target Conv Layer: conv2d_49 (last convolutional layer in the Simple CNN)
Clinical Value: Standard Grad-CAM may highlight any discriminative region including background artifacts. The focused variant ensures explanations align with the sacroiliac joint area β where radiologists actually look for AS signs like bone marrow edema, erosion, and ankylosis.
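Inference pipeline (predict.py and app.py): the uploaded file is saved to uploads/ and preprocessed as follows:

```python
import cv2
import numpy as np

img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # path of the saved upload
img = cv2.resize(img, (256, 256))
img = img / 255.0                              # scale to [0, 1]
img = np.expand_dims(img, axis=(0, -1))        # → shape (1, 256, 256, 1)
```

Output images are saved to static/results/{uuid}/ and the run metadata to the SQLite DB.

| Key | Display Name | Type | File |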
|---|---|---|---|
| attention_unet | Attention U-Net | Segmentation | attention_unet_model.keras |
| unet_best | U-Net Best | Segmentation | unet_best_model.keras |
| cnn_classifier | CNN Classifier | Classifier | cnn_classifier_model.keras |
| classifier_best | Best Classifier | Classifier | classifier_best_model.keras |
| hybrid_cnn_transformer | Hybrid CNN-Transformer | Classifier | hybrid_cnn_transformer_model.keras |
Users can select multiple models simultaneously for side-by-side comparison.
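A hypothetical sketch of the lazy-loading registry described for model_loader.py follows; it is not the actual module. Each `.keras` file is loaded on first request and cached, so selecting several models for comparison only pays the loading cost once per model.

The loader is injected as a callable. In the real app it would be something like `keras.models.load_model`; here it is a fake that records the paths it was asked to load. `LazyModelRegistry`, `get`, and `get_many` are illustrative names. The keys and file names come from the table above, and the `models` directory comes from the deployment notes.

```python
from typing import Callable, Dict, Iterable

# Registry keys and files as listed in the model table
MODEL_FILES = {
    "attention_unet": "attention_unet_model.keras",
    "unet_best": "unet_best_model.keras",
    "cnn_classifier": "cnn_classifier_model.keras",
    "classifier_best": "classifier_best_model.keras",
    "hybrid_cnn_transformer": "hybrid_cnn_transformer_model.keras",
}

class LazyModelRegistry:
    """Hypothetical lazy-loading cache: each model file is loaded at most once."""

    def __init__(self, loader: Callable[[str], object], model_dir: str = "models"):
        self._loader = loader  # e.g. keras.models.load_model in a real deployment
        self._model_dir = model_dir
        self._cache: Dict[str, object] = {}

    def get(self, key: str) -> object:
        if key not in MODEL_FILES:
            raise KeyError(f"unknown model key: {key}")
        if key not in self._cache:
            self._cache[key] = self._loader(f"{self._model_dir}/{MODEL_FILES[key]}")
        return self._cache[key]

    def get_many(self, keys: Iterable[str]) -> Dict[str, object]:
        """Models for a side-by-side comparison request."""
        return {key: self.get(key) for key in keys}

calls = []

def fake_loader(path: str) -> str:
    calls.append(path)
    return f"<model {path}>"

registry = LazyModelRegistry(fake_loader)
registry.get_many(["cnn_classifier", "attention_unet"])
registry.get("cnn_classifier")  # served from the cache, no second load
print(calls)
```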
| Study | Dataset | Model | Task | Performance | Limitations |
|---|---|---|---|---|---|
| Lee et al. (2023) | 296 patients, 4,746 slices | Faster R-CNN + VGG-19 | Detect sacroiliitis | AUROC ~0.83, Sens ~0.725, Spec ~0.936 | Binary only, no staging |
| Xie et al. (2025) | 1,294 patients, 4 centers | ResNet50 + KNN-11 | axSpA classification | AUC ~0.912, Acc ~86.9% | Manual ROI, binary only |
| Zhou et al. (2024) | 485 patients | 3D U-Net + ResNet50 + ensemble | Diagnose sacroiliitis | AUC ~0.910, Acc ~85.6% | No stage classification |
| Bordner et al. (2023) | 362 images (DESIR) | Deep Learning | BME & sacroiliitis | N/A | Binary only |
| Deep Learning Chris (2023) | 326 axSpA + 63 NSBP | Attention U-Net | Segmentation/Detection | AUC ~0.96, Sens ~0.90, Spec ~0.93 | Segmentation only, no classification |
| Kumar et al. (2025) | N/A | Transfer Learning | AS detection | N/A | Binary only, no staging |
| Kocaoglu (2025) | N/A | FPGA DL | AS detection | N/A | Hardware-specific |
| Manikandan et al. (2023) | N/A | ASNET | AS diagnosis | N/A | No visual explainability |
| Our Framework | 900 images | Att. U-Net + CNN + Hybrid + Grad-CAM | Seg + Binary + Staging | Binary: 96.67%, Stage: 82.22%, IoU: 0.5652 | Fully automated, explainable, web-deployed |
| Feature | Most Existing Systems | Our Framework |
|---|---|---|
| ROI Extraction | Manual / Semi-automatic | ✅ Fully automatic (Attention U-Net) |
| Classification Type | Binary only (AS+/AS-) | ✅ Binary + 4-stage severity |
| Explainability | Probability scores only | ✅ Region-focused Grad-CAM heatmaps |
| Multi-model Comparison | Single model | ✅ 5 selectable models |
| Web Deployment | Research code only | ✅ Flask app with auth + history |
| Clinician-Verified Labels | ✅ Radiologist annotations | ⚠️ Feature-based (synthetic) |
| Dataset Size | 296 to 1,294 patients | ⚠️ 900 images (single source) |
| Clinical Validation | ✅ Some prospective studies | ❌ Not yet clinically validated |
| Component | Minimum | Recommended |
|---|---|---|
| CPU | Intel i5 / AMD Ryzen 5 | Intel i7 / AMD Ryzen 7 (multi-core) |
| GPU | Not required (CPU inference) | NVIDIA with CUDA (GTX 1660+ / RTX 2060+) |
| RAM | 4 GB | 16 GB (32 GB for training) |
| Storage | 1 GB (models only) | 500 GB SSD (models + dataset + outputs) |
| Software | Version |
|---|---|
| OS | Windows 10/11, Linux (Ubuntu 20.04+), macOS |
| Python | 3.9 to 3.12 (3.12 recommended) |
| TensorFlow | 2.19.0 |
| Keras | 3.10.0 |
| OpenCV | 4.12.0 |
| Flask | Latest |
| NumPy / Pandas / Scikit-learn | Latest compatible |
| Docker (optional) | For containerized deployment |
| Model File | Size |
|---|---|
| attention_unet_model.keras | 94.6 MB |
| unet_best_model.keras | 94.6 MB |
| cnn_classifier_model.keras | 202.5 MB |
| classifier_best_model.keras | 162.9 MB |
| hybrid_cnn_transformer_model.keras | 26.5 MB |
| Total | ~581.1 MB |
Models stored in models/ directory. Git-ignored due to size; must be transferred separately for deployment.
| Component | Technology | Details |
|---|---|---|
| Backend | Flask (Python) | app.py: routes, auth, file handling |
| ML Inference | TensorFlow/Keras | predict.py: preprocessing, prediction, Grad-CAM |
| Model Management | Custom | model_loader.py: lazy loading, caching, model registry |
| Database | SQLite | database.py: users + predictions tables |
| Auth | Session-based | SHA-256 password hashing, Flask sessions |
| Frontend | HTML/CSS templates | 6 pages: index, login, signup, upload, results, history |
| File Storage | Local filesystem | uploads/ for input, static/results/{uuid}/ for outputs |
| Port | 8000 | Configurable |
| Containerization | Docker | Dockerfile + docker-compose.yml available |
```sql
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE NOT NULL,
    password TEXT NOT NULL,               -- SHA-256 hash
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uuid TEXT UNIQUE,                     -- Unique run identifier
    user_id INTEGER NOT NULL,
    image_path TEXT NOT NULL,
    as_status TEXT NOT NULL,              -- "AS Positive" / "AS Negative"
    stage TEXT NOT NULL,                  -- "Normal" / "Early" / "Moderate" / "Advanced"
    confidence REAL,                      -- Binary confidence %
    stage_confidence REAL,                -- Stage confidence %
    segmentation_mask TEXT,               -- Path to predicted mask image
    gradcam_overlay TEXT,                 -- Path to Grad-CAM overlay image
    segmentation_overlay TEXT,            -- Path to segmentation overlay image
    model_results TEXT,                   -- JSON of all model outputs
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(id)
);
```
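The following is a minimal sketch of how the two tables might work together for sign-up, login, and the history page, using Python's built-in `sqlite3` and `hashlib`. It mirrors the documented SHA-256 password column. The e-mail, file path, confidence values, and JSON payload are placeholders, and none of this is the actual database.py. The schema is repeated so the block runs on its own against an in-memory database.

```python
import hashlib
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE NOT NULL,
    password TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uuid TEXT UNIQUE,
    user_id INTEGER NOT NULL,
    image_path TEXT NOT NULL,
    as_status TEXT NOT NULL,
    stage TEXT NOT NULL,
    confidence REAL,
    stage_confidence REAL,
    segmentation_mask TEXT,
    gradcam_overlay TEXT,
    segmentation_overlay TEXT,
    model_results TEXT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(id)
);
"""

def hash_password(password: str) -> str:
    """Hex SHA-256 digest, as in the documented password column."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Sign-up: store only the digest
user_id = conn.execute(
    "INSERT INTO users (email, password) VALUES (?, ?)",
    ("clinician@example.com", hash_password("example-password")),
).lastrowid

# Login: compare digests
row = conn.execute(
    "SELECT id FROM users WHERE email = ? AND password = ?",
    ("clinician@example.com", hash_password("example-password")),
).fetchone()

# Record one prediction run (placeholder values)
run_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO predictions (uuid, user_id, image_path, as_status, stage,"
    " confidence, stage_confidence, model_results) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (run_id, user_id, "uploads/example.png", "AS Positive", "Moderate",
     95.0, 70.0, json.dumps({"cnn_classifier": {"as_status": "AS Positive"}})),
)
conn.commit()

# History page: newest runs first for this user
history = conn.execute(
    "SELECT uuid, as_status, stage FROM predictions WHERE user_id = ?"
    " ORDER BY timestamp DESC, id DESC",
    (user_id,),
).fetchall()
print(row, history)
```

Plain SHA-256 without a salt is fast to brute-force. A salted key-derivation function such as `hashlib.scrypt` or `hashlib.pbkdf2_hmac` is the more common choice for stored passwords.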
| File | Lines | Purpose |
|---|---|---|
| app.py | ~173 | Flask routes: signup, login, upload, predict, history, results |
| predict.py | ~202 | Core ML: preprocessing, segmentation, classification, Grad-CAM |
| model_loader.py | ~57 | Model registry, lazy loading, caching |
| database.py | ~145 | SQLite operations: init, CRUD for users and predictions |
| ankylosings.ipynb | 18 cells | Full training notebook: data loading → training → evaluation → Grad-CAM |
| Component | Status | Details |
|---|---|---|
| Dataset Collection | Complete | 900 images curated from Kaggle lumbar dataset |
| Feature-Based Labeling | Complete | Balanced labels based on image characteristics |
| Attention U-Net (Segmentation) | Complete | 89.35% val accuracy, IoU 0.5652 |
| Simple CNN Classifier | Complete | 96.67% binary, 82.22% stage; deployed as primary |
| Hybrid CNN-Transformer v1 | Needs Improvement | 54.44% binary; frozen backbone limits performance |
| Hybrid CNN-Transformer v2 | Needs Improvement | 52.22% binary; frozen backbone limits performance |
| Standard Grad-CAM | Complete | Working for CNN classifier |
| Focused Grad-CAM | Complete | Sacroiliac joint ROI-focused variant |
| Flask Web App | Complete | Auth, upload, predict, history; all working |
| Docker Deployment | Complete | Dockerfile + docker-compose.yml ready |
| Early Stage Detection | Needs Work | 0% recall for Stage 1 (Early); class imbalance issue |
| Clinical Validation | Not Started | No radiologist-verified labels or prospective trials |
| Transformer Attention Maps | Not Started | Visualizing Transformer self-attention patterns |
| Issue | Impact | Proposed Solution |
|---|---|---|
| Frozen EfficientNetB0 Backbone | Hybrid models stuck at 52-54% binary accuracy, near chance (v2's 52.22% equals the 94/180 AS-negative share of the test set) | Unfreeze last 20-30 layers of EfficientNetB0, use differential learning rates (base: 1e-5, head: 1e-3) |
| Early Stage (Stage 1) 0% Recall | Model cannot detect early AS | Oversample early-stage images (SMOTE/augmentation), use focal loss instead of CE, add class weights |
| Synthetic Labels | Labels don't reflect true clinical ground truth | Partner with radiology department for expert annotations on subset; validate feature-label correlation |
| Single MRI Modality | Misses clinical context | Add HLA-B27 status, CRP levels, patient age/gender as additional inputs (multimodal fusion) |
| 2D Slice Analysis Only | Misses inter-slice continuity | Implement 3D CNN or use multiple adjacent slices as input channels |
| Small Dataset (900 images) | Limited generalization | Expand to multi-center datasets (DESIR, ASAS cohorts), apply heavy augmentation |
| U-Net IoU = 0.5652 | Moderate segmentation quality | Use Dice loss + BCE combined, increase training data, add boundary-aware loss |
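Two of the proposed fixes for the Stage 1 recall problem, class weights and focal loss, are sketched below in NumPy for illustration; they are not the notebook's code.

The class weights use the common "balanced" heuristic, n_samples / (n_classes × count). They are computed here on the full iteration-2 stage distribution; in practice they would come from the training split. The focal loss is the standard form −(1 − p_t)^γ · log(p_t). It reduces to cross-entropy at γ = 0 and down-weights confidently correct examples as γ grows. Function names are hypothetical.

```python
import numpy as np

def balanced_class_weights(labels, num_classes):
    """n_samples / (n_classes * count_c) for each class c."""
    counts = np.bincount(labels, minlength=num_classes)
    return len(labels) / (num_classes * counts)

def sparse_focal_loss(y_true, probs, gamma=2.0, alpha=None, eps=1e-7):
    """Mean focal loss for integer labels and softmax probabilities of shape (N, K)."""
    p_t = np.clip(probs[np.arange(len(y_true)), y_true], eps, 1.0)
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)  # easy examples contribute less
    if alpha is not None:
        loss = loss * alpha[y_true]               # optional per-class weighting
    return float(loss.mean())

# Iteration-2 stage distribution: 540 / 90 / 135 / 135
stage_labels = np.repeat([0, 1, 2, 3], [540, 90, 135, 135])
weights = balanced_class_weights(stage_labels, num_classes=4)
print(np.round(weights, 3))  # Early (1) receives the largest weight

y = np.array([0, 1])
probs = np.array([[0.9, 0.05, 0.03, 0.02],   # confident, correct Normal
                  [0.6, 0.2, 0.1, 0.1]])     # Early sample predicted poorly
print(sparse_focal_loss(y, probs, gamma=0.0), sparse_focal_loss(y, probs, gamma=2.0))
```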
Definitions of key machine learning and medical imaging terms used in this documentation.