AI-Powered Pathology Assistant - WBC Analyzer Changelog
Versioned update notes for the WBC Analyzer project covering model improvements, domain adaptation, and deployment changes.
CHANGELOG – Final Report Updates
Part – 1
Newly Added Features and Components
Inference Time Domain Adaptation Pipeline — Part 3.5
- A three-stage adaptation pipeline has been added that runs at inference time and does not require retraining the model.
- Steps: Reinhard color normalization; binary routing (restricting the 5-class output to the Lymphocyte/Neutrophil space); mild Test-Time Augmentation (TTA).
- A net improvement of +31.19 points was achieved on TestB.
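The three steps can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the project's implementation: the class order in `CLASSES`, the function names, and the sample probabilities are all assumptions, and the Reinhard step is shown only as per-channel statistics matching, with the color-space conversion assumed to happen elsewhere.

```python
from statistics import mean, pstdev

# Assumed class order; the changelog does not state the model's index layout.
CLASSES = ["Basophil", "Eosinophil", "Lymphocyte", "Monocyte", "Neutrophil"]
ROUTED = ("Lymphocyte", "Neutrophil")


def match_channel_stats(values, target_mean, target_std):
    """Reinhard-style step for one channel: shift and scale the values so
    their mean and standard deviation match the target's. Conversion to and
    from a perceptual color space (e.g. LAB) is assumed to happen elsewhere."""
    src_mean = mean(values)
    src_std = pstdev(values) or 1.0  # guard against a flat channel
    return [(v - src_mean) / src_std * target_std + target_mean for v in values]


def average_tta(prob_sets):
    """Average class probabilities across augmented views of one image."""
    n = len(prob_sets)
    return [sum(column) / n for column in zip(*prob_sets)]


def route_binary(probs):
    """Keep only the routed classes and renormalize their probabilities."""
    kept = {name: probs[CLASSES.index(name)] for name in ROUTED}
    total = sum(kept.values())
    if total == 0:
        return {name: 1.0 / len(ROUTED) for name in ROUTED}
    return {name: p / total for name, p in kept.items()}


# Made-up softmax outputs for two augmented views of the same cell image.
views = [
    [0.05, 0.40, 0.30, 0.05, 0.20],
    [0.05, 0.30, 0.35, 0.05, 0.25],
]
routed = route_binary(average_tta(views))
prediction = max(routed, key=routed.get)
```

In this made-up example the top 5-class prediction would be Eosinophil, but routing discards it and the decision is made between the two remaining classes only.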
Shortcut Learning Resistant Training Architecture — Part 3.6
- A new training strategy was developed with the train_shortcut_resistant.py script.
- 20% Foreground Cropping and 15% Background Noise Injection were applied to the images.
- An XAIFocusMonitor callback class was added that performs autonomous focus measurement via Grad-CAM at the end of each epoch.
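The changelog does not define how focus is measured. One plausible metric, shown here purely as an assumption, is the share of Grad-CAM activation that falls on the cell rather than the background. The function name `focus_score` and the list-of-lists inputs are hypothetical and are not taken from XAIFocusMonitor.

```python
def focus_score(heatmap, mask):
    """Share of total heatmap activation that falls on foreground pixels.

    heatmap: rows of non-negative activations (e.g. a Grad-CAM map).
    mask:    rows of 0/1 flags of the same shape, 1 marking the cell.
    """
    inside = 0.0
    total = 0.0
    for heat_row, mask_row in zip(heatmap, mask):
        for heat, is_cell in zip(heat_row, mask_row):
            total += heat
            if is_cell:
                inside += heat
    # An all-zero heatmap carries no evidence of focus either way.
    return inside / total if total > 0 else 0.0


heatmap = [[0.0, 0.2], [0.6, 0.2]]
mask = [[0, 0], [1, 1]]
score = focus_score(heatmap, mask)  # most activation lies on the cell
```

A score near 1 would suggest the model attends to the cell, while a low score would hint at reliance on background cues, which is the shortcut this training setup is meant to discourage.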
Background Segmentation
- OpenCV-based background masking (Otsu Thresholding + Morphological Operations) was integrated into the shortcut-resistant training architecture.
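To illustrate what Otsu thresholding selects, here is a dependency-free sketch of the threshold search on 8-bit grayscale values. The project itself uses OpenCV for this step. The function name and flat pixel list are assumptions for illustration, and the morphological clean-up is not shown.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximizes between-class variance.

    Pixels with value <= threshold form the low class, the rest the high class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0      # number of pixels in the low class so far
    sum_low = 0    # sum of their intensities
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue            # low class still empty
        w_high = total - w_low
        if w_high == 0:
            break               # high class empty from here on
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Two well-separated intensity groups, e.g. stained cell vs. bright background.
pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(pixels)
high_class = [p > threshold for p in pixels]
```

Which side of the threshold counts as foreground depends on the staining and imaging setup, so the sketch leaves that choice to the caller.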
Ablation Analysis and Comparative Experiments
Summary Results
- Baseline (TestB): 56.96%
- Binary routing: 73.90% (+16.94 pp)
- Reinhard: 86.46% (+12.56 pp)
- TTA (final): 88.15% (+1.69 pp)
Experiment Comparison Table
| Configuration | TestB Accuracy | Change |
|---|---|---|
| Baseline | 56.96% | — |
| Binary routing | 73.90% | +16.94 pp |
| Reinhard | 86.46% | +12.56 pp |
| TTA (final) | 88.15% | +1.69 pp |
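The Change column is cumulative: each row is measured against the row above it, not against the baseline. A short check, using only the accuracies reported in the table, shows that the stage-wise gains add up to the overall +31.19 points.

```python
# TestB accuracies from the table above, in the order the stages were added.
stages = [
    ("Baseline", 56.96),
    ("Binary routing", 73.90),
    ("Reinhard", 86.46),
    ("TTA (final)", 88.15),
]

# Gain of each stage over the previous one, in percentage points.
deltas = [
    (name, round(acc - prev_acc, 2))
    for (_, prev_acc), (name, acc) in zip(stages, stages[1:])
]

# Overall gain from baseline to the final configuration.
total_gain = round(stages[-1][1] - stages[0][1], 2)

for name, delta in deltas:
    print(f"{name}: +{delta:.2f} pp")
print(f"Total: +{total_gain:.2f} pp")
```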
Changed and Updated Content
Test Set Accuracy
- TestA accuracy: 93.4% → 98.53% (+5.13 points)
- TestB accuracy (overall): 89% → 88.15%
- TestB accuracy (baseline → adaptation): 56.96% → 88.15%
Classification Report Updates
- Neutrophil Precision (TestA): 0.987 → 0.9958 (+0.009)
- Weighted avg F1 (TestA): 0.956 → 0.9853 (+0.029)
| Metric | Before | After | Change |
|---|---|---|---|
| Basophil Precision | 0.937 | 1.0000 | +0.063 |
| Monocyte F1 | 0.866 | 0.9376 | +0.0716 |
| Eosinophil F1 | 0.849 | 0.9541 | +0.1051 |
| Lymphocyte F1 | 0.964 | 0.9865 | +0.0225 |
Confidence Score and Response Time
- Neutrophil test confidence score: 99.8% → 97.7%
- Response time: ~200 ms (unchanged)
Preprocessing Flowchart updated
Domain Shift Analysis and Constraints
- It was shown that the largest source of domain shift is staining/color variation and that this is corrected with Reinhard normalization.
- It was noted that the Raabin-WBC dataset only includes Giemsa staining and that full-size WSI images are not yet supported.
Discussion Section Expansions
The discussion section in the final report was divided into 7 subheadings:
- Literature Comparison
- Class Imbalance
- Shortcut Learning and XAI
- Domain Shift and TTA
- Clinical MLOps
- Original Contributions
- Limitations
Reference Updates
References: 27 → 43
Added fields: Domain shift, TTA, Shortcut learning (Panboonyuen 2026, Geirhos 2020, Bassi 2024), Clinical MLOps (Spadacini 2026, Ali 2026), LLM-based XAI control (Mermigkis 2026), Stain-aware domain alignment (Li 2026)
Abstract Updates
Key points added to the abstract:
- Domain shift management
- TestB baseline 56.96% → after adaptation 88.15%
- Shortcut-Resistant Training
- +31.19 points improvement
New Sections Added to the Final Report
- § 3.5 — Domain Shift Management and Inference Time Improvements
- § 3.6 — Shortcut-Resistant Training Architecture
- § 4.1.2 — Inter-Experimental Comparative Analysis and Shortcut Learning Finding
- § 4.1.3 — Inference-Time Domain Adaptation and Ablation Analysis
- § 5.3–5.7 — Discussion subsections (XAI, domain shift, MLOps, contributions, limitations)
Part – 2
Model and Provider Configuration
- The primary model was configured as openai/gpt-4o via GitHub Models (with GITHUB_TOKEN).
- The secondary (fallback) provider was retained as Gemini 2.5-flash.
- The backend now returns the actual provider used for each response.
- The model badge (agentReportModelBadge) in the frontend has been made dynamic and shows the actual provider.
Token and Environment Variable Management
- The old MODEL_TOKEN has been completely removed.
- A normalized, secure read mechanism has been added for GITHUB_TOKEN and the optional GEMINI_API_KEY with the get_env_token function.
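What "normalized" might mean in practice can be sketched as below. This is a guess at the behavior, not the code in app.py: the signature, the quote stripping, and the choice to return None for empty values are all assumptions.

```python
import os


def get_env_token(name):
    """Read a token from the environment and normalize it.

    Hypothetical re-creation: trims surrounding whitespace and stray quotes
    (common when values are pasted into .env files) and treats an empty
    value the same as a missing one.
    """
    raw = os.environ.get(name, "")
    token = raw.strip().strip("'\"").strip()
    return token or None


github_token = get_env_token("GITHUB_TOKEN")      # required for the primary model
gemini_key = get_env_token("GEMINI_API_KEY")      # optional fallback provider
```

Returning None rather than an empty string lets the caller skip an unconfigured provider with a simple truthiness check, and the token itself is never logged or echoed.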
Multimodal and Fallback Logic
- Multimodal calls are wrapped with try/except.
- Automatic text-only fallback is applied if image input is rejected or the call fails.
- The response parsing process has been strengthened (robust parsing, extract_completion_text).
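The fallback pattern described above can be sketched as follows. The helper name `ask_with_image_fallback` and the `call_model(prompt, image_bytes)` callable are hypothetical stand-ins. The real provider calls and the extract_completion_text parser are not reproduced here.

```python
def ask_with_image_fallback(call_model, prompt, image_bytes):
    """Try a multimodal call first; retry text-only if it is rejected or fails.

    call_model is any callable taking (prompt, image_bytes_or_None) and
    returning the model's text. Returns (text, mode) so the caller can tell
    which path produced the answer.
    """
    if image_bytes is not None:
        try:
            return call_model(prompt, image_bytes), "multimodal"
        except Exception:
            # Image input rejected or the call failed: fall through to text-only.
            pass
    return call_model(prompt, None), "text-only"


# Stand-in for a provider that does not accept image input.
def text_only_model(prompt, image_bytes):
    if image_bytes is not None:
        raise ValueError("image input not supported")
    return f"report for: {prompt}"


text, mode = ask_with_image_fallback(text_only_model, "Neutrophil, 97.7%", b"\x89PNG")
```

An error raised by the text-only retry is deliberately not caught here, so a later fallback layer can handle it.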
Model Parameters and Error Handling
- GitHub/OpenAI model parameters updated (e.g., max_tokens compliance ensured).
- Parameter mismatches leading to 400-series errors fixed.
Response Reliability and Backup Mechanism
- Response parsing mechanism made more robust.
- A deterministic, rule-based final fallback layer (build_rule_based_report) capable of producing a result in every case has been added.
Prompt and Style Improvements
- Global SYSTEM_INSTRUCTION updated.
- Class-weighted CLASS_MORPHOLOGY_CONTEXT added.
- A dynamic build_agent_prompt system has been implemented to reduce repetitive and template-like reports.
API Call Ordering
- Call order has been reorganized: GitHub (GPT-4o) is tried first, with Gemini as the fallback in case of failure.
- The backend returns the provider used for each response and passes it to the frontend.
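Taken together, the call order and the rule-based final layer amount to a provider chain that always returns something and reports who produced it. The sketch below assumes each provider is a plain callable. The names `generate_report`, the provider labels, and the return shape are illustrative and do not come from app.py.

```python
def generate_report(prompt, providers, rule_based):
    """Try providers in order; fall back to a deterministic rule-based report.

    providers:  list of (label, callable) pairs, tried first to last.
    rule_based: callable that cannot fail, used when every provider does.
    Returns a dict with the report text and the label of whoever produced it.
    """
    for label, call in providers:
        try:
            text = call(prompt)
        except Exception:
            continue  # provider failed: move on to the next one
        if text and text.strip():
            return {"report": text.strip(), "provider": label}
    return {"report": rule_based(prompt), "provider": "rule-based"}


# Stand-ins for the real API calls.
def failing_primary(prompt):
    raise RuntimeError("primary provider unavailable")


def working_fallback(prompt):
    return "  Fallback report.  "


result = generate_report(
    "Neutrophil, 97.7%",
    [("openai/gpt-4o", failing_primary), ("gemini-2.5-flash", working_fallback)],
    rule_based=lambda prompt: f"Rule-based summary for: {prompt}",
)
```

The frontend can then read the `provider` field to label the model badge instead of assuming the primary model answered.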
Frontend Updates
- The model badge on index.html has been made dynamic.
- The model information shown to the user is now updated according to the actual provider from the backend.
Documentation Updates
- README.md and README.tr.md have been updated.
- Required environment variables (GITHUB_TOKEN, GEMINI_API_KEY) and model ordering have been documented.
Modified Files
- app.py
- index.html
- README.md
- README.tr.md
Test / Validation Status
- Static and syntax checks were performed; no errors were found.
- End-to-end validation with actual API keys is pending in the user environment.
This post is licensed under CC BY 4.0 by the author.
