0824 4256456   |   91-7892581597   |   project4uindia@gmail.com

BERT-Based Classification of Papillary Thyroid Carcinoma Recurrence from Clinical Notes

Project Code: 25P4U15

Abstract

This research investigates the use of a BERT-based natural language processing (NLP) model to classify papillary thyroid carcinoma (PTC) recurrence (recurrence vs. non-recurrence) from free-text clinical notes. The objective is to develop a robust and accurate AI system that assists clinicians in predicting PTC recurrence. The study uses a large corpus of de-identified clinical notes to fine-tune and evaluate the BERT model. The results indicate improved accuracy and efficiency compared with existing rule-based and keyword-based methods, highlighting the potential of AI-driven tools for personalized PTC management.

Introduction

Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer. Recurrence after initial treatment poses a significant clinical challenge, requiring close monitoring and potentially further interventions. Accurate prediction of recurrence risk is crucial for personalized treatment strategies. Current methods rely heavily on clinician experience and interpretation of diverse clinical data, including free-text clinical notes, which are often unstructured and time-consuming to analyze manually. This creates a need for automated systems that can efficiently extract relevant information and improve prediction accuracy. The lack of standardized, readily available datasets specifically for PTC recurrence prediction presents a major hurdle.


Technical Details

  • Utilizes pre-trained BERT (Bidirectional Encoder Representations from Transformers)
  • Processes free-text clinical notes for binary classification (recurrence vs. non-recurrence)
  • Fine-tuned on de-identified patient records related to PTC treatment
  • Performance evaluated using accuracy, precision, recall, and F1-score
  • Reported to outperform traditional rule-based and keyword-based methods
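
The evaluation metrics listed above, and the kind of keyword baseline the comparison refers to, can be sketched in plain Python. This is an illustrative sketch only: the keyword list, the helper names `keyword_baseline` and `binary_metrics`, and the three example notes are hypothetical and are not drawn from the project's data or code.

```python
from typing import Dict, Sequence

# Hypothetical keyword list for a naive baseline; illustrative only, not from the project.
RECURRENCE_KEYWORDS = ("recurrence", "recurrent", "persistent disease")


def keyword_baseline(note: str) -> int:
    """Return 1 (recurrence) if any keyword occurs in the lower-cased note, else 0."""
    text = note.lower()
    return int(any(keyword in text for keyword in RECURRENCE_KEYWORDS))


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating 1 (recurrence) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts (booleans sum as 0/1).
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example notes with gold labels (1 = recurrence, 0 = non-recurrence).
notes = [
    "Neck ultrasound shows recurrent nodal disease.",
    "No evidence of recurrence on follow-up imaging.",
    "Disease-free after total thyroidectomy.",
]
labels = [1, 0, 0]
preds = [keyword_baseline(n) for n in notes]  # [1, 1, 0]: the negated mention is a false positive
print({k: round(v, 3) for k, v in binary_metrics(labels, preds).items()})
# {'accuracy': 0.667, 'precision': 0.5, 'recall': 1.0, 'f1': 0.667}
```

The example shows a known weakness of substring matching: the negated phrase "No evidence of recurrence" is counted as a positive. Precision and recall are computed for the recurrence class specifically, which matters for an imbalanced cohort, where accuracy alone can look high even when most recurrences are missed.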

Project Information

Domain: Natural Language Processing, Medical AI

Year: 2025

Technology: Python, PyTorch, Transformers (Hugging Face), BERT

Dataset: De-identified clinical notes from EHR systems
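
With the stack listed above (Python, PyTorch, Hugging Face Transformers), fine-tuning BERT for the binary recurrence task might be wired up roughly as follows. This is a minimal sketch under stated assumptions, not the project's training code: the `bert-base-uncased` checkpoint, the label mapping (0 = non-recurrence, 1 = recurrence), truncation at 512 tokens, the learning rate in the usage comment, and all helper names (`load_model_and_tokenizer`, `NotesDataset`, `make_dataset`, `train_one_epoch`, `predict`) are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertForSequenceClassification, BertTokenizerFast

# Assumed label convention (not specified by the project).
ID2LABEL = {0: "non-recurrence", 1: "recurrence"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def load_model_and_tokenizer(checkpoint="bert-base-uncased"):
    """Load pre-trained BERT with a new 2-way classification head (downloads on first use)."""
    tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
    model = BertForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2, id2label=ID2LABEL, label2id=LABEL2ID
    )
    return tokenizer, model


class NotesDataset(Dataset):
    """Pairs tokenized clinical notes with binary recurrence labels."""

    def __init__(self, encodings, labels):
        self.encodings = encodings  # mapping: field name -> list of token-id lists
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.as_tensor(values[idx]) for key, values in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx], dtype=torch.long)
        return item


def make_dataset(tokenizer, notes, labels, max_length=512):
    # BERT-base position embeddings cap input at 512 tokens, so longer notes are truncated.
    encodings = tokenizer(notes, truncation=True, padding="max_length", max_length=max_length)
    return NotesDataset(encodings, labels)


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the data; returns the mean cross-entropy loss per batch."""
    model.train()
    total = 0.0
    for batch in loader:
        batch = {key: value.to(device) for key, value in batch.items()}
        optimizer.zero_grad()
        loss = model(**batch).loss  # passing "labels" makes the model compute the loss
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)


@torch.no_grad()
def predict(model, loader, device="cpu"):
    """Return the argmax class id (0 or 1) for every note in the loader."""
    model.eval()
    predictions = []
    for batch in loader:
        inputs = {key: value.to(device) for key, value in batch.items() if key != "labels"}
        logits = model(**inputs).logits
        predictions.extend(logits.argmax(dim=-1).tolist())
    return predictions


# Example wiring (hypothetical data and hyperparameters):
# tokenizer, model = load_model_and_tokenizer()
# train_ds = make_dataset(tokenizer, train_notes, train_labels)
# loader = DataLoader(train_ds, batch_size=8, shuffle=True)
# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
# for epoch in range(3):
#     print(epoch, train_one_epoch(model, loader, optimizer))
```

Truncating at 512 tokens discards the tail of long notes; whether that loses recurrence-relevant content depends on how the notes are written, so the truncation strategy is a design choice to validate rather than a given.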