Research / Publications
Scientific work across the precision pipeline.
Browse papers spanning screening and diagnostics, risk and therapeutic modeling, and clinical translation.
Gallifant, J.; Afshar, M.; Ameen, S.; et al.
Large language models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present transparent reporting of a multivariable model for individual prognosis or diagnosis (TRIPOD)-LLM, an extension of the TRIPOD + artificial intelligence statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight and task-specific performance reporting. We also introduce an interactive website (https://tripod-llm.vercel.app/) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility and clinical applicability of LLM research in healthcare through comprehensive reporting.
Modise, LM.; Alborzi Avanaki, M.; Ameen, S.; Celi, LA.; Chen, VXY.; et al.
This paper introduces the Team Card (TC) as a protocol to address harmful biases in the development of clinical artificial intelligence (AI) systems by emphasizing the often-overlooked role of researchers' positionality. While harmful bias in medical AI, particularly in Clinical Decision Support (CDS) tools, is frequently attributed to issues of data quality, this limited framing neglects how researchers' worldviews—shaped by their training, backgrounds, and experiences—can influence AI design and deployment. These unexamined subjectivities can create epistemic limitations, amplifying biases and increasing the risk of inequitable applications in clinical settings. The TC emphasizes reflexivity—critical self-reflection—as an ethical strategy to identify and address biases stemming from the subjectivity of research teams. By systematically documenting team composition, positionality, and the steps taken to monitor and address unconscious bias, TCs establish a framework for assessing how diversity within teams impacts AI development. Studies across business, science, and organizational contexts demonstrate that diversity improves outcomes, including innovation, decision-making quality, and overall performance. However, epistemic diversity—diverse ways of thinking and problem-solving—must be actively cultivated through intentional, collaborative processes to mitigate bias effectively. By embedding epistemic diversity into research practices, TCs may enhance model performance, improve fairness and offer an empirical basis for evaluating how diversity influences bias mitigation efforts over time. This represents a critical step toward developing inclusive, ethical, and effective AI systems in clinical care. A publicly available prototype presenting our TC is accessible at https://www.teamcard.io/team/demo.
Yee, KC.; Wong, MC.; Ameen, S.; Wylie, S.
Generative artificial intelligence (AI) such as ChatGPT, built on large language models, is a disruptive technology that is dramatically affecting healthcare, especially medical education. Traditional medical school assessments such as essays and open-book tests may no longer be useful, particularly given the success ChatGPT has demonstrated in answering medical examination questions, most recently achieving a high pass rate on the American Surgical Board examination. Furthermore, reflective essays can be generated by ChatGPT in full or in part, and these often escape detection by anti-plagiarism software. A simplistic response would be to ban the use of AI completely in medical education assessment and attempt to use newer AI detection technologies. This will not, however, equip our students with the skills and knowledge required by the workplace, as AI is being integrated into clinical practice. While workplace assessments can reduce the impact of AI, they are expensive, time-consuming, and introduced too late in the education process. We tested the performance of ChatGPT and have developed potential models for assessment that incorporate the use of generative AI while still evaluating the performance and competency of students. In this presentation, we showcase some of our ChatGPT testing and present our models for assessment. We propose that models for assessment need to take into consideration the process of producing the output, rather than exclusively assessing the output itself. Given the potential pitfalls of using generative AI tools such as ChatGPT in healthcare, such as their capacity to generate erroneous text, we strongly believe that healthcare educators and students need to acquire knowledge about the basic science behind AI in order to utilise its potential appropriately. We have developed a preliminary framework for designing assessment in medical education for the AI era, which will be presented for discussion.
Ameen, S.; Wong, MC.; Turner, P.; Yee, KC.
The current "Gold Standard" colorectal cancer (CRC) screening approach of a faecal occult blood test (FOBT) with follow-up colonoscopy has been shown to significantly improve morbidity and mortality by enabling the early detection of disease. However, its efficacy is predicated on high levels of population participation in screening. Several international studies have shown continued low rates of screening participation, especially amongst highly vulnerable lower socio-economic cohorts, with minimal improvement using current recruitment strategies. Research suggests that a complex of dynamic factors (patient, clinician, and the broader health system) contributes to low citizen engagement. This paper argues that the challenges of screening participation can be better addressed by (1) developing dynamic multifaceted technological interventions collaboratively across stakeholders using human-centred design; (2) integrating consumer-centred artificial intelligence (AI) technologies to maximise ease of use for CRC screening; and (3) tailoring strategies to maximise population screening engagement, especially amongst the most vulnerable.
Ameen, S.; Wong, MC.; Yee, KC.; Turner, P.
Advances in artificial intelligence in healthcare are frequently promoted as ‘solutions’ to improve the accuracy, safety, and quality of clinical decisions, treatments, and care. Despite some diagnostic success, however, AI systems rely on forms of reductive reasoning and computational determinism that embed problematic assumptions about clinical decision-making and clinical practice. Clinician autonomy, experience, and judgement are reduced to inputs and outputs framed as binary or multi-class classification problems benchmarked against a clinician’s capacity to identify or predict disease states. This paper examines this reductive reasoning in AI systems for colorectal cancer (CRC) to highlight their limitations and risks: (1) in AI systems themselves due to inherent biases in (a) retrospective training datasets and (b) embedded assumptions in underlying AI architectures and algorithms; (2) in the problematic and limited evaluations being conducted on AI systems prior to system integration in clinical practice; and (3) in marginalising socio-technical factors in the context-dependent interactions between clinicians, their patients, and the broader health system. The paper argues that to optimise benefits from AI systems and to avoid negative unintended consequences for clinical decision-making and patient care, there is a need for more nuanced and balanced approaches to AI system deployment and evaluation in CRC.
Ameen, S.; Wong, MC.; Yee, KC.; Nøhr, C.; Turner, P.
AI-augmented clinical diagnostic tools are the latest research focus in colorectal cancer (CRC) detection. While the opportunity presented by AI-enhanced CRC diagnosis is sound, this paper highlights how its effectiveness in reducing CRC-related mortality and enhancing patient outcomes may be limited by the fact that patient participation in screening remains extremely low globally. The paper examines how human factors contribute to low participation rates and suggests that a more nuanced socio-technical approach to the development, implementation and evaluation of AI systems, one sensitive to the psycho-social and cultural dimensions of CRC, may lead to tools that increase screening uptake.
Sharp, A.; Ameen, S.; Walker, J.
A proposal for a million-person Mars colony was developed as an entry in a global competition run by the Mars Society and presented at the 23rd Annual International Mars Society Convention. The proposal was awarded 5th place globally.
Ameen, S.; Han, S.; Lin, Y.; Lah, M.; Kang, B.
Educational theory has advanced the notion that student-centric modes of learning are more effective in enhancing student engagement and, by extension, learning outcomes. However, translating this theoretical pedagogy into an applied model for medical training has been fraught with difficulty due to the structural complexity of creating a classroom environment that enables students to exercise full autonomy. In this paper, we propose an intelligent computational e-learning platform for case-based learning (CBL) in Medicine that enriches and enhances the learning experiences of medical students by exposing them to simulated real-world clinical contexts. We argue that computational systems in Medicine should not merely provide a passive outlay of information, but instead promote active engagement through an immersive learning experience. This is achieved through a digital platform that renders a virtual patient simulation, allowing students to assess, diagnose, treat and test patients as they would in the real world.
Ameen, S.; Chung, H.; Han, S.; Kang, B.
This paper explores the feasibility of implementing a model for an open domain, automated question and answering framework that leverages Wikipedia’s knowledgebase. While Wikipedia implicitly comprises answers to common questions, the disambiguation of natural language and the difficulty of developing an information retrieval process that produces answers with specificity present pertinent challenges. However, observational analysis suggests that it is possible to discount the syntactical and lexical structure of a sentence in contexts where questions contain a specific target entity (words that identify a person, location or organisation) and that correspondingly query a property related to it. To investigate this, we implemented an algorithmic process that extracted the target entity from the question using CRF based named entity recognition (NER) and utilised all remaining words as potential properties. Using DBPedia, an ontological database of Wikipedia’s knowledge, we searched for the closest matching property that would produce an answer by applying standardised string matching algorithms including the Levenshtein distance, similar text and Dice’s coefficient. Our experimental results illustrate that using Wikipedia as a knowledgebase produces high precision for questions that contain a singular unambiguous entity as the subject, but lowered accuracy for questions where the entity exists as part of the object.