Patient-level outlier detection using knowledge graphs

Join us in our exploratory journey to leverage healthcare data with knowledge graphs to detect outliers, improve data quality and even generate insight.

Apply until 16 April 2023 / Xplorers Camp on 08 May 2023 

Questions to be solved

  • How can we improve the mapping of semantic concepts (e.g. patient-drug relationships) with Knowledge Graphs (KG)?
  • How do we characterize outliers within patient-level data using knowledge graphs?
    • Identify potential data quality issues
    • Generate hypotheses about patient subpopulations

General background

Medical informatics provides the contextual bridge for healthcare data from source to insight, through a combination of people, process, and technology. With this project you will join a diverse group of clinical and technical experts who scale Roche’s ability to leverage healthcare data across the healthcare value chain.


In healthcare, data is generally stored in relational databases. A KG approach, however, can uncover previously unseen relationships, detect outliers and even improve the mapping of semantic concepts. One of the first steps is constructing a harmonized KG from disparate data sources and validating the quality of the resulting graph.


This exploratory project will dive deeper into knowledge graphs of patient-level healthcare data, with a focus on state-of-the-art outlier detection. Outliers may reflect, among other things, patients of interest, novel disease groups, disease subtypes, or data quality issues.
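To make the idea concrete, here is a minimal sketch in Python. It illustrates one possible starting point and is not the project's method. It assumes a toy bipartite patient-drug graph and scores each patient by how dissimilar their drug set is from everyone else's, using Jaccard similarity. The patient IDs, drug names, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical patient-drug edges (toy data, not from any real source).
edges = [
    ("p1", "metformin"), ("p1", "atorvastatin"),
    ("p2", "metformin"), ("p2", "atorvastatin"),
    ("p3", "metformin"), ("p3", "lisinopril"), ("p3", "atorvastatin"),
    ("p4", "clozapine"),
]


def build_graph(edges):
    """Return a mapping patient -> set of drugs (bipartite adjacency)."""
    graph = defaultdict(set)
    for patient, drug in edges:
        graph[patient].add(drug)
    return dict(graph)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def outlier_scores(graph):
    """Score each patient as 1 - mean Jaccard similarity to all others."""
    scores = {}
    for patient, drugs in graph.items():
        sims = [jaccard(drugs, d) for q, d in graph.items() if q != patient]
        scores[patient] = 1.0 - (sum(sims) / len(sims) if sims else 0.0)
    return scores


graph = build_graph(edges)
scores = outlier_scores(graph)

# Assumed threshold for illustration; in practice it would need tuning.
flagged = [p for p, s in scores.items() if s > 0.9]
print(flagged)
```

In this toy data, only the patient who shares no drugs with anyone else is flagged. Such a flag could point to a patient of interest or equally to a data quality issue such as a mapping error, which is why it should be read as a hypothesis to investigate rather than a conclusion.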

Data types & technologies

  • Data standards
    • Common data models (e.g., FHIR, OMOP)
    • Standardized vocabularies/terminologies (e.g., ICD-10, LOINC, RxNorm, SNOMED, UMLS)
  • Graphs
    • Semantic web technologies (e.g., SPARQL, RDF)
    • Labeled property graphs
    • Graph algorithms
  • Programming
    • Python and associated ML/DL libraries (e.g., pandas, PyTorch, etc.)
  • Data engineering
    • ETL/ELT pipelines using Airflow or similar
    • CI/CD
    • Version control
  • Data science / analytics
    • Graph neural networks / graph embeddings

Needed skills

  • Interest or experience in Medical Informatics
  • Familiar with some of the above-mentioned data types & technologies
  • Fast learner, passionate about continually adapting your skills and knowledge, agile, curious and IT-savvy
  • Strong interpersonal and analytical skills, and intercultural awareness


Fabio Eglin
RWD Analytics Engineer


Andrew Nguyen
Section Lead, Medical Informatics


Form of cooperation

Preferred scale: 6-12 months full-time (flexible models are also possible)
Possible format: Working student, internship or master's thesis

How to present your idea

Please demonstrate your approach to the problem using 3 to 5 slides. We do not expect a bulletproof solution; we are more interested in how you would tackle the given challenge.

By sending this to us via the submit button you agree to the following:


  • you confirm that you are the author of the submission and entitled to dispose of rights of use and exploitation of the contents of your submission, and that you have not yet granted any rights of use and exploitation to third parties that would be infringed by your submission.
  • you grant to Roche Diagnostics GmbH the unrestricted, sublicensable and exclusive right to use and exploit your submission by all means known today or in the future. This includes without limitation the rights to reproduce, distribute, and exhibit your submission, as well as the right to communicate your submission to the public. You also grant to Roche Diagnostics GmbH the right to edit the submission, to translate it, and to create abbreviations and summaries (abstracts); the aforesaid rights to use and exploit also apply to such edited versions, translations, abbreviations and summaries.
  • Roche Diagnostics GmbH will designate you as the author of the submission, and will recognize and respect your moral rights in the submission.
  • the relationship you enter into by sending this via the submit button is governed by the laws of the Federal Republic of Germany, and the courts of Germany have international jurisdiction for any disputes arising under or in connection with this relationship.


Any problems or questions? Please contact us: healthcare.xplorers@roche.com


Further information on our privacy policy can be found here.

