Machine learning in cancer research

Machine learning has gained popularity because data availability is growing quickly. Many of the Cancer Registry's projects use large datasets, and our use of machine learning for research is increasing.

Last updated: 3/28/2023

Artificial intelligence and machine learning have made great strides in recent years. Big data, increasingly powerful computers and improved algorithms have contributed to these developments.

Machine learning is a type of artificial intelligence where computers learn on their own. This can be either supervised learning, where computers are provided information about the “answers”, or unsupervised learning, where computers search for patterns themselves. These methods have great potential in cancer research, and the use of machine learning in epidemiology and screening is a key focus area in the Cancer Registry's strategy for 2020-2024.

Classification of cancer biomarkers

In "supervised learning", algorithms are trained on datasets that contain both questions and answers, and the model learns to predict the correct answer, often from large amounts of data. One form of supervised learning is classification, in which the model assigns each sample to a category, for example one of the clinical groups ‘healthy people’, ‘precursors’, or ‘cancer’. In one study the classification is based on transcription patterns from small RNA assays, and in another it is based on bacterial profiles from bowel screening participants. The patterns that best distinguish the clinical groups are potential biomarkers for early cancer detection.

It is important to distinguish real biomarkers from random patterns. The datasets are therefore repeatedly divided at random into training and test sets. By repeating the learning on the training part, and then testing the result on the remaining part of the dataset, we get robust results that can be tested further in other study populations.
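The repeated splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not the Cancer Registry's actual pipeline: the data are synthetic, the nearest-centroid classifier is a stand-in chosen for brevity, and all function names and parameters (make_synthetic_data, repeated_holdout, the 30% test fraction, 50 repeats) are assumptions made for the example. It contrasts a dataset with a real group difference against the same data with shuffled labels, where any apparent pattern is random.

```python
import random
import statistics

def make_synthetic_data(n_per_group, n_features, shift, rng):
    """Synthetic stand-in for molecular profiles (not real data).
    Group 1 differs from group 0 on feature 0 only."""
    samples = []
    for label in (0, 1):
        for _ in range(n_per_group):
            features = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            features[0] += shift * label
            samples.append((features, label))
    return samples

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_nearest_centroid(train):
    """Learn one mean profile (centroid) per clinical group."""
    model = {}
    for label in {y for _, y in train}:
        rows = [f for f, y in train if y == label]
        model[label] = [statistics.fmean(col) for col in zip(*rows)]
    return model

def predict(model, features):
    """Assign the group whose centroid is closest."""
    return min(model, key=lambda label: sq_dist(features, model[label]))

def repeated_holdout(samples, n_repeats, test_fraction, rng):
    """Repeatedly split at random, train on one part, test on the rest."""
    accuracies = []
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit_nearest_centroid(train)
        correct = sum(predict(model, f) == y for f, y in test)
        accuracies.append(correct / len(test))
    return accuracies

rng = random.Random(0)
data = make_synthetic_data(n_per_group=60, n_features=5, shift=3.0, rng=rng)
acc_signal = repeated_holdout(data, n_repeats=50, test_fraction=0.3, rng=rng)

# Shuffle the labels to mimic a dataset with no real group difference.
labels = [y for _, y in data]
rng.shuffle(labels)
noise = [(f, y) for (f, _), y in zip(data, labels)]
acc_noise = repeated_holdout(noise, n_repeats=50, test_fraction=0.3, rng=rng)

print("mean test accuracy, real signal:", statistics.fmean(acc_signal))
print("mean test accuracy, shuffled labels:", statistics.fmean(acc_noise))
```

In this sketch, a pattern that reflects a real group difference keeps a high test accuracy across the repeated splits, while the shuffled-label data stay close to chance level.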

We have also used so-called unsupervised learning, where the machines have to look for patterns in the data without being given any correct answer, to identify small RNA transcription patterns in serum from lung, breast, and colon cancer patients decades before diagnosis. These patterns are most evident in lung cancer, which confirms previous studies showing that such patterns in lung cancer are dynamic in the decade before diagnosis.
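To make the idea of unsupervised learning concrete, the following sketch groups samples without ever seeing their labels. It uses k-means clustering, which is one common unsupervised method and is not necessarily the one used in the studies mentioned above. The data are synthetic two-dimensional points, and the function names (k_means, farthest_first_centers) and all parameter values are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]

def nearest_center(point, centers):
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def farthest_first_centers(points, k):
    """Deterministic start: take the first point, then repeatedly add
    the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def k_means(points, k, n_iter=20):
    """Plain k-means: alternate between assigning points to the nearest
    center and moving each center to the mean of its points."""
    centers = farthest_first_centers(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    assignments = [nearest_center(p, centers) for p in points]
    return centers, assignments

# Synthetic data: two hidden groups of 40 samples each (not real measurements).
rng = random.Random(1)
group_a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
group_b = [[rng.gauss(8.0, 1.0), rng.gauss(8.0, 1.0)] for _ in range(40)]
points = group_a + group_b

# The algorithm only sees the points, never which group they came from.
centers, assignments = k_means(points, k=2)
print("cluster sizes:", assignments.count(0), assignments.count(1))
```

The cluster numbers themselves carry no meaning. Whether the recovered groups correspond to anything clinically relevant has to be checked afterwards against other information.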

Breast cancer screening

The potential for machine learning is also great in breast cancer screening. The Section for Breast Cancer Screening at the Cancer Registry of Norway is involved in several projects to gain knowledge about the advantages and disadvantages of using machine learning systems in BreastScreen Norway. This knowledge can inform future decisions about implementing these methods.

Machine learning may help us in various areas of the screening process. For example, we know that some signs or patterns on the screening images are evaluated as normal by the radiologists, but later turn out to be breast cancer. We want to find out whether machine learning can help us become even better at detecting these breast cancers. If we find more breast cancers, we also need to know more about the characteristics of these tumours.

Machine learning can also assist the radiographers, for example in their assessment of image quality. We also need to consider ethical and legal aspects related to a possible future implementation in BreastScreen Norway.

BreastScreen Norway has a large and unique database, with images from over four million screening examinations and additional information about the screening and breast cancer diagnosis. We are thus in a very good position both to test machine learning systems that have already been developed and to develop our own machine learning system that is adjusted to Norwegian women.

Cervical screening

To investigate whether cervical cancer screening can be improved and made more personalized, we are using machine learning to analyse large amounts of data from a variety of data sources. The Cancer Registry holds data on screening history, such as test results and follow-up examinations, treatments, HPV vaccination status, and cancer diagnoses. From surveys, we collect information about smoking, alcohol, reproductive history, and sexual health. In addition, clinical information is collected on the types of HPV that infect the cervix.

Several different machine learning methods are used to find out which model best predicts the individual risk of cervical cancer. Based on this model, we aim to determine more individualized time intervals for cervical screening. The goal of this research is to go from a "one size fits all" standardized cancer screening algorithm to a more personalized screening programme that accounts for individual risk level. This research takes place through close collaboration between the Research Department, the Department for Register Informatics, and CervicalScreen Norway.
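Comparing candidate risk models can be sketched as follows. This is an illustration under stated assumptions, not the project's method: the outcomes and predicted risks are small hand-made numbers, the model names (model_a, model_b) are hypothetical, and the comparison metric is the area under the ROC curve (AUC), which is one common choice for measuring how well predicted risks separate people who develop the disease from those who do not.

```python
def auc(scores, outcomes):
    """Area under the ROC curve: the probability that a randomly chosen
    case gets a higher predicted risk than a randomly chosen non-case.
    Ties count as half a win."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))

# Hypothetical held-out data: 1 = later diagnosis, 0 = no diagnosis.
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]

# Hypothetical predicted risks from two candidate models for the same people.
predictions = {
    "model_a": [0.05, 0.10, 0.20, 0.30, 0.25, 0.15, 0.60, 0.70],
    "model_b": [0.20, 0.40, 0.10, 0.35, 0.30, 0.50, 0.45, 0.25],
}

results = {name: auc(scores, outcomes) for name, scores in predictions.items()}
best_model = max(results, key=results.get)

for name, value in results.items():
    print(name, round(value, 3))
print("best by AUC:", best_model)
```

AUC only measures how well a model ranks individuals by risk. Before predicted risks are translated into screening intervals, it also matters whether the predicted probabilities match the observed frequencies (calibration), which this sketch does not assess.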

Ongoing projects 

JanusRNA - identification of early cancer biomarkers

The study uses machine learning as a tool for classifying small non-coding RNA as potential early biomarkers for a number of cancer types (lung, prostate, breast, ovary, colon, rectum, testis, gallbladder, and uterus). The study is based on sequencing data from prediagnostic samples from the Janus Serumbank.

Microbiota and lifestyle in colorectal cancer screening (CRCbiome)  

The study examines whether there is an interaction between bowel bacteria, lifestyle, and colorectal cancer. Machine learning will be used, among other things, to classify samples as coming from healthy people, people with precursors, or people with cancer.

Personalised screening for cervical cancer

The goal of this project is to create more flexible cancer prevention by moving from standardised recommendations to recommendations based on a personal risk assessment. By combining knowledge from medicine and computer technology, we are developing an algorithm that uses health data to tailor recommendations for cervical cancer screening to the individual's risk profile.

Development of AI algorithms in BreastScreen Norway

This research project, run in collaboration with the Norwegian Computing Center, uses data from BreastScreen Norway to develop a machine learning system.

Advantages and disadvantages of artificial intelligence in BreastScreen Norway

Through retrospective studies we will investigate advantages and disadvantages of using artificial intelligence in the assessment of screening mammograms in BreastScreen Norway.

BADDI: Artificial intelligence in screening with standard mammography and tomosynthesis

This project will add knowledge about the use of artificial intelligence to detect breast cancer in screening with standard mammography and with tomosynthesis.

AIMS Norway: A randomized controlled trial

In this study, we will investigate whether artificial intelligence in combination with radiologists is as good as or better at detecting breast cancer than the current standard procedure in BreastScreen Norway, where two radiologists evaluate the images.