Machine Learning in BreastScreen Norway

- A research project aimed at improving the efficiency and quality of the national screening program, BreastScreen Norway, by combining automatic image analysis and radiological expertise.
Last updated: 12/30/2021

Background

Breast cancer is the most common cancer among women worldwide. Preventing breast cancer is difficult on an individual level, but early detection through mammography screening is an effective way to reduce breast cancer-related deaths.

The standard screening procedure in BreastScreen Norway takes x-ray pictures of each breast (mammograms) from two angles. Two radiologists independently review all mammograms. If either of the radiologists determines there is a slight chance that a woman has breast cancer, a consensus meeting is held to decide whether the woman needs to be recalled for further assessment.

Most women attending screening do not have any signs of breast cancer – 93% of screening mammograms show no signs of breast cancer. These cases are not selected for consensus meetings or recalls. Still, reviewing the mammograms takes time. As a result, today's radiologists spend a substantial amount of their clinical time reading normal mammograms with no signs of breast cancer.

With recent advancements in machine learning, there is potential to improve the screening program by allowing radiologists to focus on women who are recalled for further assessment and on women with clinical symptoms of breast cancer, such as a lump or retraction.

Objective

The aim of this project is to develop an automated method to review mammograms by combining machine-based image analysis with radiological knowledge and expertise.

The study team will develop an algorithm that uses artificial intelligence (deep learning) to learn how to recognize patterns and make independent decisions. Specifically, by “studying” mammograms and related information about screening history and previous breast cancer diagnoses, this algorithm will learn how to recognize patterns in mammograms that may indicate breast cancer. In this way, the algorithm can be used to develop an automatic system to identify cases that clearly do not show signs of breast cancer – referred to as negative screening examinations. 

The goal is for the final algorithm to be able to detect 70% of all negative screening examinations. An algorithm with this ability has the potential to substantially reduce radiologists’ screening workload so that they can focus on the remaining 30% of cases that may show signs of breast cancer. These are more challenging cases to interpret and will be read in the same way as is done today: manually by two independent radiologists.
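The workload effect of this goal can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes the 70% target applies to the 93% of examinations that show no signs of breast cancer, and that every automatically cleared examination skips both independent readings. The function name, the cohort size of 100,000 examinations, and these two assumptions are hypothetical and not taken from the project.

```python
def triage_workload(n_exams, negative_rate=0.93, cleared_share=0.70, reads_per_exam=2):
    """Estimate how an automatic negative filter changes the reading workload.

    negative_rate and cleared_share are the figures quoted in the text.
    n_exams, and the assumption that a cleared examination skips both
    radiologist readings, are hypothetical.
    """
    negatives = n_exams * negative_rate      # examinations with no signs of cancer
    cleared = negatives * cleared_share      # negatives identified automatically
    remaining = n_exams - cleared            # still read by two radiologists
    reads_saved = cleared * reads_per_exam   # individual readings avoided
    return {"cleared": cleared, "remaining": remaining, "reads_saved": reads_saved}


n_exams = 100_000  # hypothetical cohort size
result = triage_workload(n_exams)
print(f"cleared automatically: {result['cleared']:,.0f} ({result['cleared'] / n_exams:.1%})")
print(f"read manually:         {result['remaining']:,.0f} ({result['remaining'] / n_exams:.1%})")
print(f"readings avoided:      {result['reads_saved']:,.0f}")
```

Under these assumptions, roughly 65% of all examinations would be cleared automatically and about 35% would still be read by two radiologists.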

This project also offers the opportunity to develop an automated image analysis algorithm to help health care personnel who take the mammograms (radiographers) assess image quality, perform technical quality control, and perform systematic analyses to identify changes in the breast over time. Such an algorithm could increase the quality of screening services offered by BreastScreen Norway.

Data

To develop the algorithm, large amounts of digital image data from screening examinations are required along with information on radiological assessments, as well as any positive or negative findings in and outside the screening program (screening information).

The breast centers involved in the project have performed over 650,000 digital screening examinations, corresponding to more than 2.5 million digital mammograms. The image data is stored locally at breast centers around the country, while the screening information is stored at the Cancer Registry. These data will be merged to create a unique collection of data, which will be used to teach the algorithm how to identify negative mammograms.
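As a rough illustration of what merging the two sources could look like, the sketch below joins image records with screening information on a shared examination key. It is not the project's actual pipeline: the record types, field names, the pseudonymous `exam_id` key, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    exam_id: str   # hypothetical pseudonymous examination key
    center: str    # breast center where the images are stored
    n_images: int  # number of mammograms in the examination


@dataclass(frozen=True)
class ScreeningInfo:
    exam_id: str
    negative: bool  # True if no cancer was found in or outside screening


def merge_records(images, screening):
    """Inner-join image records with screening information on exam_id."""
    info_by_id = {s.exam_id: s for s in screening}
    merged = []
    for img in images:
        info = info_by_id.get(img.exam_id)
        if info is None:
            continue  # no screening information, so the examination has no label
        merged.append({
            "exam_id": img.exam_id,
            "center": img.center,
            "n_images": img.n_images,
            "negative": info.negative,
        })
    return merged


# Hypothetical sample data: three examinations, two with screening information.
images = [ImageRecord("e1", "A", 4), ImageRecord("e2", "B", 4), ImageRecord("e3", "A", 4)]
screening = [ScreeningInfo("e1", True), ScreeningInfo("e2", False)]
dataset = merge_records(images, screening)
print(dataset)
```

In this sketch, examinations without matching screening information are dropped, since an image without a known outcome cannot serve as a labeled training example.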

This project uses data from women who have allowed their personal data related to negative screening results to be permanently stored at the Cancer Registry, in accordance with the Cancer Registry Regulations (Kreftregisterforskriften). The project team will not contact women whose data is used in the project. It will not be possible to identify individuals from any published study results.

Organisation

The Cancer Registry is the project leader and is responsible for obtaining ethical approval(s) for the project, collecting data and delivering it to the algorithm developers, clinical testing, and drafting plans to potentially implement the algorithm in the screening program.

The Norwegian Computing Center (Norsk Regnesentral) is responsible for developing the algorithm that will analyze the mammograms. The Center has professional knowledge of image analysis and machine learning.

The breast centers will contribute radiological expertise and practical knowledge about screening. These centers are the regional specialists in breast cancer screening and diagnosis.

The University of Tromsø will function as an important advisor on IT systems for biological and medical applications and will also supervise master's students on related topics.

Status 

The project received a "pilot dataset" from the University Hospital of Northern Norway in the autumn of 2018. In 2020 we received mammograms from the breast centers at St. Olavs Hospital and Helse Møre og Romsdal, and in February 2021 from the breast center at the University Hospital of Northern Norway. The Cancer Registry has recently also received data from four health trusts in Helse Sør-Øst. We are currently processing these images, assessing data quality and removing personally identifiable information. The next step will be transferring these data to the Norwegian Computing Center, which is the data processor in the project.

In the early phases of the project, the dataset was too small to develop and train our own models, and we therefore used a pre-trained model for testing and development of the algorithm. With the data received in 2020 and 2021, we have had the opportunity to develop and train a new model based on Norwegian data. This model, trained from scratch by the Norwegian Computing Center, will be further developed as soon as the Center receives larger datasets.