Project Proposals
The following is a list of projects available to MLBD MRes students in the academic year 2024/25.
Algorithmic optimization of deformable mirrors for high-power laser experiments.
Project code: MLBD_2024_1
Supervisor(s): Roland Smith (Physics/Light)
Note
This project can host up to 1 pair(s) of students.
Deformable mirrors use multiple computer-controlled actuators to bend an optical surface, sometimes with an accuracy of just a few nanometres, and are used in instruments such as the James Webb Space Telescope to radically improve the performance of an imaging system. In a high-power laser system these devices can be used to correct subtle spatial aberrations or temporal phase in a multi-terawatt, short-pulse laser beam and significantly improve its performance, or to create new tools that allow shaping of an optical pulse on few-femtosecond timescales and a "search" for new physics.
An "obvious" approach is to use a deformable mirror simply to optimize a laser focal spot in space, or a pulse shape in time. Rather counter-intuitively, many interesting and inherently non-linear processes driven by a laser (e.g. MeV particle acceleration or filamentation) can also benefit from a non-ideal spot or pulse shape. A major challenge in using this technique is that the mapping from individual "control values" in a computer to a real-world mirror surface, and then to a physical process, is inherently non-linear and in some cases not well understood. The "search space" is also very large: a 9-actuator mirror with a 12-bit control system has ~3x10^32 different configurations, and a 15-actuator system expands this to ~10^54. Finally, we also need to teach a computer to recognize "good" and "bad" results as it learns about the system, and to avoid getting stuck in a local minimum. That could use a simple algorithm giving a single quality metric for a focal-spot image, but for complex images with lots of fine structure, or an entire experiment with many additional control parameters, a neural net or other machine learning techniques such as Bayesian optimisation might have significant speed and "robustness" advantages. We might also need different ways to describe the problem, e.g. using a sum of Zernike polynomials to represent a complicated spatial phase profile in a "real" laser beam rather than just tweaking a collection of mirror control values.
This project will use genetic algorithms (GA), Bayesian and other algorithmic approaches to optimize the "shape" in space and/or time of single and multiple real-world, lab-based mirror systems and, if time allows, extend this to control and optimize high-power laser experiments. We will also investigate different image recognition techniques (assorted "direct" algorithms versus neural nets) to identify "good" and "bad" laser beams. We may also use machine optimization of finite-element structural models of "new" mirror systems that could potentially be built and tested during the project, e.g. to exploit new actuator configurations.
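As a flavour of the optimisation side, here is a minimal sketch of a genetic algorithm searching over actuator values; the 9-actuator, 12-bit mirror and the focal-spot quality function spot_quality are hypothetical stand-ins for the real hardware and image metric.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTUATORS, LEVELS = 9, 4096            # hypothetical 9-actuator, 12-bit mirror

def spot_quality(actuators):
    """Stand-in fitness: in the lab this would be a metric computed from a
    camera image of the focal spot (e.g. peak intensity or encircled energy)."""
    target = np.linspace(500, 3500, N_ACTUATORS)     # made-up optimal settings
    return -np.sum((actuators - target) ** 2)

def evolve(pop_size=50, generations=200, mutation_rate=0.1):
    pop = rng.integers(0, LEVELS, size=(pop_size, N_ACTUATORS))
    for _ in range(generations):
        fitness = np.array([spot_quality(ind) for ind in pop])
        parents = pop[np.argsort(fitness)[-pop_size // 2:]]      # keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, N_ACTUATORS)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            mask = rng.random(N_ACTUATORS) < mutation_rate       # random mutation
            child[mask] = rng.integers(0, LEVELS, size=mask.sum())
            children.append(child)
        pop = np.vstack([parents, children])
    return pop[np.argmax([spot_quality(ind) for ind in pop])]

print(evolve())
```

In the real project the fitness evaluation would be a measurement on the optical table (or a finite-element model), which is exactly why sample-efficient alternatives such as Bayesian optimisation are also of interest.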
Note
Any useful pre-requisite modules/knowledge? : N/A
Using machine learning to constrain the impact of clouds on climate change
Project code: MLBD_2024_2
Supervisor(s): Paulo Ceppi (Physics/Space, Plasma and Climate)
Note
This project can host up to 2 pair(s) of students.
Clouds are one of the main uncertainties for future climate change. With global warming, clouds change and so does their impact on the radiation budget (the difference between absorbed sunlight and emitted infrared), which has a knock-on effect on climate change known as cloud feedback. This feedback is poorly understood as climate models cannot simulate clouds reliably. The project will involve applying statistical learning techniques (e.g. ridge regression) to cloud-radiative data from satellite observations and climate model simulations. The aim will be to better quantify how clouds respond to environmental changes from present-day data, so as to more accurately predict how clouds will change with global warming. The project will build on a successful initial study by Ceppi and Nowack 2021, PNAS link
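A minimal sketch of the kind of ridge regression involved, with synthetic data standing in for gridded cloud-radiative anomalies and their environmental predictors (the variable names and sizes are illustrative, not the project's actual datasets):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# synthetic predictors: e.g. local temperature, stability and humidity anomalies
n_samples, n_features = 2000, 50
X = rng.normal(size=(n_samples, n_features))
true_coefs = rng.normal(scale=0.5, size=n_features)
# synthetic target: cloud-radiative-effect anomaly (W m^-2) with noise
y = X @ true_coefs + rng.normal(scale=1.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# ridge regression with the regularisation strength chosen by cross-validation
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_train, y_train)
print("chosen alpha:", model.alpha_)
print("test R^2:", model.score(X_test, y_test))
```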
Note
Any useful pre-requisite modules/knowledge? : Atmospheric Physics
Exoplanet demographics
Project code: MLBD_2024_3
Supervisor(s): James Owen (Physics/Physics of the Universe)
Note
This project can host up to 1 pair(s) of students.
We have discovered thousands of planets around other stars. However, the methods used to detect them are highly biased, so we need to use advanced Bayesian statistical methods to account for these biases and understand what the true planet population is. There are numerous open problems in exoplanet demographics, many related to analysing planet detections from NASA's Kepler mission.
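As a simple illustration of the bias-correction idea, the sketch below estimates an occurrence rate by weighting each detected planet by the inverse of a hypothetical, made-up detection efficiency; real demographic analyses use full hierarchical Bayesian models rather than this shortcut.

```python
import numpy as np

rng = np.random.default_rng(1)

def detection_efficiency(radius):
    """Hypothetical completeness: bigger planets are easier to detect."""
    return np.clip(radius / 4.0, 0.05, 1.0)

# made-up "detected" planet radii (Earth radii) from a survey of n_stars stars
detected_radii = rng.uniform(1.0, 4.0, size=300)
n_stars = 100_000

# the naive rate ignores the bias; the weighted rate corrects for it
naive_rate = len(detected_radii) / n_stars
corrected_rate = np.sum(1.0 / detection_efficiency(detected_radii)) / n_stars
print(f"naive: {naive_rate:.4f} planets/star, bias-corrected: {corrected_rate:.4f}")
```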
Note
Any useful pre-requisite modules/knowledge? : Astrophysics
Using Atmospheric Muons for Security and Customs
Project code: MLBD_2024_4
Supervisor(s): Nick Wardle (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
Traditional security checks at airports typically require manually investigating cargo or using X-ray technology to scan the contents of items coming through borders. While X-rays are non-intrusive, they require a source and regular maintenance which can be costly.
Muon tomography is an alternative that makes use of a completely free and natural source - cosmic-ray muons - which, when coupled with particle detection technology (hodoscopes instrumented with photomultiplier tubes) and machine learning, can be used to reconstruct 3D images of the materials inside the cargo. Unlike X-ray technology, information from the muons can also be used to identify certain materials, which allows customs agents to detect contraband substances. The technology can be mounted within portable containers, making it easy to deploy.
This project, in partnership with the Horizon-funded CosmoPort consortium [1], will involve developing machine learning algorithms to better reconstruct the muon tracks and the 3D images, and to identify materials from the data. We will investigate a range of techniques, from simple classifiers to convolutional/graph neural networks, and make use of both real and simulated datasets to train models and evaluate their performance.
[1] link
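A minimal sketch of the kind of convolutional classifier that might be applied to voxelised scattering images; the 3-D input shape and the binary "contraband vs benign" labels are illustrative assumptions, not the CosmoPort data format.

```python
import torch
import torch.nn as nn

class VoxelClassifier(nn.Module):
    """Small 3-D CNN: voxel grid of muon scattering densities -> class score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(16 * 8 * 8 * 8, 2))

    def forward(self, x):
        return self.head(self.features(x))

# toy batch: 4 cargo volumes of 32^3 voxels, each voxel a scattering density
x = torch.randn(4, 1, 32, 32, 32)
y = torch.randint(0, 2, (4,))                    # 0 = benign, 1 = contraband
model = VoxelClassifier()
loss = nn.CrossEntropyLoss()(model(x), y)        # one training step would follow
loss.backward()
```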
Note
Any useful pre-requisite modules/knowledge? : Nuclear and Particle Physics would be beneficial.
No-fit new physics search
Project code: MLBD_2024_5
Supervisor(s): Mark Smith (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
Particle physics analyses often require complex multi-dimensional fits to measure observables, which must then be interpreted by theorists to assess the underlying physics parameters. In this project we will attempt to use machine learning to infer the physics parameters of a data set directly, without having to perform any fits.
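One simple realisation of this idea, sketched below under toy assumptions, is to train a regressor on simulated data sets so that it maps summary observables straight to the underlying parameter; the generative model here is an invented stand-in for a real simulation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def simulate(theta, n_events=500):
    """Toy simulator: events drawn from an exponential whose scale is theta.
    Returns a few summary statistics of the simulated data set."""
    events = rng.exponential(scale=theta, size=n_events)
    return [events.mean(), events.std(), np.median(events)]

# build a training set of (summaries, true parameter) pairs from simulation
thetas = rng.uniform(0.5, 5.0, size=2000)
X = np.array([simulate(t) for t in thetas])
reg = GradientBoostingRegressor().fit(X, thetas)

# "measure" the parameter of a new data set without any explicit fit
theta_true = 2.7
print("inferred:", reg.predict([simulate(theta_true)])[0], "truth:", theta_true)
```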
Note
Any useful pre-requisite modules/knowledge? : Particle physics
Multi-year Tropical Cyclone Prediction
Project code: MLBD_2024_6
Supervisor(s): Ralf Toumi (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
Tropical cyclones are one of the most dangerous natural hazards now, and will be even more so in the future. Much about their fascinating genesis and evolution remains insufficiently understood (1). It is proving challenging to model this phenomenon because of the wide range of scales in time and space, as well as the vast range of physical processes involved. Everything we know about atmospheric physics affects tropical cyclones. Synthetic or stochastic models are extremely powerful tools for risk assessment (2). In this project you will join the tropical cyclone research group to build a new global stochastic model of tropical cyclones called IRIS (Imperial College Storm Model). We want this model to be physics-informed, not just statistical. The aim is for IRIS to make seasonal and multi-year predictions and simulate the impact of climate change. A version of IRIS is now also run on smartphones by the general public to create the largest open and free database of global tropical cyclone risk (3). You will join the largest research group in Europe working on tropical cyclones.
For this project we want to apply machine learning to predict the key environmental conditions that drive tropical cyclones such as the sea surface temperature. The aim is to make skilful predictions of these fields on seasonal to multi-year time scales.
Note
Any useful pre-requisite modules/knowledge? : No prerequisite
Machine Learning for Mass Spectrometry-Based Metabolomics
Project code: MLBD_2024_7
Supervisor(s): Robbie Murray, Yuchen Xiang (Physics/Light)
Note
This project can host up to 1 pair(s) of students.
The metabolome of an organism is influenced by environmental factors and intracellular regulation, reflecting its physiological state. Metabolomics, particularly through the application of mass spectrometry (MS), plays a crucial role in understanding disease progression in clinical settings and in estimating metabolite overproduction for metabolic engineering. Among the various MS technologies, our lab specifically employs laser-desorption rapid evaporative ionisation mass spectrometry (LD-REIMS) to analyze extensive biological sample sets, such as cancer biopsies. This technique allows us to gather vast datasets essential for evaluating and classifying the samples into categories like healthy or diseased.
Mass spectrometry-based metabolomics, however, presents significant analytical challenges. The data structures are inherently complex, and the interactions among metabolites are nonlinear, complicating the extraction of meaningful insights. To address these challenges, machine learning methods have become increasingly popular. These methods are well-suited for MS data analysis due to their capability to represent nonlinear relationships and process large, heterogeneous data sets swiftly and efficiently.
This project will explore different machine-learning-based techniques that can be applied to the complex datasets we gather with our LD-REIMS platform in the Physics Department.
Note
Any useful pre-requisite modules/knowledge? : N/A
Identifying Low-mass Dark Matter Events with Machine Learning
Project code: MLBD_2024_8
Supervisor(s): Kelsey Oliver-Mallory (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
Dark matter is one of the most critical topics in modern physics, and the dual-phase xenon time projection chamber (TPC) is the foremost technology probing the identity of this substance. Xenon TPCs were originally selected for their strong response to heavy particles like the WIMP, a dark matter candidate that has long been favored by the physics community because it arises naturally in theories of the early universe. However, successive experiments have yet to detect such a particle and, as a consequence, the community has turned its eye toward a variety of well-motivated lower-mass alternatives.
In this energy range, xenon TPCs observe significant backgrounds that are difficult to model and can obscure dark matter signals. However, machine learning techniques have shown promise in recognizing minor features of the background events that differentiate them from those of dark matter. This project will involve testing and employing a variety of machine learning techniques – boosted decision trees, anomaly finding, convolutional neural networks, generative adversarial networks – to optimize xenon TPCs for low-mass dark matter searches. These will be used to mitigate backgrounds and generate realistic low-energy events.
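As a flavour of the classification task, here is a minimal boosted-decision-tree sketch on synthetic event features; the feature choices and signal/background distributions are invented for illustration, not taken from a xenon TPC.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000

# invented low-level event features: two pulse areas and a pulse width
background = np.column_stack([rng.normal(10, 3, n), rng.normal(200, 50, n),
                              rng.normal(1.0, 0.2, n)])
signal = np.column_stack([rng.normal(8, 3, n), rng.normal(150, 50, n),
                          rng.normal(0.8, 0.2, n)])

X = np.vstack([background, signal])
y = np.concatenate([np.zeros(n), np.ones(n)])      # 0 = background, 1 = signal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X_tr, y_tr)
print("ROC AUC:", roc_auc_score(y_te, bdt.predict_proba(X_te)[:, 1]))
```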
Note
Any useful pre-requisite modules/knowledge? : N/A
Using Machine Learning Density Estimation to Measure the Properties of Fundamental Particles at the CMS Experiment
Project code: MLBD_2024_9
Supervisor(s): George Uttley (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
This project will use simulation-based inference (in particular likelihood-free inference) techniques to analyse proton-proton collision data collected by the CMS experiment. CMS is one of the two general-purpose detectors at the CERN LHC and is best known for the discovery of the Higgs boson (together with the ATLAS experiment) in 2012. The machine learning methods that will be utilised are density estimators. One example is a normalising flow, a chain of invertible neural networks. These are very powerful tools for learning the probability density of multidimensional datasets and are best known for their use in image generation. The learned density will then be used to build an extended unbinned likelihood function; maximising this quantity gives an optimal measurement of a physical variable. This is a novel data analysis technique which has not been used before in a CMS analysis.
The physics scope suggested for this project is a measurement of the top quark's mass. The top quark's properties play a pivotal role in the Standard Model (SM) of particle physics. A precision measurement of the mass is essential for testing the SM's predictions, refining fundamental parameters and detecting potential signs of new physics beyond our current understanding. The additional data from the ongoing Run-3 of the LHC and the use of simulation-based inference could lead to a significantly improved measurement of this fundamental quantity in nature.
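To illustrate density estimation by maximum likelihood, here is a minimal sketch of a single RealNVP-style affine coupling layer fitted to toy two-dimensional data; a full normalising flow would stack many such layers, and the data here are synthetic stand-ins for reconstructed event observables.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One coupling layer: transforms x2 conditioned on x1, with tractable Jacobian."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))   # outputs scale and shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=-1), s.sum(dim=-1)  # transformed x, log|det J|

def log_prob(x, layer):
    """log p(x): standard-normal base density plus the Jacobian correction."""
    z, log_det = layer(x)
    base = (-0.5 * z**2 - 0.5 * torch.log(torch.tensor(2 * torch.pi))).sum(dim=-1)
    return base + log_det

# toy "events": two correlated observables
x = torch.randn(5000, 2) @ torch.tensor([[1.0, 0.5], [0.0, 0.8]])
flow = AffineCoupling()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for step in range(1000):
    opt.zero_grad()
    nll = -log_prob(x, flow).mean()      # maximum likelihood = minimise the NLL
    nll.backward()
    opt.step()
print("final negative log-likelihood:", nll.item())
```

The same learned log-density, evaluated as a function of a physics parameter that shifts the simulated events, is what enters the extended unbinned likelihood described above.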
Note
Any useful pre-requisite modules/knowledge? : Courses about experimental particle physics would be beneficial.
Fitting DESI galaxy spectra with SPS models and simulation-based inference
Project code: MLBD_2024_10
Supervisor(s): Boris Leistedt (Physics/Physics of the Universe)
Note
This project can host up to 1 pair(s) of students.
The goal of this project is to fit DESI galaxy spectra with modern Stellar Population Synthesis (SPS) models accelerated with machine learning emulators and recently developed simulation-based techniques for galaxy population inference. This will entail: 1) understanding the DESI data, in particular how to manipulate the measured spectra and noise model in a likelihood function link; 2) calling Stellar Population Synthesis models, in particular via emulators link; 3) running classic MCMC on a subset of spectra and interpreting the results; 4) exploring compression methods and simulation-based inference, in particular from link, and comparing with the MCMC results. This is a novel approach to infer the properties of the galaxy population from large data sets without fitting individual objects (here spectra).
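A minimal sketch of the classic-MCMC step (item 3), using emcee with a toy Gaussian likelihood in which a simple analytic function stands in for the SPS emulator and the DESI noise model:

```python
import numpy as np
import emcee

rng = np.random.default_rng(0)
wave = np.linspace(4000, 9000, 200)                 # wavelength grid (Angstrom)

def emulator(theta):
    """Stand-in for an SPS emulator: amplitude and slope of a smooth continuum."""
    amp, slope = theta
    return amp * (wave / 6000.0) ** slope

# synthetic "observed" spectrum with known noise
truth = np.array([2.0, -1.5])
noise = 0.1
flux = emulator(truth) + rng.normal(scale=noise, size=wave.size)

def log_prob(theta):
    if not (0 < theta[0] < 10 and -5 < theta[1] < 5):   # flat priors
        return -np.inf
    resid = flux - emulator(theta)
    return -0.5 * np.sum(resid**2 / noise**2)           # Gaussian likelihood

nwalkers, ndim = 16, 2
p0 = truth + 1e-2 * rng.normal(size=(nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2000, progress=False)
samples = sampler.get_chain(discard=500, flat=True)
print("posterior means:", samples.mean(axis=0))
```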
Note
Any useful pre-requisite modules/knowledge? : N/A
Enhancing Neutrino-Nucleus Interaction Modelling Through Machine Learning
Project code: MLBD_2024_11
Supervisor(s): Monireh Kabirnezhad (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
The precise measurement of neutrino properties is among the highest priorities in fundamental particle physics, involving many experimental and theoretical efforts worldwide. As experiments heavily rely on neutrino interactions with nucleons within the nuclear environment, upcoming advancements in experiments like DUNE and Hyper Kamiokande necessitate a deeper understanding and more precise modelling of hadronic and nuclear physics underlying these interactions. These are typically represented as interaction "cross-section" models in neutrino event generators, crucial for all phases of experimental analyses. Theoretical uncertainties in these models are pivotal in interpreting results accurately.
The most challenging aspect of cross-section calculation lies in the hadron tensor, which is composed of the hadron current in neutrino-nucleus interactions. This tensor contains vital nuclear information essential for cross-section analysis and neutrino oscillation measurements. However, its computation is often prohibitively expensive. Currently, event generators utilise pre-calculated tables for different kinematics, sacrificing our ability to estimate systematic uncertainties from theoretical models.
This project aims to explore methods to retain systematic information within a fast simulation, ensuring a more comprehensive understanding of uncertainties. Additionally, the project will investigate how modern computing methods can enhance the accuracy and breadth of neutrino-nucleus interaction modelling.
Note
Any useful pre-requisite modules/knowledge? : Particle physics and nuclear physics or something equivalent would be beneficial.
Measuring the energy spectrum of intense, high energy gamma rays
Project code: MLBD_2024_12
Supervisor(s): Stuart Mangles (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
In our experiments we use Laser Wakefield accelerated electron beams to produce high energy (MeV- GeV) gamma rays. The flux of these is very high so many standard methods for gamma spectroscopy do not work.
Our current detector is based on a stack of scintillating crystals. As the gamma rays penetrate the stack they deposit energy in the crystals which makes them produce light. By measuring the profile of light emitted by these we get information about the energy of the incident gamma rays - essentially the deeper the gamma rays penetrate the stack, the higher the gamma ray energy (K Behm, Review of Scientific Instruments 2018).
However, it's not that simple: high-energy gamma rays interact with matter in a number of ways (in particular, they produce showers of photons and charged particles through Compton scattering and pair production). Using codes like Geant4 or G4Beamlines, we can model the response of our detector to a set of mono-energetic gamma rays, and so build a forward model of the response to an arbitrary input spectrum.
Inverting this process, i.e. working out the spectrum from the measured signal on the detector, is not straightforward, as the response to a few high-energy gammas is similar to that of many lower-energy ones - the problem is "ill posed".
Our current method for measuring the spectrum uses a physics based parameterised model for the gamma ray spectrum. It is then straightforward to use the forward model to find the parameter(s) that best fit the measured data. However, this relies on the assumption that the physics model is a good description of the spectrum - and does not allow us to test different physics models for gamma ray production (such as in our work on radiation reaction).
In this project you will build a model of the detector in G4Beamlines (or Geant4 if you prefer) and use it to generate your own forward model of our existing detector. You will then investigate other methods, such as machine learning and Bayesian inference, to infer the input spectral shape. Ideally this will allow us to remove the assumption that we know the shape of the spectrum a priori.
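As a toy illustration of the forward model and why the inversion is ill posed, the sketch below builds a made-up detector response matrix, generates a noisy signal, and compares an unregularised pseudo-inverse with a simple Tikhonov-regularised solution (all numbers are invented, not the real detector response):

```python
import numpy as np

rng = np.random.default_rng(0)
n_energy, n_layers = 50, 15                   # spectrum bins, scintillator layers

# made-up response matrix R[i, j]: light in layer i from a unit flux in energy bin j
energies = np.linspace(1, 500, n_energy)      # MeV
depths = np.arange(n_layers)[:, None]
R = np.exp(-depths / (2 + 0.05 * energies))   # deeper deposition for higher energy

# toy "true" spectrum and the (noisy) detector signal it would produce
true_spec = np.exp(-energies / 100.0)
signal = R @ true_spec + rng.normal(scale=0.01, size=n_layers)

# a naive pseudo-inverse amplifies noise; Tikhonov regularisation tames it
naive = np.linalg.pinv(R) @ signal
lam = 1e-2
tikhonov = np.linalg.solve(R.T @ R + lam * np.eye(n_energy), R.T @ signal)
print("naive residual spread:", np.std(naive - true_spec))
print("regularised residual spread:", np.std(tikhonov - true_spec))
```

Bayesian or machine-learning approaches replace the ad-hoc regulariser with a prior (or a learned mapping), which is the direction the project would explore.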
The project will introduce you to the physics and simulation of high energy particle passage through matter, using standard codes used by HEP groups around the world and modern data analysis methods including machine learning.
Note
Any useful pre-requisite modules/knowledge? : No
Characterising satellite shapes using ground-based observations
Project code: MLBD_2024_14
Supervisor(s): Michael Peel (Physics/Physics of the Universe)
Note
This project can host up to 1 pair(s) of students.
Satellites are now being frequently launched into low earth orbit as part of satellite constellations such as Starlink, OneWeb, etc., with over 5,000 launched in the last few years, and over an order of magnitude more expected over the next few years. Their design and apparent size are often changing due to their rapid development and deployment cycles. They can be very bright optically, and vary over time. This information can be used to reverse-engineer the approximate shape of the satellite, by comparing them with satellites of known shapes.
Using observational data from instruments like MMT-9 link and others, you will use machine learning models trained on known satellite designs, together with any additional information such as Bidirectional Reflectance Distribution Function (BRDF) data, to identify the most likely layouts of new satellites.
This project will be carried out in coordination with the IAU CPS link, in particular with Siegfried Eggl at the University of Illinois.
Note
Any useful pre-requisite modules/knowledge? : N/A
Machine-learning led design of nanomagnetic arrays for neuromorphic computing
Project code: MLBD_2024_15
Supervisor(s): Kilian Stenning (Physics/Matter)
Note
This project can host up to 1 pair(s) of students.
The field of neuromorphic computing aims to offload energy-intensive machine learning problems on to physical systems to reduce the carbon footprint of AI. We have recently discovered a method of using nanomagnetic arrays for neuromorphic computing. The system is able to accurately forecast chaotic time-series with strong potential for advanced time-series processing at low powers. Our experimental system is currently unoptimized and we have results showing that array design strongly influences performance. Due to the length of time taken to fabricate and measure samples, optimisation via experimental means is challenging.
We have developed an in-house dipolar simulation model (python) of these arrays to test computational performance. Using this model, we now have the freedom to design new types of arrays to explore what makes a good nanomagnetic network for neuromorphic computing. This project will involve designing new arrays to uncover why certain nanomagnetic physics enables computation. You will design arrays in a systematic manner and test their computational capability. The parameter space is large and therefore it may be more appropriate to use machine-learning techniques to explore this parameter space efficiently. For example, evolutionary algorithms can be used to design new types of arrays with new functionality.
Reasonable python skills are needed to perform the simulations. Knowledge of machine learning techniques is a bonus. Knowledge of C / C++ will also be advantageous.
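A minimal sketch of the evolutionary-search idea mentioned above: candidate array geometries (here just a list of island positions) are mutated and selected against a stand-in performance score; the scoring function is an invented placeholder for the in-house dipolar simulation and reservoir-computing benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ISLANDS = 20

def performance(positions):
    """Placeholder score: here it simply rewards well-spread arrays in a unit cell;
    in the project this would be the dipolar simulation + forecasting benchmark."""
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    return np.min(d + np.eye(N_ISLANDS))          # minimum inter-island spacing

population = [rng.uniform(0, 1, size=(N_ISLANDS, 2)) for _ in range(30)]
for generation in range(100):
    scores = [performance(p) for p in population]
    best = [population[i] for i in np.argsort(scores)[-10:]]       # keep the top 10
    population = list(best)
    while len(population) < 30:
        child = best[rng.integers(len(best))].copy()
        child += rng.normal(scale=0.02, size=child.shape)          # mutate geometry
        population.append(np.clip(child, 0, 1))

print("best spacing found:", max(performance(p) for p in population))
```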
Note
Any useful pre-requisite modules/knowledge? : N/A
Novel approaches to spacecraft instrument data cleaning
Project code: MLBD_2024_16
Supervisor(s): Tim Horbury (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
The Solar Orbiter spacecraft is exploring the inner solar system and carries two magnetic field sensors built here in the Physics Department. They measure the very small magnetic fields in space, which we use to understand the Sun's magnetic field, the solar wind and fundamental plasma processes such as turbulence and shocks. The spacecraft itself generates magnetic fields (for example, through currents in wires) and these contaminate the data; we have various approaches to removing this contamination, but they are not very sophisticated. In this project you will work directly with the engineers operating the magnetometer to develop new, machine-learning-based approaches to identifying and removing spacecraft noise. If this goes well, I would expect your work to be incorporated into our routine data processing before we release data to the public.
Note
Any useful pre-requisite modules/knowledge? : Space physics would be helpful but is not essential.
Machine learning approaches to spacecraft calibration
Project code: MLBD_2024_17
Supervisor(s): Tim Horbury (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
In 2025, NASA's IMAP spacecraft will launch, carrying a magnetic field instrument built here in the Physics Department. The spacecraft spins and we have developed methods to calibrate the data in this case. One challenge is picking the best intervals of data to use. In this project, you will work with the engineering team and use a machine learning approach to improve the calibration algorithm so we can produce high quality scientific data. IMAP is scheduled to launch in April 2025: before then we will use test data from other missions, but after launch you will be one of the first people to use in-flight data from this mission.
Note
Any useful pre-requisite modules/knowledge? : Space physics would be helpful but is not essential.
Data science to understand the ocean's role in climate change
Project code: MLBD_2024_18
Supervisor(s): Heather Graven (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
In this project we will analyse ocean data that are being used to understand and predict the ocean's role in mitigating climate change by taking up heat and carbon. We will analyse various measurements made in the ocean interior and at the surface to better integrate these into data-based models. We will work with data providers and collaborators within a larger international project with collaborators in the US, Germany, Belgium and South Africa.
Note
Any useful pre-requisite modules/knowledge? : Beneficial but not required - Atmospheric Physics
Dynamical clustering of global oceanic observations
Project code: MLBD_2024_19
Supervisor(s): Arnaud Czaja (Physics/Space, Plasma and Climate)
Note
This project can host up to 2 pair(s) of students.
The oceanographic community has benefited greatly from the development of global observing systems: sea level measured through satellite altimetry since the 1990s, and surface-to-2000 m hydrography measured through the ARGO float program since 2005. These datasets provide high-quality physical variables which capture a very rich range of dynamical behaviour, from the "mesoscale", with length scales of a few hundred kilometres, to the "basin scale" (several thousand kilometres).
To understand how the global ocean is changing and responding to anthropogenic greenhouse gas emissions, we need to be able to identify the different dynamical regimes shaping this response. In this project we wish to do so by applying the unsupervised k-means clustering algorithm, with model selection via information criteria (Sonnewald and Lguensat, 2021), to the so-called "thermocline equation".
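A minimal sketch of the clustering step, using scikit-learn's k-means on synthetic feature vectors (standing in for the balance of terms in the thermocline equation at each grid point), with a silhouette score as a simple proxy for the information-criterion model selection used by Sonnewald and Lguensat:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# synthetic "dynamical balance" features at 3000 ocean grid points,
# drawn from three distinct regimes
centres = np.array([[1.0, 0.0, -1.0], [-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]])
X = np.vstack([c + 0.2 * rng.normal(size=(1000, 3)) for c in centres])

# try several numbers of clusters and score each partition
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, "clusters, silhouette =", round(silhouette_score(X, labels), 3))
```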
The project does not require prior knowledge of physical oceanography or fluid dynamics, but it does require a strong interest in diving into a wealth of oceanographic data and in making sense of them using a combination of physics and machine learning techniques.
References:
Sonnewald, Maike, and Redouane Lguensat. "Revealing the impact of global heating on North Atlantic circulation using transparent machine learning." Journal of Advances in Modeling Earth Systems 13, no. 8 (2021): e2021MS002496.
Note
Any useful pre-requisite modules/knowledge? : Hydrodynamics could be helpful but not required for this project
New approaches to estimating greenhouse gas emissions in London
Project code: MLBD_2024_20
Supervisor(s): Heather Graven (Physics/Space, Plasma and Climate)
Note
This project can host up to 2 pair(s) of students.
This project will use new computational approaches and new data sources to estimate greenhouse gas emissions in London, particularly CO2 and CH4. The project will address new needs and opportunities to better understand greenhouse gas emissions for London city, for companies and for citizens, as there are new reporting rules for emissions and new policies to achieve net zero emissions. New data sources include satellite data, atmospheric measurements, and other big data such as internet or map searches. See work by ClimateTrace for examples in this field.
Note
Any useful pre-requisite modules/knowledge? : Atmospheric physics would be beneficial
PyTorch based inverse-design of neuromorphic computing hardware.
Project code: MLBD_2024_21
Supervisor(s): Will Branford (Physics/Matter)
Note
This project can host up to 1 pair(s) of students.
The energy cost of machine learning doubles every 3.4 months and is already greater than the entire usage of Argentina; 21% of global energy production is predicted to be expended on IT by 2030. While performing machine learning on standard (von Neumann) architecture computers is exceptionally powerful, it is also very inefficient. There is a drive towards hardware which is more suited to machine learning and can reduce this energy cost. The original machine learning algorithms were developed by Sherrington and Kirkpatrick to describe magnetic arrays. Because each magnetic dipole interacts with all its neighbours at no energy cost, a nanomagnetic array is naturally a massively parallel 'neural network'. The supervisor's group in EXSS Physics makes nanoscale magnetic arrays for neuromorphic computation hardware [1]. A recent publication showed that PyTorch-based methods could be used to design magnetic configurations with the best performance, in a package called SpinTorch [2]. The supervisor is working with the original coders of SpinTorch on further development of the package and on its use to inverse-design the magnetic pattern we use as the scatterer. The SpinTorch designers were able to demonstrate the design of neural-network hardware in which all neuromorphic computing functions, including signal routing and nonlinear activation, are performed by spin-wave propagation and interference. Weights and interconnections of the network are realized by a magnetic-field pattern that is applied to the spin-wave-propagating substrate and scatters the spin waves. The project would consist of modifying the SpinTorch code so that it can simulate the exact structures made in the EXSS group, and then using the software's machine learning to inverse-design the best magnetic array geometry and magnetic pattern to perform specific neuromorphic computing tasks in hardware. An overview of the research can be seen in the OASIS seminar series talks by Gyorgy Csaba (SpinTorch) and Kilian Stenning (supervisor's group, neuromorphic reservoir computing) link.
[1] J. C. et al., arXiv:2107.08941 (and Nature Nano (2022)). [2] Papp, A. et al., Nature Communications 12, 6422 (2021).
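The core trick in SpinTorch-style inverse design is that the scattering pattern is a differentiable parameter, so it can be trained by backpropagation just like network weights. Below is a heavily simplified sketch of that idea with an invented linear "wave propagation" model standing in for the micromagnetic solver:

```python
import torch

torch.manual_seed(0)
n_in, n_grid, n_out = 4, 64, 2

# the design variable: a field/magnetisation pattern over the substrate grid
pattern = torch.zeros(n_grid, requires_grad=True)

# fixed, invented linear "propagation" operators (stand-ins for the wave solver)
P_in = torch.randn(n_grid, n_in)
P_out = torch.randn(n_out, n_grid)

def device_output(x, pattern):
    """Toy forward model: input excitation -> field on the grid, modulated by the
    design pattern, -> detector outputs. Real SpinTorch solves spin-wave dynamics."""
    field = P_in @ x
    scattered = field * torch.tanh(pattern)     # the pattern acts as a scatterer
    return P_out @ scattered

# train the pattern so the device maps two toy inputs to different targets
x_a, x_b = torch.randn(n_in), torch.randn(n_in)
target_a, target_b = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])
opt = torch.optim.Adam([pattern], lr=0.05)
for step in range(500):
    opt.zero_grad()
    loss = ((device_output(x_a, pattern) - target_a) ** 2).mean() + \
           ((device_output(x_b, pattern) - target_b) ** 2).mean()
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```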
Note
Any useful pre-requisite modules/knowledge? : N/A
Statistical machine learning for optimal control of driven-dissipative quantum systems
Project code: MLBD_2024_24
Supervisor(s): Florian Mintert (Physics/Light)
Note
This project can host up to 1 pair(s) of students.
Quantum systems can be used for technological applications. Turning a quantum system into an actual device requires accurate control of the system's dynamics. With many quantum systems we have a hard time learning how to control their dynamics with simulations, simply because of the effort required to simulate quantum dynamics. It is thus highly interesting to learn how to control a quantum system based on an experiment rather than on theoretical simulations. Statistical machine learning is well suited to learning from small samples of experimental data. The goal of the project is to explore the use of Bayesian optimisation for a system of a trapped ion interacting with a quantised light field: by controlling the ion we can try to create non-classical states of light, and the project will use Bayesian optimisation to learn how to control the ion so as to realise non-classical states of light with desired properties. In this project you would actually simulate the system dynamics, but we would like to explore the benefits of using Bayesian optimisation in an experiment.
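A minimal sketch of a Bayesian-optimisation loop with a Gaussian-process surrogate and an expected-improvement acquisition function, maximising an invented one-parameter "state quality" function that stands in for the (expensive) ion-trap simulation or experiment:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def objective(theta):
    """Stand-in for the expensive figure of merit (e.g. a state fidelity)."""
    return np.exp(-(theta - 0.7) ** 2 / 0.05) + 0.01 * rng.normal()

grid = np.linspace(0, 1, 200)[:, None]          # candidate control amplitudes
X = list(rng.uniform(0, 1, 3)[:, None])         # a few initial random evaluations
y = [objective(x[0]) for x in X]

for iteration in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(0.1), alpha=1e-4).fit(np.array(X), y)
    mu, sigma = gp.predict(grid, return_std=True)
    best = max(y)
    # expected-improvement acquisition: balance exploitation and exploration
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)]
    X.append(x_next)
    y.append(objective(x_next[0]))

print("best control value found:", X[int(np.argmax(y))][0], "score:", max(y))
```

The appeal for experiments is visible here: only a handful of objective evaluations are needed, because the surrogate model interpolates between them.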
Note
Any useful pre-requisite modules/knowledge? : quantum information, quantum optics
'It's nothing...' (Responses to head injuries - legal, professional, medical)
Project code: MLBD_2024_25
Supervisor(s): William Proud (Physics/Matter)
Note
This project can host up to 1 pair(s) of students.
The legal, professional and performance responses to head injuries differ widely. A driver struck on the head and rendered unconscious will not be allowed to drive for 6 months. A rugby player is no longer allowed to play after being concussed three times. A boxer... if they can get up within ten seconds... on they go. This project will involve analyzing medical databases, legal and policy documents, 'web-scraping' and other methods to present a global view of how head injuries are dealt with at a legal and professional level, what is 'allowed', 'restricted' or 'prohibited', and the evidential level of support. This is a project for those who would like a cross-disciplinary project involving a range of data sources and interpretations.
Note
Any useful pre-requisite modules/knowledge? : Students on the MLBD course will have the requisite skills.
Education data analytics and machine learning
Project code: MLBD_2024_26
Supervisor(s): Michael Fox (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
Lambda Feedback is a project at Imperial College London providing self-study exercises to students, including giving automated formative feedback on student responses. An integral part of the project is curating and employing data on how students use the system. We use a cycle of R&D to analyse the data, then develop and deploy algorithms to present the data to student and teacher users in a way that benefits learning. Currently the platform provides 60+ learning modules to thousands of students across all four faculties, with a heavy focus on engineering and natural sciences. We provide automated formative feedback to over 100k student responses each month, half of which are on handwritten mathematics. With millions of lines of high-quality, curated data on student behaviour, our rich data set has a lot of possibilities to use statistical methods and/or deep learning to provide insights and actionable information to users. The student on this project will have a choice of direction, but for inspiration two examples from previous projects include: (1) machine learning to provide predictive analytics for academic outcomes, with a view to early identification of intervention needs, and/or positive feedback to students who are on track; (2) analysis of the mathematical expressions of students, identifying, via analysis of the mathematical abstract syntax tree, the common errors made by students and suggesting feedback in those cases.
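As a small illustration of example (2), the sketch below uses sympy to parse a student response and a reference answer into expression trees and check algebraic equivalence; the expressions and the "common error" check are invented placeholders, not Lambda Feedback's actual evaluation functions.

```python
import sympy

x = sympy.symbols("x")
reference = sympy.sympify("(x + 1)**2")
student = sympy.sympify("x**2 + 2*x + 1")       # an equivalent expansion
common_slip = sympy.sympify("x**2 + 1")         # a frequent error: missing cross term

def feedback(response, answer):
    """Toy feedback rule based on symbolic equivalence checks."""
    if sympy.simplify(response - answer) == 0:
        return "Correct."
    if sympy.simplify(response - common_slip) == 0:
        return "Check the cross term when expanding the square."
    return "Not equivalent to the expected answer."

print(feedback(student, reference))             # -> Correct.
print(feedback(common_slip, reference))         # -> hint about the cross term
print(sympy.srepr(student))                     # the abstract syntax tree
```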
Note
Any useful pre-requisite modules/knowledge? : N/A
Model selection in radiation reaction experiments
Project code: MLBD_2024_27
Supervisor(s): Stuart Mangles (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
In this project you will explore using machine learning and Bayesian inference to interpret data from a recent experiment where we attempted to measure the effects of radiation reaction in the collision of high energy electrons with intense laser pulses. A number of models have been proposed for modelling strong field radiation reaction, but these have not been experimentally tested.
In our experiment each collision has a number of unknown parameters (e.g. laser intensity, offsets between the electron bunch and the laser), and it's necessary to infer these for each shot. This makes determining the "correct" model of radiation reaction a real challenge.
Working with existing data you will explore various methods to try and get to a definitive answer to this important question.
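As a toy illustration of the model-selection problem, the sketch below fits two candidate "radiation reaction" models to the same synthetic shot data and compares them with the Bayesian information criterion; the functional forms and data are invented stand-ins for the real analysis.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# synthetic data: final electron energy fraction vs a collision parameter
x = np.linspace(0.1, 1.0, 40)
y = 1.0 / (1.0 + 2.0 * x) + 0.02 * rng.normal(size=x.size)   # generated by model B

def model_a(x, a):            # toy "classical-like" energy loss
    return np.exp(-a * x)

def model_b(x, a):            # toy "quantum-corrected" energy loss
    return 1.0 / (1.0 + a * x)

def bic(model, x, y, sigma=0.02):
    popt, _ = curve_fit(model, x, y, p0=[1.0])
    chi2 = np.sum((y - model(x, *popt)) ** 2 / sigma**2)
    return chi2 + len(popt) * np.log(len(x))     # lower BIC = preferred model

print("BIC model A:", bic(model_a, x, y))
print("BIC model B:", bic(model_b, x, y))
```

In the real problem the per-shot nuisance parameters must be marginalised rather than fixed, which is where the Bayesian inference machinery comes in.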
Note
Any useful pre-requisite modules/knowledge? : N/A
Saving lives using AI to analyse Breast Cancer Biopsy samples.
Project code: MLBD_2024_28
Supervisor(s): Chris Phillips (Physics/Light)
Note
This project can host up to 2 pair(s) of students.
Breast cancer is diagnosed by taking samples of the lesions, slicing them, dyeing them with vegetable dyes, and grading them subjectively by eye in a microscope. However, this is a very imperfect process, and it results in roughly 1/4 of the patients dying from the chemotherapy rather than the disease. This project will use AI tools to analyse the cellular images and give the oncologists more certainty in their diagnosis, thus avoiding unnecessary chemotherapy. We will use open-source software and, on the practical side, a number of breast cancer biopsy samples from clinical collaborators will be imaged and digitised. The efficacy of the AI approach will be measured quantitatively.
Note
Any useful pre-requisite modules/knowledge? : N/A
Searching for light new physics at the LHC with contrastive learning
Project code: MLBD_2024_29
Supervisor(s): Robert Bainbridge, Benedikt Maier (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
One of the primary goals of the CERN LHC is to provide evidence for physics beyond the Standard Model (BSM). Many BSM theories predict the existence of new low-mass particles, and the LHC hosts a broad and varied program to search for these particles. Examples include dark photons, axion-like particles (ALPs), and sterile neutrinos, which could help address unresolved questions such as the nature of dark matter, the strong CP problem, and the origin of neutrino masses.
Most LHC searches focus on particle masses above the GeV scale. However, many BSM theories predict particles at the MeV scale that interact feebly with SM particles and have long lifetimes, making their detection highly challenging with current techniques. Of particular interest are decays into pairs of electrons, which could reveal particles as light as 1 MeV—an unexplored mass range for LHC experiments. These electron pairs, however, may be highly Lorentz-boosted, very tightly collimated, and originate away from the interaction point, creating significant challenges for existing particle reconstruction methods.
This project aims to address these challenges using modern ML techniques to identify low-momentum electrons in data from the CMS experiment. While the details are open for discussion, potential approaches include the use of graph-based deep learning, attention networks, and contrastive learning. The ultimate goal is to develop a fast, robust solution for deployment in CMS searches, potentially through the "scouting" data stream that operates within the real-time trigger systems of CMS. The project may involve collaboration with MIT.
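To make the contrastive-learning idea concrete, here is a minimal PyTorch sketch of an NT-Xent (SimCLR-style) loss applied to pairs of "views" of the same event; the encoder and the toy event features are illustrative assumptions, not the CMS reconstruction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

encoder = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 16))

def nt_xent(z1, z2, temperature=0.1):
    """SimCLR-style loss: matched rows of z1 and z2 are positive pairs,
    everything else in the batch acts as a negative."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.T / temperature
    n = z1.size(0)
    sim = sim - 1e9 * torch.eye(2 * n)                 # never match a view with itself
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# toy batch: two augmented "views" of the same 10-feature electron candidates
x = torch.randn(32, 10)
view1 = x + 0.05 * torch.randn_like(x)
view2 = x + 0.05 * torch.randn_like(x)
loss = nt_xent(encoder(view1), encoder(view2))
loss.backward()
print("contrastive loss:", loss.item())
```

The learned embedding can then be used by a lightweight downstream classifier, which is attractive for fast, trigger-level applications.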
Note
Any useful pre-requisite modules/knowledge? : Nuclear and Particle Physics would be beneficial
Applications of AI in empirical analyses of complex atomic spectra and atomic energy levels
Project code: MLBD_2024_30
Supervisor(s): Juliet Pickering, Milan Ding (Physics/Space, Plasma and Climate)
Note
This project can host up to 2 pair(s) of students.
The spectra and energy levels of neutral and low ionisation stages of many-electron atoms (e.g. the iron-group and lanthanide elements) are of great interest in the spectroscopy of astronomical objects such as nebulae, stars, and the kilonovae of neutron star mergers. For the meaningful interpretation of state-of-the-art, high-resolution astronomical spectra acquired with modern telescopes, there is a requirement for highly accurate, laboratory-measured reference data on atomic level energies and spectral line wavelengths. Unfortunately, incompleteness in such atomic data still plagues astronomy.
An observed atomic spectral line gives us information on only the energy separation between two atomic energy levels and the likelihood of the transition - the atomic energy levels must be reconstructed from the list of spectral lines. Level energies (relative to ground or ionisation) must be known in order to accurately determine the nature of the electron wavefunctions and produce meaningful spectral reference data. Empirically determining the exact level energies must be approached very carefully and theoretical models are used as guidance, because the complex spectra for a single ‘heavy’ element can contain up to tens of thousands of transitions.
This project will focus on exploring and developing novel machine learning methods to analyse high-resolution complex atomic emission spectra recorded in the laboratory. For example, investigating possible advantages of using neural networks over traditional methods in spectral line detection and fitting, or applying decision tree methods to match simple sets of observed transitions with corresponding theoretical predictions to empirically determine energy level properties.
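As a baseline for the line-detection part, here is a small sketch using a traditional peak finder (scipy's find_peaks) on a synthetic emission spectrum; a machine-learning approach would be benchmarked against this kind of method (the spectrum below is invented).

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
wavenumber = np.linspace(0, 1000, 5000)

# synthetic emission spectrum: a few Gaussian lines on a noisy baseline
line_centres = [120.0, 450.0, 452.5, 800.0]      # note the close blend at 450/452.5
spectrum = sum(np.exp(-0.5 * ((wavenumber - c) / 0.8) ** 2) for c in line_centres)
spectrum += 0.05 * rng.normal(size=wavenumber.size)

# traditional detection: threshold on prominence; blended lines are the hard case
peaks, props = find_peaks(spectrum, prominence=0.3)
print("detected line positions:", np.round(wavenumber[peaks], 1))
```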
The proposed research is a brand-new direction in the field of atomic spectroscopy. Promising results and methodologies will be presented to other experts in the field around the world.
Note
Any useful pre-requisite modules/knowledge? : Not pre-requisites, but could be beneficial: 1) Computational Physics (FHEQ6) contains concepts applied in current methodologies for the analysis of complex atomic spectra. 2) Foundations of Quantum Mechanics (FHEQ6) contains helpful theoretical background of atomic structure.
A unified view of sterile neutrino constraints from cosmology and particle physics experiments
Project code: MLBD_2024_31
Supervisor(s): Stefan Soldner-Rembold (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
The sterile neutrino is a hypothesised new type of neutrino that does not experience the weak interaction. Various measurements from experiments such as LSND, MiniBooNE and Neutrino-4 have shown hints of its existence, but these are in conflict with other, negative, observations. Sterile neutrinos would also be detectable by their imprint on the cosmic microwave background, as they add additional relativistic degrees of freedom during structure formation in the early universe. In previous MPhys projects we have published two papers, Eur. Phys. J. C 80, 758 (2020) and Phys. Lett. B 764, 322 (2017), which quantitatively compare these limits, something that is rarely done directly since the fields use very different parameter spaces. Since the publication of these papers a significant number of new results have become available. The purpose of this project will be to update the analysis with all available new data, with a view to an updated publication.
This project will use data and statistical techniques. It will primarily involve writing code in C++ and Python to analyse the various datasets, but no expert coding knowledge is required beyond what should be familiar from undergraduate computing courses.
Note
Any useful pre-requisite modules/knowledge? : Some knowledge of particle physics would be useful.
Machine Learning for Dark-matter Searches in Liquid-argon neutrino detectors
Project code: MLBD_2024_32
Supervisor(s): Stefan Soldner-Rembold (Physics/Physics of Particles)
Note
This project can host up to 2 pair(s) of students.
Machine learning classifiers are becoming the standard for groundbreaking dark-matter searches. Liquid-argon time projection chambers exposed to high-intensity neutrino beams provide rich data sets, including images of events, which allows the application of machine learning techniques such as Convolutional Neural Networks (CNNs). This project involves working with machine learning to look for events mediated by dark photons at the MicroBooNE experiment. The goal is to optimize the classification of signal events for new topologies and dark-matter processes by comparing and optimising different algorithms. While good performance of the algorithm is important, the selection criteria need to be understood in terms of the underlying physics, as they might depend on artifacts in the simulation or the data. The work would build on our previous results published in link
Note
Any useful pre-requisite modules/knowledge? : Some knowledge of particle physics would be useful.
High-throughput exploration of materials and processes for optimising organic solar cells
Project code: MLBD_2024_33
Supervisor(s): Jenny Nelson, Matthew Ward (Physics/Matter)
Note
This project can host up to 1 pair(s) of students.
While the performance of organic solar cells has improved rapidly, thanks to the availability of a wide range of candidate organic semiconductor materials [1], the process of optimising and testing any new material combination is time-consuming. This project is concerned with developing an approach to high-throughput optimisation of solar cell parameters using automated film fabrication and device testing along with machine learning. The overall aim of the work is to speed up the optimisation of the processing conditions (choice or ratio of materials, process parameters such as spin-coating speed, layer thicknesses, etc.) in order to extract process-performance and process-stability relationships for the devices. The particular goal of the MRes project is to develop and evaluate approaches to data analysis in order to access the relationships between materials or processes and device performance, and use these relationships to predict conditions for improved behaviour. The overall project goal is to greatly reduce the quantity of material and time required to optimise a given organic solar cell architecture (either for lifetime or efficiency).
The experimental side of the project is based around a system being developed in the Physics Department for high-throughput fabrication and testing of organic solar cells, which includes a programmable robotic arm that can be used to prepare thin films and rapidly explore varying process parameters, and a device-testing chamber for high-throughput device measurements. The student will study the operation of these systems, help to develop protocols for efficient data collection, and develop machine-learning-based tools to analyse data and extract structure-property-process relationships. He or she will attempt to analyse the data by using and then adapting approaches from the literature, using both data-based and physics-based models. The student will have the opportunity to design experiments and to collect data. Students with a strong background in automation could also contribute to system building.
The work will involve the following types of activity:
• Learning how databases of organic solar cell data can be analysed using ML-based models, whether data-based or physics-based.
• Training of a neural network using experimental data, either from the literature or generated in-house by the testing of robotically fabricated organic solar cells and thin films.
• Implementation of this neural network in the robotic fabrication procedure to rapidly optimise a given organic solar cell architecture for figures of merit such as power conversion efficiency, lifetime, open-circuit voltage, short-circuit current, and fill factor.
• For students with a background in automation, contribution to developing a network-connected and Python-controlled robotic system for automated film and device fabrication.
[1] A. Armin et al., "A history and perspective of non-fullerene electron acceptors for solar cells", Advanced Energy Materials (2021) link
[2] Xabier Rodríguez-Martínez et al., "Accelerating organic solar cell material's discovery: high-throughput screening and big data", Energy & Environmental Science (2021) link
[3] Stefan Langner et al., "Beyond Ternary OPV: High-Throughput Experimentation and Self-Driving Laboratories Optimize Multicomponent Systems", Adv. Mater. (2020) link
[4] Larry Luer et al., "A Digital Twin to overcome long-time challenges in Photovoltaics", Joule (2024) link
Note
Any useful pre-requisite modules/knowledge? : N/A
Transformer Architectures in FPGAs
Project code: MLBD_2024_34
Supervisor(s): Lauri Laatu (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
Transformers are the building blocks of state-of-the-art machine learning models, having outperformed other architectures in language models, vision models and, more recently, also in particle physics.
Many physics applications require latencies or throughput rates that exceed what GPUs are capable of. An example of this is the trigger and data acquisition (TDAQ) systems of the LHC experiments. These require the use of more specialised chips, e.g. field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs). However, adapting transformers to run on these chips has been slow, since many of the building blocks of transformers do not perform well on these chips out of the box.
In this project the students will implement and evaluate different methods to adapt transformers to better suit FPGAs. The project targets Intel FPGAs using oneAPI. The architectures will be implemented within the hls4ml framework, a tool widely used in the high-energy physics community to convert ML models into FPGA firmware, which is quickly gaining popularity beyond the field link
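For reference, the transformer building block at the heart of the problem is scaled dot-product attention; the sketch below shows it in plain PyTorch, which makes explicit the softmax, matrix multiplications and intermediate tensors that are awkward to map onto FPGA resources (the sizes are arbitrary examples, not a trigger-level model).

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = torch.softmax(scores, dim=-1)  # the softmax is costly in hardware
        return self.out(weights @ v)

# e.g. 16 particle candidates per event, each with an 8-dimensional embedding
x = torch.randn(4, 16, 8)
print(SelfAttention(8)(x).shape)                 # -> torch.Size([4, 16, 8])
```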
Note
Any useful pre-requisite modules/knowledge? : N/A
Co-training decision trees and neural networks for high energy physics
Project code: MLBD_2024_35
Supervisor(s): Christopher Brown (Physics/Physics of Particles)
Note
This project can host up to 1 pair(s) of students.
High energy physics experiments are starting to use machine learning for their highest-speed data processing, where hundreds of TB/s of data need to be analysed. Typically these machine learning algorithms are trained separately and then used in isolation for different tasks in the data processing chain, even though they rely on each other in complex ways. This project focuses on the co-training of neural networks and boosted decision trees together, where both are optimised for different tasks at the same time in a combined training loop. The student will explore methods for co-training machine learning algorithms with combined loss functions, and how to optimise model architectures for different tasks and different hardware applications. The project will focus on one specific algorithm within the CMS high-speed trigger system with an existing neural-network-based approach, and will look to expand on this algorithm, aiming to improve not only its physics performance but also its latency and overall computing footprint. These methods are not only useful for high-speed physics triggers but for any application where multiple models are being used and would benefit from being optimised as an ensemble with a common goal.
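A minimal sketch of a combined training loop for two models sharing one loss; for simplicity (and because gradient descent needs differentiable components) a second small network stands in for the boosted decision tree here, and the two toy tasks are invented.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# two models for two related tasks (a small MLP stands in for the BDT)
tagger = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))         # task 1
regressor = nn.Sequential(nn.Linear(8 + 1, 32), nn.ReLU(), nn.Linear(32, 1))  # task 2

# toy data: per-candidate features, a binary tag label and an energy-like target
x = torch.randn(256, 8)
tag_label = (x[:, 0] > 0).float().unsqueeze(1)
energy = (2.0 * x[:, 1] + x[:, 2]).unsqueeze(1)

opt = torch.optim.Adam(list(tagger.parameters()) + list(regressor.parameters()), lr=1e-3)
bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()

for step in range(500):
    opt.zero_grad()
    tag_logit = tagger(x)
    # the second task consumes the first model's output, so both are optimised jointly
    pred_energy = regressor(torch.cat([x, torch.sigmoid(tag_logit)], dim=1))
    loss = bce(tag_logit, tag_label) + 0.5 * mse(pred_energy, energy)   # combined loss
    loss.backward()
    opt.step()
print("final combined loss:", loss.item())
```

With a real BDT in the chain, the analogous approach is to alternate between gradient updates of the network and refits of the tree ensemble against the combined objective.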
Note
Any useful pre-requisite modules/knowledge? : N/A