Project Proposals
The following is a list of projects available to MLBD MRes students in the academic year 2022/23.
(1200m) High (20m MWh) Energy (multidimensional renewable and commodities optimisation) Physics
Project code: MLBD_1
Supervisor(s): John Hassard (Physics/Particles)
Note
This project can host up to 2 pair(s) of students.
In this project 'Big data' refers to both the physical system size and the vast amount of data and optimisation required. The Solar Cyclone Tower (SCT) is a new technology (the principles of which were invented by Leonardo da Vinci) capable of delivering up to 20 million MWh of electricity, 120 million m^3 of water and Direct Air Capture (DAC) of significant quantities of CO2. It consists of a central tower, 1200m high, surrounded by a Solar Collector (effectively a greenhouse) of radius up to 5km. This vast system - currently in development in three countries - can make a significant impact on many aspects of sustainability. An important figure of merit for the SCT is the Capacity Factor (CF) - effectively the ratio of the delivered energy to the maximum energy theoretically obtainable. A solar PV plant might reach 30% in sunny climes and an offshore wind farm 45%, while 90% is not unusual for nuclear fission. The nominal CF of the SCT is about 50%; using vast sensor arrays with continuous real-time readout, optimised ML and other technologies, we hope to push the CF above 80%. The sensor outputs will include UV, visible and IR light levels, wind speed, turbulence, humidity and other parameters. This project is High Energy Physics in an unusual - but extremely important - guise.
Note
Any useful pre-requisite modules/knowledge? : ML expertise is an advantage, but the ability to work ridiculously hard in a cross-disciplinary environment is essential.
Machine learning approaches for high fidelity determination of x-ray properties in XFEL experiments
Project code: MLBD_2
Supervisor(s): Jon Marangos (Physics/Light)
Note
This project can host up to 1 pair(s) of students.
Our earlier work demonstrated the effectiveness of applying machine learning to predicting the pulse characteristics from an x-ray free electron laser (A. Sanchez-Gonzalez et al., Nat. Commun. 8, 15461 (2017)). It is now urgent to extend these approaches to the successor facilities planned to operate at 1 MHz repetition rate (e.g. LCLS II, operational from 2022, and the European XFEL, operational now), where it is impractical to diagnose all x-ray properties on each shot. Likewise, new x-ray modes are now being used for attosecond pulse generation in our research (J. Duris et al., Nat. Photon. 14, 30 (2020)) and demand machine learning approaches to accurately determine the x-ray state in each pulse. The project would extend earlier approaches by applying machine learning techniques such as neural networks, decision trees and Bayesian approaches to develop tools that will be critical in the analysis of multiple future experiments and in the optimisation of XFEL facilities.
Note
Any useful pre-requisite modules/knowledge? : Nothing specific
Can machine learning control a NanoPhotonics laser?
Project code: MLBD_3
Supervisor(s): Riccardo Sapienza (Physics/Matter)
Note
This project can host up to 2 pair(s) of students.
Complex lasers, such as network lasers built from interconnected nanoscale photonic waveguides, emit laser light over many unpredictable frequencies and in many directions [1]. We recently discovered that the spatial variation of light amplification, controlled by an external laser beam (the pump), provides a way to control the network modes and the lasing spectrum [2]. In this project, you will employ machine learning to find spatial pump-illumination patterns that control the laser. You will look for specific properties such as single-mode lasing, tunable emission wavelength, and more complex lasing profiles. The semiconductor network lasers you design will be fabricated by our collaborators at IBM Zurich and characterised at Imperial using a home-built spectroscopy setup. The long-term goal is to realise a programmable laser for on-chip optical information processing.
[1] Determining random lasing action, R. Sapienza, Nature Reviews Physics 1, 690–695 (2019)
[2] A nanophotonic laser on a graph, M. Gaio et al., Nature Communications 10, 226 (2019)
Note
Any useful pre-requisite modules/knowledge? : A general background in ML and laser theory is useful but not essential.
Detecting signals of immune evasion in SARS-CoV-2
Project code: MLBD_4
Supervisor(s): Barbara Bravi (Mathematics/Biomathematics)
Note
This project can host up to 1 pair(s) of students.
We will design a machine-learning method to identify regions of the evolving SARS-CoV-2 virus that are subject to immune pressure, and whose evolution can therefore putatively be driven by immune evasion.
Note
Any useful pre-requisite modules/knowledge? : N/A
Machine Learning Optimisation of Plasma Accelerators
Project code: MLBD_5
Supervisor(s): Zulfikar Najmudin (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
Plasma accelerators are a technique to reduce the scale of particle accelerators by exploiting the large accelerating fields that are possible within plasmas. One way to generate these fields is to drive space-charge separation of the electrons and ions within the plasma. However, extremely high energy densities, and thus high laser intensities, are required to operate the accelerators. As a result, their operation is very sensitive to the many input parameters, and optimisation of these devices can be very challenging. But the optimisation of a highly non-linear multivariate system is exactly the kind of problem for which machine learning techniques are well suited.
We have recently begun exploring machine learning to optimise plasma accelerators and found immediate benefits: a shorter time to bring the accelerator into operation, greater versatility through optimising for different outputs, and even a better understanding of the physics of the accelerators through finding hitherto undiscovered relations between input parameters. This is especially valuable because random (stochastic) variations can often hide these dependencies.
In this project, we will apply machine learning techniques to the development of plasma accelerators. First, through the use of ML-directed simulations, we will optimise different defined outputs of a laser-plasma accelerator. Control of the pulse shape, both temporally and spatially, as well as of the target conditions, will allow novel accelerator geometries to be investigated. If time permits, these optimisations will then be implemented in our plasma accelerator development laboratory through the programming of the inputs of the devices as well as automated control of the experiments.
R.J. Shalloo, et al, Automation and control of laser wakefield accelerators using Bayesian optimization, Nat. Commun. 11 (2020) 6355. link
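To illustrate the kind of loop involved, here is a minimal Bayesian-optimisation sketch in the spirit of the Shalloo et al. work: a Gaussian-process surrogate plus an upper-confidence-bound acquisition chooses the next setting of a single toy control parameter. The objective function and parameter range are invented placeholders, not a real accelerator interface.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def beam_charge(pressure):
    """Toy stand-in for an accelerator diagnostic (the real objective comes
    from experiment or PIC simulation); peaked at an unknown optimum."""
    return np.exp(-(pressure - 3.2) ** 2) + 0.05 * np.random.randn()

candidates = np.linspace(1.0, 6.0, 200).reshape(-1, 1)       # scan range (arbitrary units)
X, y = [[1.5], [5.0]], [beam_charge(1.5), beam_charge(5.0)]  # two seed measurements

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(15):
    gp.fit(np.array(X), np.array(y))
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + 2.0 * sigma                 # upper-confidence-bound acquisition
    x_next = candidates[np.argmax(ucb)]    # most promising untried setting
    X.append(list(x_next))
    y.append(beam_charge(x_next[0]))

print("best setting:", X[int(np.argmax(y))], "charge:", max(y))
```

The same loop structure carries over when the objective is a real diagnostic and the parameter space is multi-dimensional.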
Note
Any useful pre-requisite modules/knowledge? : Yes: Plasma Physics, Computational Physics, and Lasers and Laser Technology would be beneficial.
Efficient Markov Chain Monte Carlo for tricky Likelihoods
Project code: MLBD_6
Supervisor(s): Patrick Dunne (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
Exploring complicated parameter spaces efficiently is key to many cutting-edge analyses in high energy physics. For example, in my field of neutrino-oscillation physics, we hope to make statements at the 5 sigma level on matter-antimatter asymmetry in the near future. The likelihood space that must be explored typically has ~750 dimensions and is highly non-Gaussian.
The technique I use to carry out parameter estimation is Markov Chain Monte Carlo (MCMC). There are many MCMC algorithms on the market, each with its own advantages and disadvantages. Metropolis-Hastings is commonly used as it is robust to target distributions with pathologies such as discontinuities, but it is not the fastest. Hamiltonian MCMC is a much faster algorithm, particularly in high dimensions, but it requires the evaluation of derivatives, which makes discontinuities in the target distribution hard to handle. This project will study ways of calculating approximate derivatives for use with Hamiltonian MCMC, in an attempt to bring the speed benefits of this algorithm to a wider set of target distributions. Students are also encouraged to explore other MCMC algorithms and compare their benefits to those suggested above.
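As a concrete starting point, here is a minimal sketch of Hamiltonian MCMC using central finite-difference (i.e. approximate) derivatives on a toy two-dimensional banana-shaped target; the target, step size and trajectory length are invented for illustration, and a real oscillation likelihood would have ~750 dimensions.

```python
import numpy as np

def log_post(theta):
    """Toy 2-D non-Gaussian target, standing in for a real likelihood."""
    x, y = theta
    return -0.5 * (x**2 / 4.0 + (y - 0.5 * x**2) ** 2)

def grad_fd(theta, eps=1e-5):
    """Central finite-difference gradient of log_post - the 'approximate
    derivative' this project would refine for non-smooth targets."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (log_post(theta + d) - log_post(theta - d)) / (2 * eps)
    return g

def hmc_step(theta, step=0.1, n_leap=20):
    p = np.random.standard_normal(theta.size)       # sample momentum
    H0 = -log_post(theta) + 0.5 * p @ p             # initial 'energy'
    q = theta.copy()
    p = p + 0.5 * step * grad_fd(q)                 # leapfrog integration
    for _ in range(n_leap - 1):
        q += step * p
        p += step * grad_fd(q)
    q += step * p
    p += 0.5 * step * grad_fd(q)
    H1 = -log_post(q) + 0.5 * p @ p
    return q if np.log(np.random.random()) < H0 - H1 else theta  # accept/reject

chain, theta = [], np.zeros(2)
for _ in range(5000):
    theta = hmc_step(theta)
    chain.append(theta.copy())
```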
Hardware acceleration is also a possible line of enquiry for this project, and my group already has significant experience of GPU acceleration and FPGA programming.
Note
Any useful pre-requisite modules/knowledge? : Not sure what these are, but good C++ or Python knowledge is essential
Machine Learning and Causal Sets
Project code: MLBD_7
Supervisor(s): Yasaman Kouchekzadeh Yazdi (Physics/Theory)
Note
This project can host up to 1 pair(s) of students.
Causal set theory is an approach to quantum gravity where spacetime (classically described by a Lorentzian metric and manifold) is composed of fundamentally discrete elements and the causal relations among them. The physical information in a causal set is often stored in a nonlocal and global manner. This renders handling large causal sets computationally challenging (the matrices one works with can easily have dimensions of millions x millions). A fruitful direction of exploration is to use machine learning techniques in causal set theory. Applying ML methods to causal sets would have practical benefits and could even shed light on new physics. This project will explore two applications of ML to causal sets: 1) ML algorithms can reduce complex and high-dimensional problems (or matrices) to simpler and lower-dimensional ones. We will explore how well we can encode the information in a causal set in a lower-dimensional quantity. 2) We will explore whether or not and how well causal sets of different types (manifoldlike vs. non-manifoldlike, different dimensions, curved vs. flat) can be distinguished from one another using ML techniques.
Note
Any useful pre-requisite modules/knowledge? : Some knowledge of general relativity would be beneficial
The demographics of extra-solar planets
Project code: MLBD_8
Supervisor(s): James Owen (Physics/Astro)
Note
This project can host up to 1 pair(s) of students.
In the previous decade we discovered thousands of extra-solar planets (exoplanets). We know several basic facts: exoplanets are incredibly common (most stars host many) and our Solar System's architecture is rare. More detailed questions are hindered by strong observational biases and incomplete datasets. In this project you will use hierarchical inference modelling to explore the properties of exoplanets, unveiling clues as to their origins.
Note
Any useful pre-requisite modules/knowledge? : N/A
Hardware implementations of Machine Learning algorithms for the DUNE Near-Detector
Project code: MLBD_9
Supervisor(s): Ioannis Xiotidis, Patrick Dunne (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
The Deep Underground Neutrino Experiment (DUNE) is one of the leading future particle physics experiments in the world. Its successful completion and operation will provide important information about the observed asymmetry between matter and anti-matter in the universe, along with other interesting New Physics phenomena never probed before. Imperial College holds a leading position in the experiment's Near Detector Data Acquisition (ND-DAQ) sub-system, with an active contribution to the design and development of the High-Pressure Gas Time Projection Chamber (HPgTPC) readout electronics DAQ system. The system will be based on custom hardware boards hosting Field Programmable Gate Arrays (FPGAs), responsible for formatting the incoming raw digitised detector data into a commercial protocol and providing data to the software-based event building and selection system. With system commissioning expected towards the end of the decade, students joining the project will be part of a large collaboration exploring alternative event selection techniques and algorithms implemented in FPGAs with the use of High-Level Synthesis (HLS). In more detail, different machine learning implementations will be evaluated, initially on Monte Carlo and subsequently on real physics data obtained during the 2022 test beam at Fermilab. The main goal of the project is to identify potential ML use cases, implemented in hardware, that will allow the software-based selection system to perform more elaborate selections in a more power-efficient environment.
Note
Any useful pre-requisite modules/knowledge? : Not necessarily; however, having attended Nuclear & Particle Physics would be good. Also, basic knowledge of digital electronics wouldn't hurt.
Data reduction and mining of global magnetosphere simulations for dynamical modes
Project code: MLBD_10
Supervisor(s): Martin Archer (Physics/Space, Plasma and Climate)
Note
This project can host up to 2 pair(s) of students.
Earth’s magnetosphere, formed by the interplay of plasma streaming from the Sun with our planet’s intrinsic magnetic field, is a complex dynamical environment that has direct impacts upon our everyday lives through the threat of space weather on technological infrastructure. One way solar wind energy, momentum, and mass may be transferred into and around our space environment is through magnetohydrodynamic plasma waves. These waves form different resonances within the magnetosphere, akin to different types of musical instruments but at so-called ultra-low frequencies and wavelengths many times the size of the Earth. Cutting-edge global magnetohydrodynamic simulations provide one avenue of understanding the complex dynamics of the solar wind - magnetosphere interaction. However, they produce vast quantities of data which are difficult to navigate. This project will therefore aim to apply machine learning techniques to reduce the data output of these simulations into its key components and mine them for the different dynamical modes present. You will determine the global structure, properties, and evolution of each mode, comparing and contrasting these results with simple analytic theory.
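To illustrate one possible data-reduction step, the sketch below applies a proper orthogonal decomposition (via SVD) to a stack of simulation snapshots; the random arrays stand in for real MHD output, and the real project would go well beyond this linear decomposition.

```python
import numpy as np

# Stand-in for simulation output: nt snapshots of a field on an nx*ny grid.
nx, ny, nt = 64, 64, 200
rng = np.random.default_rng(0)
snapshots = rng.standard_normal((nt, nx * ny))   # replace with real MHD data

# Proper orthogonal decomposition via SVD: rows of vt are spatial modes,
# u[:, k] * s[k] gives the time history (amplitude) of mode k.
mean = snapshots.mean(axis=0)
u, s, vt = np.linalg.svd(snapshots - mean, full_matrices=False)

energy = s**2 / np.sum(s**2)                     # fraction of variance per mode
n_keep = int(np.searchsorted(np.cumsum(energy), 0.99)) + 1
print(f"{n_keep} modes capture 99% of the variance")

modes = vt[:n_keep].reshape(n_keep, nx, ny)      # dominant spatial structures
amplitudes = u[:, :n_keep] * s[:n_keep]          # their evolution in time
```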
Note
Any useful pre-requisite modules/knowledge? : Plasma Physics and Space Physics would be beneficial but not required
Using machine learning to constrain the impact of clouds on climate change
Project code: MLBD_11
Supervisor(s): Paulo Ceppi (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
Clouds are one of the main uncertainties for future climate change. With global warming, clouds change and so does their impact on the radiation budget (the difference between absorbed sunlight and emitted infrared), which has a knock-on effect on climate change known as cloud feedback. This feedback is poorly understood, as climate models cannot simulate clouds reliably.
The project will involve applying statistical learning techniques (e.g. ridge regression) to cloud-radiative data from satellite observations and climate model simulations. The aim will be to better quantify how clouds respond to environmental changes in present-day data, so as to more accurately predict how clouds will change with global warming. The project will build on a successful initial study by Ceppi and Nowack (2021, PNAS): link
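To give a flavour of the statistical-learning step, here is a minimal ridge-regression sketch in the spirit of that study, with invented placeholder arrays in place of the real satellite/model fields: the predictors stand in for gridded environmental variables, the target for cloud-radiative effect.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Placeholder data: n samples (e.g. monthly grid cells), p predictors
# (e.g. temperature, humidity, stability); y is cloud-radiative effect.
rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 20))
y = X @ rng.standard_normal(20) + 0.5 * rng.standard_normal(5000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# RidgeCV selects the regularisation strength by cross-validation, which
# controls over-fitting when predictors are strongly correlated.
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_tr, y_tr)
print("chosen alpha:", model.alpha_, " R^2 held out:", model.score(X_te, y_te))
```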
Note
Any useful pre-requisite modules/knowledge? : Atmospheric Physics
Using Machine Learning to classify Higgs boson interactions
Project code: MLBD_12
Supervisor(s): Nicholas Wardle, Jonathon Langford (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
Data collected during Run 2 of the Large Hadron Collider (2016-2018) has seen us make significant strides in understanding the Higgs boson and how it interacts with other particles. In particular, the clean final-state signature of the diphoton decay channel has been extremely useful in measuring some of the rarer Higgs boson processes that occur at the LHC. Run 3 of data-taking (beginning this year and ending in 2025) will see us at least double our current dataset. Despite this, the gain in sensitivity from purely increasing the statistics will not be a game-changer in the field. Instead, we must turn our attention to more sophisticated analysis techniques to isolate Higgs boson events. In this project we will use cutting-edge machine learning algorithms to classify Higgs boson events using the diphoton decay channel at the CMS experiment. The aim is to develop a novel "all-in-one" ML classifier to improve our simultaneous measurement of many Higgs boson cross sections.
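To make the "all-in-one" classifier idea concrete, here is a minimal multi-class sketch using scikit-learn on randomly generated stand-in features; in the real analysis the inputs would be diphoton kinematics and event-level variables from CMS simulation, and the classes would be background plus the production modes being measured. Everything below is a placeholder, not the CMS analysis code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder events: 8 kinematic features per event; 5 classes might be
# {background, ggH, VBF, VH, ttH} in the real analysis.
rng = np.random.default_rng(2)
n, n_classes = 6000, 5
y = rng.integers(0, n_classes, n)
X = rng.standard_normal((n, 8)) + 0.5 * y[:, None]   # class-dependent shift

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))

# Per-event class probabilities would feed the downstream cross-section fit.
probs = clf.predict_proba(X_te)
```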
Note
Any useful pre-requisite modules/knowledge? : Advanced Particle Physics
Optimization of X-ray Pulses from laser-plasma interactions
Project code: MLBD_13
Supervisor(s): Stuart Mangles (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
Warm Dense Matter (WDM), i.e. matter at solid density and around 10,000 K, is not well described by plasma or condensed matter theory. It is found in astrophysical situations such as the centres of gas giant planets, and can be created in the lab by intense laser-matter interactions.
One way to produce WDM for study in the lab is by irradiating solid metals with X-rays, which rapidly heat the material before it expands. This creates a non-equilibrium WDM state where the electron and ion sub-systems have different temperatures. The rate and mechanisms by which the ions and electrons reach equilibrium are not well understood.
In this project you will use numerical radiation hydrodynamics codes (e.g. FLASH) and atomic physics codes (e.g. FLYCHK) to predict and optimise a laser-plasma interaction used to produce X-rays suitable for creating WDM. Current experiments convert a few percent of the incident laser energy into X-rays, with an X-ray pulse that is tens of picoseconds in duration.
This project will explore the use of machine learning methods to optimise the laser pulse shape and target design to significantly decrease the pulse duration of the emitted X-rays while maintaining overall efficiency.
Note
Any useful pre-requisite modules/knowledge? : N/A
Digital Pathology and AI for the Image Analysis of Breast Cancer Biopsies
Project code: MLBD_14
Supervisor(s): Chris Phillips (Physics/Matter)
Note
This project can host up to 2 pair(s) of students.
We have developed a technology, Digistain, that has proven very effective at diagnosing breast cancer and informing treatment choices. It uses spectroscopy to measure chemical changes caused by the onset of the disease. At the moment, though, we still need a human to select the area of the tissue (the "region of interest", RoI) we want to analyse. New open-source software is emerging that can outline the cells in the images and perform a range of statistical analyses on them. This project will examine the extent to which this software can itself select the RoI for us.
Note
Any useful pre-requisite modules/knowledge? : N/A
Machine Learning for a Large, Multiwavelength Galaxy Survey
Project code: MLBD_15
Supervisor(s): David Clements (Physics/Universe)
Note
This project can host up to 1 pair(s) of students.
Understanding the evolution of galaxies and active galactic nuclei (AGN) over cosmic time requires the analysis of large samples of galaxies studied at multiple different wavelengths over large areas of the sky. The SERVS/DeepDrill project [1] is one of the largest such projects, observing many of the best-studied extragalactic fields at 3.6 and 4.5 microns in wavelength using the Spitzer Space Telescope.
While the full analysis of this data set is not yet complete, a catalog of 2 million galaxies has been produced for a total of 9 square degrees (e.g. [2]), which will be the basis for this project, though further data is also available for possible extensions. This data includes sensitive imaging over 11 bands from the optical to near infrared (u band to 4.5 microns). Additional data at longer wavelengths for these fields, from e.g. Spitzer and the Herschel Space Observatory, is also available.
This catalog clearly represents a very rich resource for the study of galaxies and related phenomena, such as AGN and galaxy clusters. Various studies have already been undertaken, but the potential of applying machine learning methods to this data set has yet to be investigated. This is the role of the current project.
There are a number of goals that can be pursued as part of this project using machine learning methods. These include:
- the search for unusual sources through machine learning based outlier analysis. The nature of these outliers is unclear - as outliers they are by definition unexpected sources - but they may include dust-obscured AGN, dusty star-forming galaxies, and strong emission-line galaxies.
- photometric redshift estimation through machine learning (see the sketch after this list). Redshift, which in extragalactic astronomy is effectively synonymous with distance, is a key property in understanding any extragalactic object. The best way to establish a redshift is through a spectroscopic observation, but this is not possible for a sample as large or as deep as this one. Redshifts can, however, be estimated from the kind of photometric data available for this sample. There are various methods available, but a purely machine learning approach for this dataset, using the sub-sample where spectroscopic redshifts are available as a training set, could have considerable benefits.
- machine-learning classification of sources. Most classification of sources in these surveys has been based on assumptions informed by our current knowledge of astrophysics. A machine learning approach that uses the data itself to divide the sample into different classes, such as k-means, may find new insights into the galaxy population.
- a search for galaxy clusters combining positional and photometric information. Galaxy clusters are the largest gravitationally bound structures in the universe. They are usually identified in optical/IR surveys through positional data alone, but the multiwavelength nature of this data set allows a better search for clusters using colour information in addition to positions. A clustering search in the multidimensional positional and photometric space could find new clusters, and bring new insights into galaxy clusters.
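As a minimal illustration of the photometric-redshift item above, the sketch below trains a random-forest regressor on randomly generated stand-in photometry; the real training set would be the sub-sample with spectroscopic redshifts, and all names and numbers here are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Placeholder catalogue: 11-band magnitudes (u band to 4.5 microns) and a
# spectroscopic redshift for the training sub-sample.
rng = np.random.default_rng(3)
n_gal = 20000
z_spec = rng.uniform(0.0, 3.0, n_gal)
mags = (22 + z_spec[:, None] * rng.uniform(0.2, 1.0, 11)
        + 0.1 * rng.standard_normal((n_gal, 11)))

X_tr, X_te, z_tr, z_te = train_test_split(mags, z_spec, test_size=0.3,
                                          random_state=0)
rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5).fit(X_tr, z_tr)
z_phot = rf.predict(X_te)

# A standard photo-z quality metric: normalised median absolute deviation.
dz = (z_phot - z_te) / (1 + z_te)
print("sigma_NMAD =", 1.4826 * np.median(np.abs(dz - np.median(dz))))
```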
More broadly, this project represents an opportunity to apply machine learning techniques to a unique, large astronomical data set, with great potential for new discoveries, improvements in our knowledge of galaxies and other phenomena, and for applying novel machine learning approaches in astrophysics.
Note
Any useful pre-requisite modules/knowledge? : The astrophysics course may be useful, and the cosmology course may provide some useful extra context, though neither is required.
Searching for the Migdal effect in nuclear scattering using Machine Learning
Project code: MLBD_16
Supervisor(s): Henrique Araujo (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
When an atomic nucleus recoils after being hit by a neutral particle (e.g. a neutron) it is normally assumed that the atomic electrons follow instantaneously. In fact, there is a small probability that the delayed response of the atomic electrons can lead to ionisation of the atom. This quantum mechanical effect, predicted by Arkady Migdal many decades ago, has not been measured experimentally. The MIGDAL experiment link is aiming at a first observation by firing fast neutrons at an Optical Time Projection Chamber filled with a low-pressure gas. This detector will allow us to “photograph” tracks in three dimensions to identify the rare Migdal topology: a nuclear recoil track plus an electron recoil track from a common vertex. The project consists of applying ML algorithms first to simulated data and then to real data, to identify and characterise these Migdal events in the full dataset. This is a hot topic in direct dark matter searches – low-background experiments looking for neutral particle scattering in underground laboratories: it is possible that a very faint nuclear recoil goes undetected by falling below the threshold of the experiment, but the Migdal electron can still be recorded, giving sensitivity to lighter dark matter particles.
Note
Any useful pre-requisite modules/knowledge? : N/A
Optimising the Solar Cyclone Tower for water, electricity, Direct Air Capture of CO2 and changing the global energy infrastructure.
Project code: MLBD_17
Supervisor(s): John Hassard (Physics/Particles)
Note
This project can host up to 3 pair(s) of students.
The Solar Cyclone Tower (SCT) is a novel technology based on the simple premise that hot air rises, now evolved into a technology capable of providing sustainable solutions in electricity generation, water production, CO2 capture and a range of other applications. In recent months, a South Asian country has adopted the technology, and we are preparing to develop it further for installation in deserts and off-shore. This project will work with a wide range of tools, including a novel approach that allows machine learning to greatly improve the computational fluid dynamics which underlie the SCT. Other projects will include the development of CO2 capture technologies (in a globally significant way), free-space laser communications and a vast sensor array feeding a novel multidimensional Monte Carlo system optimisation program.
Note
Any useful pre-requisite modules/knowledge? : Environmental Physics, Entrepreneurship for Physicists, and computing would be good. Electronics skills, biotechnology insights and engineering skills would be advantageous, but not essential.
Identifying Low-mass Dark Matter Events with Machine Learning
Project code: MLBD_18
Supervisor(s): Kelsey Oliver-Mallory (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
Dark matter is one of the most critical topics in modern physics, and the dual-phase xenon time projection chamber (TPC) is the foremost technology probing the identity of this substance. Xenon TPCs were originally selected for their strong response to heavy particles like the WIMP, a dark matter candidate that has long been favoured by the physics community because it arises naturally in theories of the early universe. However, successive experiments have yet to detect such a particle and, as a consequence, the community has turned its eye toward a variety of well-motivated lower-mass alternatives.
In this energy range, xenon TPCs observe significant backgrounds that can obscure dark matter signals. However, machine learning techniques have shown promise in recognising minor features of the background events that differentiate them from those of dark matter. This masters project will involve testing and employing a variety of machine learning techniques (boosted decision trees, anomaly finding, neural networks) to optimise xenon TPCs for low-mass dark matter searches.
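As a small illustration of the anomaly-finding strand, here is a minimal scikit-learn IsolationForest sketch on randomly generated stand-in event summaries; in a real search the inputs would be reconstructed event quantities from the TPC, and the injected "oddball" population below is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder event summaries: a few reconstructed quantities per event
# (e.g. signal sizes, position, pulse shape) - invented for this sketch.
rng = np.random.default_rng(8)
background = rng.normal(0, 1, (10000, 4))   # bulk of recorded events
oddballs = rng.normal(4, 1, (20, 4))        # rare, unusual events
events = np.vstack([background, oddballs])

# IsolationForest flags events that are easy to isolate from the bulk,
# without needing a labelled training set.
iso = IsolationForest(contamination=0.005, random_state=0).fit(events)
flags = iso.predict(events)                 # -1 = anomalous, +1 = normal
print("flagged", np.sum(flags == -1), "candidate anomalies")
```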
Resources
Improving sensitivity to low-mass dark matter in LUX using a novel electrode background mitigation technique link
Note
Any useful pre-requisite modules/knowledge? : N/A
Reservoir computing with nanomagnetic arrays
Project code: MLBD_19
Supervisor(s): Will Branford (Physics/Matter)
Note
This project can host up to 2 pair(s) of students.
The energy cost of machine learning doubles every 3.4 months and is already greater than the entire energy usage of Argentina; 21% of global energy production is predicted to be expended on IT by 2030. While performing machine learning on standard (von Neumann) architecture computers is exceptionally powerful, it is also very inefficient. There is a drive towards hardware which is more suited to machine learning and can reduce this energy cost. The original machine learning algorithms were developed by Sherrington and Kirkpatrick to describe magnetic arrays. Because each magnetic dipole interacts with all its neighbours at no energy cost, a nanomagnetic array is naturally a massively parallel 'neural network'. One powerful application of nanomagnetic arrays is reservoir computing, where much of the computational heavy lifting is transferred to the physics and only a standard machine learning computation is performed on the outputs of the reservoir. The supervisor's group has large datasets available in which nanomagnetic arrays have been trained using encoded field sequences and multiple reservoir outputs recorded. This project would test different machine learning strategies (e.g. linear regression, ridge regression) to train the weights of the reservoir output in making predictions. An overview of the research can be seen in the SPICE seminar series talk, where the machine learning by reservoir computation is at the link
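A minimal sketch of the readout-training step (the only part of reservoir computing done in software), assuming the recorded reservoir outputs are arranged as a matrix with one row per time step; the arrays here are random placeholders for the group's experimental data.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder for recorded data: reservoir_states[t] is the vector of
# measured reservoir outputs at step t; target[t] is the signal to predict.
rng = np.random.default_rng(4)
T, n_outputs = 1000, 50
reservoir_states = rng.standard_normal((T, n_outputs))
target = np.sin(np.arange(T) * 0.1) + 0.1 * rng.standard_normal(T)

split = int(0.8 * T)
readout = Ridge(alpha=1e-2).fit(reservoir_states[:split], target[:split])
pred = readout.predict(reservoir_states[split:])
print("held-out MSE:", np.mean((pred - target[split:]) ** 2))

# Only these linear weights are trained; the reservoir itself (the
# nanomagnetic array) performs the nonlinear transformation in hardware.
```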
Note
Any useful pre-requisite modules/knowledge? : N/A
PyTorch based inverse-design of neuromorphic computing hardware.
Project code: MLBD_20
Supervisor(s): Will Branford (Physics/Matter)
Note
This project can host up to 2 pair(s) of students.
The energy cost of machine learning doubles every 3.4 months and is already greater than the entire energy usage of Argentina; 21% of global energy production is predicted to be expended on IT by 2030. While performing machine learning on standard (von Neumann) architecture computers is exceptionally powerful, it is also very inefficient. There is a drive towards hardware which is more suited to machine learning and can reduce this energy cost. The original machine learning algorithms were developed by Sherrington and Kirkpatrick to describe magnetic arrays. Because each magnetic dipole interacts with all its neighbours at no energy cost, a nanomagnetic array is naturally a massively parallel 'neural network'.
The supervisor's group in EXSS Physics makes nanoscale magnetic arrays for neuromorphic computation hardware [1]. A recent publication showed that PyTorch-based methods can be used to design the magnetic configurations with the best performance, in a package called SpinTorch [2]. The supervisor is working with the original coders of SpinTorch on further development of the package and its use to inverse-design the magnetic pattern we use as the scatterer.
The SpinTorch designers were able to demonstrate the design of neural-network hardware in which all neuromorphic computing functions, including signal routing and nonlinear activation, are performed by spin-wave propagation and interference. Weights and interconnections of the network are realised by a magnetic-field pattern that is applied to the spin-wave-propagating substrate and scatters the spin waves.
The project would consist of modifying the SpinTorch code so that it can simulate the exact structures made in the EXSS group, and then using the software's machine learning to inverse-design the best magnetic array geometry and magnetic pattern to perform specific neuromorphic computing tasks in hardware.
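The sketch below illustrates the inverse-design idea in miniature (it is not the SpinTorch API): a differentiable toy "forward model" maps a trainable field pattern to an output, and PyTorch's autograd adjusts the pattern to hit a target response. SpinTorch does the same thing with a full micromagnetic spin-wave solver as the forward model.

```python
import torch

# Toy differentiable forward model standing in for the spin-wave solver:
# an input signal is scattered by a trainable field pattern.
def forward(pattern, signal):
    scattered = torch.tanh(signal @ pattern)      # nonlinear "propagation"
    return scattered.mean(dim=1)                  # detector reading

torch.manual_seed(0)
signal = torch.randn(32, 16)                      # batch of input waveforms
target = torch.linspace(-1, 1, 32)                # desired detector outputs

pattern = torch.zeros(16, 64, requires_grad=True) # the design variable
opt = torch.optim.Adam([pattern], lr=0.05)

for step in range(500):
    opt.zero_grad()
    loss = torch.mean((forward(pattern, signal) - target) ** 2)
    loss.backward()                               # autograd through the model
    opt.step()                                    # gradient-based design update

print("final design loss:", loss.item())
```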
An overview of the research can be seen in the OASIS seminar series talks by Gyorgy Csaba (SpinTorch) and Kilian Stenning (supervisor's group neuromorphic reservoir): link
[1] J. C. et al., arXiv:2107.08941 (accepted, Nature Nanotechnology) (2022).
[2] Papp, A. et al., Nature Communications 12, 6422 (2021).
Note
Any useful pre-requisite modules/knowledge? : N/A
A new global tropical cyclone data set
Project code: MLBD_21
Supervisor(s): Ralf Toumi (Physics/Space, Plasma and Climate)
Note
This project can host up to 2 pair(s) of students.
Tropical cyclones are amongst the most damaging natural hazards. Historical observations of tropical cyclones (1) have proven to be of great value to scientific research, to forecasting, and to many industries, such as insurance and reinsurance. But in almost all cases, the observations were made in support of immediate forecasting needs and were not quality controlled with an eye toward the uniformity and consistency that we demand of climatological datasets. Yet the increasing use of such data for risk assessment and in the detection of trends and variability warrants both a careful reanalysis of existing data and the application of uniform standards to future observations. In this project you will apply and test machine learning techniques (such as convolutional neural networks, CNNs) on global weather reanalyses (2) to create the first near-real-time, hourly resolution tropical cyclone dataset (intensity and size), going back to 1979 and constantly updated. (1) link (2) link
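As a schematic of the technique, here is a minimal PyTorch CNN that regresses a cyclone intensity value from a single-variable reanalysis patch; the data are random placeholders, and a real pipeline would use multi-variable ERA5-style fields with hourly labels.

```python
import torch
import torch.nn as nn

# Placeholder data: 2-D wind-speed patches centred on storms, with an
# intensity label (e.g. maximum sustained wind) per patch.
x = torch.randn(256, 1, 64, 64)          # (batch, channels, lat, lon)
y = torch.rand(256, 1) * 80              # intensity in arbitrary units

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, 1),                    # regressed intensity
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("training loss:", loss.item())
```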
Note
Any useful pre-requisite modules/knowledge? : No
Simulation of the performance of LhARA, the Laser-hybrid Accelerator for Radiobiological Applications
Project code: MLBD_22
Supervisor(s): Kenneth Long (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
The Laser-hybrid Accelerator for Radiobiological Applications (LhARA, see link) is a novel accelerator optimised to serve a systematic programme of radiobiology. The beam will be sent either to an end-station for in-vitro experiments or injected into a post-accelerator to boost its energy to that required for in-vivo or high-energy in-vitro experiments. The technology developed to serve LhARA has the potential to transform clinical practice in particle-beam therapy.
A student taking this project will work on the development of the simulation of LhARA; the beam line, its instrumentation, and/or the design and evaluation of the experimental end stations. The low-energy beam line is particularly challenging, needing very low mass or non-intercepting diagnostics. The fixed-field accelerator proposed for LhARA requires novel instrumentation to be developed using techniques that have synergy with those required for many other accelerator facilities. The group is active in international collaborations and the successful applicant will have the opportunity to work with personnel at CERN or in Paris.
Note
Any useful pre-requisite modules/knowledge? : No course is considered as a prerequisite. The Physics of Medical Imaging and Radiotherapy course may be an advantage in the simulation of the end stations or if the projection of the technology onto the clinical application is considered.
Computational Fluid Dynamics and atmospheric water extraction
Project code: MLBD_23
Supervisor(s): John Hassard (Physics/Particles)
Note
This project can host up to 3 pair(s) of students.
A team from Imperial College is leading the development of the 1200m tall Solar Vortex Tower (SVT), and we are building at least 6, starting in Rajasthan, Northern India. Each SVT will extract around 200 million cubic metres of water per year from the atmosphere; just 30 of them would largely remove the present water shortages experienced in Rajasthan, and ultimately more systems will address the same problem elsewhere in India. This humidity is induced and increased by the SVT: we have vast saline ponds under the SVT solar collector. Our work is to maximise the production of this humidity, and its subsequent extraction. One of the tools we use is computational fluid dynamics - performed on arguably the most extreme example of large-scale structures ever built. While much of our work rests on physics understood for hundreds of years, we will be using pioneering machine learning tools and vast sensor arrays to accelerate and optimise our water production systems.
Note
Any useful pre-requisite modules/knowledge? : Environmental physics, computational physics and high energy physics would be useful.
The Quantum Black Butterfly and Organic Photovoltaics
Project code: MLBD_24
Supervisor(s): John Hassard (Physics/Particles)
Note
This project can host up to 2 pair(s) of students.
A team from Imperial College is leading the development of the 1200m tall Solar Vortex Tower (SVT), and we are building at least 6, starting in Rajasthan, Northern India. Each SVT will produce over 28 million MWh of electricity and, in addition, extract around 200 million cubic metres of water per year from the pre-humidified atmosphere. The majority of that huge amount of electricity (equivalent to several nuclear power plants, but much less expensive and with no radioactive waste or weapons potential) comes from the Quantum Black Butterfly (QBB) and Organic Photovoltaics. The first was invented by Prof Keith Barnham, and the second by Prof Jenny Nelson, both of the Physics Department at Imperial College. The QBB has the potential to double the conversion efficiency of solar power over conventional technology; the latter has many advantages in deployment and scaling to the necessary vast arrays we are planning. This work will help develop the technology so that it can be installed over vast areas within the next 3 years.
Note
Any useful pre-requisite modules/knowledge? : Solid state, computational physics, environmental physics.
Bayesian Data Assimilation in Inertial Confinement Fusion
Project code: MLBD_25
Supervisor(s): Aidan Crilly (Physics, Plasma Group)
Note
This project can host up to 1 pair(s) of students.
Recent Inertial Confinement Fusion (ICF) experiments have reached target gains of 0.7 and have entered the ignition regime. Validating our physical understanding and models of this ignition regime is a critical step on the path to higher gains and fusion energy. Analysing experimental ICF data to extract physical parameters involves solving a non-unique inverse problem. High dimensional inverse problems are difficult and therefore simplified physical models are often used, belying the true complexity of the underlying physical system. Typically, ICF diagnostic data is analysed independently with reduced models to extract, potentially inconsistent, integrated quantities which are then compared to build up an understanding of the experiment. A more detailed integrated data analysis simultaneously combining all diagnostic data (including scalar, imaging, and time-resolved data) would increase the accuracy of inference and allow more physics parameters to be extracted from the data including their spatial and temporal evolution. ML/AI methods are well suited to the task of combining diagnostic information from multiple sources. In particular, Bayesian methods allow uncertainties arising from experimental data and modelling to be included in the inference process. These Bayesian data assimilation methods can be tested and trained on synthetic data produced from hydrodynamic simulations. Synthetic diagnostic models will be used to produce synthetic experimental data in the forward process. The Bayesian inference framework will then solve the inverse problem and these results can be compared to the hydrodynamic simulation directly. By working with synthetic data, the consistency between the forward and inverse problems can be validated.
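To make the forward/inverse structure concrete, here is a minimal Bayesian sketch under stated assumptions: a toy forward model generates synthetic diagnostic data from two physical parameters, and a grid evaluation of the posterior recovers them with uncertainties. The real project would replace both the forward model (hydrodynamic simulation plus synthetic diagnostics) and the inference scheme (e.g. MCMC) with far richer versions.

```python
import numpy as np

def forward(T, rho, t):
    """Toy synthetic diagnostic: a time trace depending on a 'temperature'
    and a 'density' - placeholders for real ICF physics."""
    return T * np.exp(-((t - rho) ** 2))

t = np.linspace(0, 5, 40)
T_true, rho_true, sigma = 2.0, 2.5, 0.1
data = forward(T_true, rho_true, t) \
       + sigma * np.random.default_rng(5).normal(size=t.size)

# Posterior on a grid: Gaussian likelihood times a flat prior.
T_grid = np.linspace(0.5, 4.0, 200)
rho_grid = np.linspace(0.5, 4.5, 200)
logpost = np.array([[-0.5 * np.sum((data - forward(T, r, t)) ** 2) / sigma**2
                     for r in rho_grid] for T in T_grid])
post = np.exp(logpost - logpost.max())
post /= post.sum()

i, j = np.unravel_index(post.argmax(), post.shape)
print("MAP estimate: T =", T_grid[i], " rho =", rho_grid[j])
```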
Note
Any useful pre-requisite modules/knowledge? : N/A
Using methods from statistical physics and big data to study neuronal avalanches
Project code: MLBD_26
Supervisor(s): Henrik Jeldtoft Jensen (Mathematics/Mathematical Physics)
Note
This project can host up to 2 pair(s) of students.
In recent years, the statistical physics of avalanches – as observed in earthquakes, sandpiles and forest fires – has been used to explain the dynamics of bursts of neuronal activity observed spontaneously in brain tissue in culture and in vivo (Beggs & Plenz, 2003; Hahn et al., 2010; Petermann et al., 2009). However, a crucial unanswered question is how neuronal avalanches affect behaviour, and in particular cognitive phenomena such as memory and decision-making which we might expect to be driven by internal processes. Altered avalanche dynamics have been implicated in cognitive dysfunction caused by schizophrenia (Seshadri et al., 2018).
We will use the data from two photon mesoscope imaging of the brain of a head-fixed mouse running on a floating maze platform (Go et al., 2021), and develop approaches based on statistical physics, information theory, and graph theory for understanding how long-range interactions underpin brain function during cognitive load. We will employ a data-driven dimensionality reduction technique based on a renormalisation procedure (Meshulam et al., 2019), combined with tools of information-theoretic causality analysis. Numerical simulations of a variety of network models will help identify the features of inhibitory neurons fundamental for task performance, by varying their connectivity, role in network structure and physiological properties.
References:
Beggs, J. M., & Plenz, D. (2003). Neuronal Avalanches in Neocortical Circuits. Journal of Neuroscience, 23(35), 11167–11177.
Go, M. A., Rogers, J., Gava, G. P., Davey, C. E., Prado, S., Liu, Y., & Schultz, S. R. (2021). Place Cells in Head-Fixed Mice Navigating a Floating Real-World Environment. Frontiers in Cellular Neuroscience, 15, 19.
Hahn, G., Petermann, T., Havenith, M. N., Yu, S., Singer, W., Plenz, D., & Nikolić, D. (2010). Neuronal avalanches in spontaneous activity in vivo. Journal of Neurophysiology, 104(6), 3312–3322.
Meshulam, L., Gauthier, J. L., Brody, C. D., Tank, D. W., & Bialek, W. (2019). Coarse graining, fixed points, and scaling in a large population of neurons. Physical Review Letters, 123(17), 178103.
Petermann, T., Thiagarajan, T. C., Lebedev, M. A., Nicolelis, M. A. L., Chialvo, D. R., & Plenz, D. (2009). Spontaneous cortical activity in awake monkeys composed of neuronal avalanches. Proceedings of the National Academy of Sciences of the United States of America, 106(37), 15921–15926.
Seshadri, S., Klaus, A., Winkowski, D. E., Kanold, P. O., & Plenz, D. (2018). Altered avalanche dynamics in a developmental NMDAR hypofunction model of cognitive impairment. Translational Psychiatry, 8(1), 1–12.
Note
Any useful pre-requisite modules/knowledge? : N/A
Quantum control with statistical machine learning
Project code: MLBD_27
Supervisor(s): Florian Mintert (Physics/Light)
Note
This project can host up to 1 pair(s) of students.
All tasks of quantum information processing require accurate control over the quantum mechanical hardware. Given the high level of imperfection of existing hardware, it is a very challenging task to learn how to control an actual device. Statistical machine learning can be used to rapidly learn how to operate a quantum device. The goal of this project is to implement learning strategies that help us learn how to operate quantum hardware as data-efficiently as possible.
Note
Any useful pre-requisite modules/knowledge? : quantum information or foundations of quantum mechanics is helpful, but not mandatory
Machine learning algorithmic optimisation of deformable mirrors in high-power laser experiments.
Project code: MLBD_28
Supervisor(s): Roland Smith (Physics/Plasma Physics)
Note
This project can host up to 1 pair(s) of students.
Deformable mirrors use multiple computer-controlled actuators to bend an optical surface with few-nanometre precision. This can be used to correct subtle spatial or temporal phase aberrations in a multi-terawatt, short-pulse laser beam and significantly improve its performance, e.g. by optimising the laser focal spot or pulse shape in time. Rather counter-intuitively, many interesting processes driven by a laser (e.g. particle acceleration or filamentation) can also benefit from a non-ideal spot or pulse shape. A major challenge in using this technique is that the mapping of “control values” from a computer to a real-world mirror surface - and then to a physical process - is highly non-linear and in some cases not well understood. The “search space” is also very large: a 9-actuator mirror with a 12-bit control system has ~3x10^32 different configurations, and a 15-actuator system expands this to ~10^54! Finally, we also need to teach a computer to recognise “good” and “bad” results as it learns about the system. That could use a simple algorithm giving a single quality metric of a focal spot image, but for complex images with lots of fine structure, or for an entire experiment, a neural net or other machine learning technique might have significant speed and “robustness” advantages.
This project will use Genetic (GA), Bayesian and other algorithmic approaches to optimize the “shape” in space and / or time of single and multiple mirror systems and use this to control high-power laser experiments. We will also investigate different image recognition techniques (assorted “direct” algorithms versus neural nets) to identify “good” and “bad” laser beams. If time allows we may also use machine optimization of finite element structural models of “new” mirror systems that can potentially be built and tested.
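As an illustration of the genetic-algorithm route, the sketch below evolves a population of actuator-voltage vectors against a stand-in "focal spot quality" metric; everything here (the 9-actuator mirror, the fitness function) is a placeholder for the real control system and image-based metric.

```python
import numpy as np

N_ACT, POP, N_GEN = 9, 40, 60
rng = np.random.default_rng(6)

def fitness(voltages):
    """Stand-in quality metric (e.g. peak intensity of the focal spot);
    in the lab this would come from analysing a camera image."""
    return -np.sum((voltages - 0.3) ** 2)   # unknown optimum at 0.3

pop = rng.uniform(0, 1, (POP, N_ACT))       # initial random actuator settings
for gen in range(N_GEN):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-POP // 2:]]       # keep the best half
    children = []
    for _ in range(POP - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, N_ACT)
        child = np.concatenate([a[:cut], b[cut:]])      # crossover
        child += 0.05 * rng.standard_normal(N_ACT)      # mutation
        children.append(np.clip(child, 0, 1))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("best actuator settings:", np.round(best, 3))
```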
Note
Any useful pre-requisite modules/knowledge? : Laser Technology is relevant but not essential
Machine learning molecular design
Project code: MLBD_29
Supervisor(s): Jarvist Moore Frost (Chemistry)
Note
This project can host up to 1 pair(s) of students.
A covalently bonded molecule can be fully defined by the molecular graph - the bonds (edges) between atoms (vertices). Molecular design is extremely challenging as the combinatorial space of chemistry is enormous. We have techniques, based in quantum mechanics, which allow numerical prediction of properties of a specific molecule, but which can take an enormous amount of computer time.
We will initially build on general work on variational graph auto-encoders (link), building up a codebase and experience in using these techniques for molecular data. Variational auto-encoders (see link for a good blog-post introduction) are a Bayesian machine-learning method which builds a probabilistic interpretation of the learned latent space. Once you have this representation, you can move around in latent space and 'decode' from a vector to the molecular graph representing a unique molecule.
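To show the core mechanics, here is a minimal variational auto-encoder in PyTorch (Python for illustration, rather than the Julia ultimately intended for the project) operating on generic fixed-length feature vectors; a graph VAE replaces the encoder/decoder with graph neural networks, but the reparameterisation trick and the ELBO loss shown here are the same.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_in=32, n_latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU())
        self.mu = nn.Linear(64, n_latent)       # posterior mean
        self.logvar = nn.Linear(64, n_latent)   # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(),
                                 nn.Linear(64, n_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterise
        return self.dec(z), mu, logvar

def elbo_loss(x, x_hat, mu, logvar):
    recon = ((x - x_hat) ** 2).sum()                          # reconstruction
    kl = -0.5 * (1 + logvar - mu**2 - logvar.exp()).sum()     # KL to N(0, 1)
    return recon + kl

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, 32)          # placeholder molecular feature vectors
for step in range(200):
    opt.zero_grad()
    x_hat, mu, logvar = model(x)
    loss = elbo_loss(x, x_hat, mu, logvar)
    loss.backward()
    opt.step()
```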
There are large (10'000-100'000 molecule) quantum-mechanics-based datasets, <1000-molecule experimental datasets, and the option to generate our own data (active learning) from quantum mechanical calculations. Ideally we would write code in the Julia programming language, and contribute to the open-source Chemellia atomic machine learning framework (link), which already has an implementation of a graph neural network.
In terms of practical application, any successful technique developed will be used to suggest drug candidates for the Open Source Antibiotics link ) and COVID Moonshot link ) projects. There is the possibility of paying for a bespoke synthesis and assay of particularly exciting predictions.
This project is ideal for a student who is interested in combining molecules and machine learning.
A summary book chapter on graph neural networks for molecular applications has recently been made available: Wang, Y., Li, Z., Farimani, A. B. (2022, September 12). Graph Neural Networks for Molecules. arXiv. link
Note
Any useful pre-requisite modules/knowledge? : N/A
Blockbuster drug design by blocking transition states
Project code: MLBD_30
Supervisor(s): Jarvist Moore Frost (Chemistry)
Note
This project can host up to 1 pair(s) of students.
Much of a life-form's genome codes for enzymes: proteins which catalyse a reaction. Catalysis occurs by lowering the free energy of a transition state: these enzymes have evolved to lower this free energy, either by 'tight-binding' (an enthalpy effect) or by a curious exchange of vibrational quanta of energy (an entropy effect). A transition state analogue (TSA) is a chemical mimic for the out-of-equilibrium transition state. Some of these TSAs bind so well to the enzyme that they are almost perfect inhibitors of the enzyme: the time frame of their dissociation is longer (> days) than the normal recycling of proteins.
If you could predict these transition states for a given enzyme, you would have an extremely powerful drug design tool[1]. Much antibiotic resistance is due to bacteria expressing beta-lactamase, enzymes which chop up the characteristic beta-lactam structure of most antibiotics. If you could block these enzymes, you fix at a stroke much antibiotic resistance[2]. Viruses often have a highly conserved (i.e. it can't be easily evolved around) Protease enzyme which is required to chop up and form the viral proteins. If you can block this, you have an antiviral medicine.
Of course, predicting these states in a full blown enzyme is a massive problem. Enzymes are enormous, consisting of tens of thousands of atoms, and finite temperature effects are important, requiring thermodynamic integration (molecular dynamics). Really, this work is best left for the bona fide computational biologists. But can our rather simplistic yet in some ways quite powerful physicist's tooling make some progress?
In this project we will investigate the physics of toy systems representing biological enzymes, and their transition states, particularly with regard to beta-lactam and thereby antibiotic resistance, and with regard to viral Protease (such as the 3CL-Pro of COVID-19). Pharmaceutical companies are not interested in developing new antibiotics as killjoy doctors plan to use them responsibly and keep them as weapons of last resort, so there's no profit to be had. So we academics better step up.
This project would suit students who have an interest in statistical and/or quantum mechanics, and are skilled in theory and/or programming. If we find something that is useful, there's the potential for it to be applied in collaboration with KUANO.ai who are trying to use this transition-state method to design drugs more effectively.
This won't be a 'plain vanilla' machine learning project where we play with some neural networks and/or adapt the architecture, but much more 'physics inspired machine learning' and involving methods from computational physics.
[1] "Enzymatic Transition States and Drug Design", Vern L. Schramm, Chem. Rev. 2018, 118, 22, 11194-11258 link
[2] "Mechanisms of Antibiotic Resistance: QM/MM Modeling of the Acylation Reaction of a Class A beta-Lactamase with Benzylpenicillin", Johannes C. Hermann, Christian Hensen, Lars Ridder, Adrian J. Mulholland, Hans-Dieter Höltje, J. Am. Chem. Soc. 2005, 127, 12, 4454-4465 link
Note
Any useful pre-requisite modules/knowledge? : N/A
Machine learning intermolecular transfer integrals with compact atomic cluster representations
Project code: MLBD_31
Supervisor(s): Jarvist Moore Frost (Chemistry)
Note
This project can host up to 1 pair(s) of students.
Charge transfer in organic semiconductors occurs by nonadiabatic 'hops' of the electron. The rate of this process is a function of the molecular orbital overlap between the molecules [Nelson2009]. This overlap can be calculated using electronic structure theory, but requires several hours of CPU time per pose of two molecules. Faster methods have been developed [Kirkpatrick2008], but these lack accuracy.
In this project we will develop a machine learning method to effectively learn this 6 dimensional data (three Cartesian displacement directions, and three relative orientation components).
We will take a physics-guided approach, where our physical understanding of how the function should look (i.e. its asymptotes and symmetry) is imposed on the machine learning method as an inductive bias.
As our data stream is computer-generated, we can apply active learning techniques. We can also apply a fast approximate method to generate lots of data and then correct these values with a smaller number of higher-accuracy calculations - a 'delta' machine learning approach.
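A minimal sketch of the 'delta' idea using invented placeholder functions and data: a model is trained on the difference between a cheap approximate calculation and a small set of expensive reference values, so that cheap predictions can be corrected everywhere at negligible cost.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(7)

def cheap(x):       # fast approximate method (placeholder, e.g. ZINDO-like)
    return np.sin(x).ravel()

def expensive(x):   # high-accuracy reference (placeholder, e.g. DFT)
    return np.sin(x).ravel() + 0.3 * np.cos(3 * x).ravel()

# Many cheap evaluations, but only a few expensive reference points.
X_all = rng.uniform(0, 6, (2000, 1))
X_ref = rng.uniform(0, 6, (50, 1))

# Learn the correction delta = expensive - cheap from the small reference set.
delta_model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0)
delta_model.fit(X_ref, expensive(X_ref) - cheap(X_ref))

# Corrected prediction = cheap value + learned delta.
y_pred = cheap(X_all) + delta_model.predict(X_all)
rmse = np.sqrt(np.mean((y_pred - expensive(X_all)) ** 2))
print("RMSE after delta correction:", rmse)
```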
In terms of the computational techniques, using a linear machine learning method combined with a modern and more expressive equivariant basis [Musil2021] (such as the atomic cluster expansion, [Drautz2019]) will be our initial approach. If the linear methods are insufficient, we may move to using Gaussian processes.
[Nelson2009] Nelson, J., Kwiatkowski, J. J., Kirkpatrick, J., Frost, J. M. (2009). Modeling Charge Transport in Organic Photovoltaic Materials. Accounts of Chemical Research, 42(11), 1768–1778. link
[Kirkpatrick2008] Kirkpatrick, J. (2008). An approximate method for calculating transfer integrals based on the ZINDO Hamiltonian. International Journal of Quantum Chemistry, 108(1), 51–56. link
[Musil2021] Musil, F., Grisafi, A., Bartók, A. P., Ortner, C., Csányi, G., Ceriotti, M. (2021). Physics-Inspired Structural Representations for Molecules and Materials. Chemical Reviews, 121(16), 9759–9815. link
[Drautz2019] Drautz, R. (2019). Atomic cluster expansion for accurate and transferable interatomic potentials. Physical Review B, 99(1), 014104. link
Note
Any useful pre-requisite modules/knowledge? : N/A
Optimising shock injection in laser-plasma acceleration
Project code: MLBD_32
Supervisor(s): Stuart Mangles (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
Laser wakefield accelerators driven by 200 TW lasers can accelerate a beam of electrons up to 1 GeV in just 1 cm of plasma. The laser pulse drives a plasma wave in its wake which, if the wave amplitude is large enough, can sweep up and accelerate electrons from the plasma. This "injection process" can be triggered and controlled by shaping the plasma profile - using shock structures.
The injection process is sensitive to the shape and location of the shock structure. The aim of this project will be to optimise the injection process using machine learning applied to particle-in-cell simulations of the accelerator.
Note
Any useful pre-requisite modules/knowledge? : No prerequisite.
Machine Learning and the Deepest Herschel Field
Project code: MLBD_33
Supervisor(s): David Clements (Physics/Universe)
Note
This project can host up to 1 pair(s) of students.
The Herschel Space Observatory observed the sky at far-IR wavelengths (250 to 500 microns), uncovering dusty galaxies that are otherwise hidden from our view. In doing so it demonstrated that dusty star-forming galaxies play an important role in the history of star formation and in the evolution of galaxies. However, Herschel was limited in the galaxies it could study as a result of finite sensitivity and resolution. As part of the SPIRE instrument team, we have recently been working with calibration observations to produce the deepest (i.e. most sensitive) image available at these wavelengths. Since no far-IR space missions are planned for launch before the 2030s, this image will remain the most sensitive at these wavelengths for at least the next decade. While still limited by Herschel's finite resolution, analysis of this image using a technique called P(D) indicates that an unexpected population of far-IR emitting galaxies exists well below Herschel's conventional detection limit. We clearly need to know more about this population, but this requires cross comparison between Herschel data and data obtained at higher resolutions at shorter wavelengths. This project will apply a big data approach to understanding this population, using training sets from other Herschel observations and multiwavelength analysis of the Herschel deep field to better understand the nature of these faint far-IR emitting galaxies.
Note
Any useful pre-requisite modules/knowledge? : The astrophysics and cosmology courses would be helpful, but may not be necessary if a student already has a background in astrophysics, and especially extragalactic astrophysics.
Applying machine learning to analysis of data of attosecond pulses from the LCLS X-ray laser
Project code: MLBD_34
Supervisor(s): Jon Marangos (Physics/Light)
Note
This project can host up to 1 pair(s) of students.
X-ray lasers provide the tools to measure the ultrafast structural dynamics of matter and are being applied to a wide variety of physical, chemical and biological systems. Recent high-impact research from the LCLS free electron laser at SLAC, Stanford, California includes time-resolving the mechanisms of catalysis, measuring phase changes in correlated materials of importance to superconductivity research, and determining the dynamics of photo-active protein motion. In May 2022 our team from the Blackett Laboratory conducted two experiments at LCLS using the newly developed XLEAP (attosecond mode) to (a) time-resolve the dynamics of an electron hole formed in a small molecule using a pair of X-ray pulses, and (b) investigate non-linear interactions of attosecond X-ray pulses with liquid water. A rich dataset was obtained from both of these experiments. The task is now to fully analyse this data. As part of that we wish to develop machine learning approaches to automatically categorise the pulse duration and spectrum of all of the X-ray pulses in the data stream. In earlier work on longer (femtosecond) laser pulses our team successfully improved the categorisation and diagnosis of the X-rays. This is important because in future X-ray FEL experiments (in Germany at the European XFEL, in Shanghai at SHINE and at Stanford with LCLS II) the repetition rate will be too high to rely on full diagnosis of each shot, so machine learning will be vital. A. Sanchez-Gonzalez et al., "Accurate prediction of X-ray pulse properties from a free-electron laser using machine learning", Nature Communications 8, 15461 (2017)
Note
Any useful pre-requisite modules/knowledge? : No
Data analysis for global 21cm experiments in the context of REACH
Project code: MLBD_35
Supervisors(s): Jonathan Pritchard (Physics/Astro)
Note
This project can host up to 1 pair(s) of students.
One frontier for astrophysics is the Cosmic Dawn - the period when the first generation of stars and galaxies formed and illuminated the Universe with starlight. Radio observations of the 21cm line of hydrogen have the potential to observe this period for the first time, and there is considerable interest in building new telescopes to detect it. Global 21cm experiments hope to detect this signature of the first stars through low spatial resolution, high spectral resolution measurements, but there are significant challenges in foreground removal and the identification of systematics. The first claimed detection, with the EDGES telescope, was widely treated with skepticism, but several groups are developing new techniques to try to obtain a robust detection. This project will explore the problem of developing the analysis pipeline for a global 21cm experiment, in the context of the REACH experiment now being deployed in South Africa. The aim will be to employ techniques from the CMB field for map-making from time series measurements, in combination with Bayesian statistical tools, to explore what would be required for a robust detection. The project would mostly work with simulations and mock data, but with some possibility of early science data from REACH to dig into towards the end of the project.
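To illustrate the core statistical challenge, here is a toy version of the global-signal extraction problem: a small absorption trough must be recovered under a foreground several orders of magnitude brighter. This sketch uses invented foreground and signal models with mock noise, not the REACH instrument model or pipeline:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
nu = np.linspace(50, 150, 200)                    # frequency in MHz

def sky(nu, a0, a1, amp, nu0, width):
    # Toy models: power-law foreground plus a Gaussian absorption trough
    foreground = a0 * (nu / 100.0) ** a1
    trough = -amp * np.exp(-0.5 * ((nu - nu0) / width) ** 2)
    return foreground + trough

# Mock data: ~kilo-Kelvin foreground hiding a ~0.2 K trough, 20 mK noise
data = sky(nu, 1000.0, -2.5, 0.2, 78.0, 5.0) + rng.normal(0, 0.02, nu.size)

popt, pcov = curve_fit(sky, nu, data, p0=[900, -2.4, 0.1, 80, 6])
print("recovered trough: %.3f +/- %.3f K at %.1f MHz"
      % (popt[2], np.sqrt(pcov[2, 2]), popt[3]))
```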
Note
Any useful pre-requisite modules/knowledge? : Cosmology, very useful background
Understanding the properties of the Higgs boson through its decays to tau leptons
Project code: MLBD_36
Supervisors(s): David Colling (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
The experiments on the CERN LHC are the greatest producers of scientific data in the world. The purpose of collecting these data is to understand how the universe works at its most basic level. The Higgs boson is fundamental to this understanding. It was discovered 10 years ago, but we are still trying to understand its basic properties. These are predicted by the Standard Model (SM) of particle physics; however, we already know that the SM is not complete. One of the most important properties of the Higgs boson is whether or not its interactions preserve the symmetries of charge conjugation and parity (CP). In the SM the Higgs conserves CP, but there are many theories that suggest it might not, and we have already seen CP violation in other areas of particle physics. Any observation of CP violation in the Higgs sector would be a very major discovery. One of the most promising areas (if not the most promising) to look for CP violation is in the Higgs interaction with tau leptons. CMS, one of the detectors on the LHC, was able to make the world's first measurement of the CP nature of this interaction [1]. This measurement already makes extensive use of ML techniques, but it is just a first measurement and it is very important that we improve on it. This will involve utilising all the information in the data and greater application of ML throughout the analysis. The purpose of this project is to help develop this improved analysis.
[1] "Analysis of the CP structure of the Yukawa coupling between the Higgs boson and tau leptons in proton-proton collisions at √s = 13 TeV", CMS Collaboration, JHEP 06 (2022) 012
URL: link. This is a long paper, but don't be put off by that; it is only because we put a lot of new material into a single paper.
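For intuition only: the CP structure of the Higgs-tau coupling shows up as a modulation of the acoplanarity angle between the tau decay planes, with the phase set by the CP-mixing angle. The toy below samples such a modulation and recovers the angle with a likelihood scan; the amplitude and angle are invented, and the real CMS analysis is far more involved:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy model: acoplanarity angle phi distributed as p(phi) ~ 1 - A*cos(phi - 2*alpha),
# with alpha the CP-mixing angle (alpha = 0 for a pure CP-even Higgs); values invented
A, alpha_true = 0.3, np.deg2rad(10.0)

# Rejection-sample events from the modulated distribution
phi = rng.uniform(0, 2 * np.pi, 200_000)
keep = rng.uniform(0, 1 + A, phi.size) < 1 - A * np.cos(phi - 2 * alpha_true)
phi = phi[keep]

# Scan the negative log-likelihood over alpha (A held fixed for simplicity;
# the normalisation is alpha-independent, so it can be dropped)
alphas = np.deg2rad(np.linspace(-45, 45, 181))
nll = [-np.sum(np.log(1 - A * np.cos(phi - 2 * a))) for a in alphas]
print("fitted alpha: %.1f deg" % np.degrees(alphas[int(np.argmin(nll))]))
```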
Note
Any useful pre-requisite modules/knowledge? : Advanced Particle Physics would be helpful for context, but is in no way essential as you will be concentrating on a specific analysis.
Scaling super-resolved microscopy data analysis to enable quantitative readouts for assays of biomolecular interactions
Project code: MLBD_37
Supervisors(s): Andrew Rose (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
Single molecule localization microscopy (SMLM) won a Nobel Prize in 2014 and has helped take fluorescence microscopy beyond the diffraction limit. Essentially, it images the fluorescent molecules labelling a sample one at a time, which enables their positions to be determined with a precision <30 nm. However, it is a labour- and computationally-intensive technique with low throughput that has predominantly produced pictures rather than quantitative readouts. We are working to develop automated SMLM instruments for applications including drug discovery, and are exploring cluster analysis of the single (fluorescent) molecule localizations to obtain quantitative readouts of biomolecular interactions in cells.
We are exploring a Bayesian approach to cluster-finding that potentially offers superior results compared to other algorithms, at the cost of being more computationally intensive. To accelerate this process, a high-performance C++ implementation has been written, which now requires systematic validation, profiling and further optimization, as well as the development of a user interface. We also want to explore how this new software scales from a single machine to an HPC cluster, which will be necessary to apply it to the vast SMLM data sets generated from high-throughput SMLM assays in 96-well plates.
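As a point of reference for validating and profiling the C++ implementation, a density-based baseline (DBSCAN here, standing in for the project's Bayesian cluster-finder) on synthetic localizations might look like the sketch below; all coordinates and parameters are invented:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)

# Synthetic localizations (nm): three molecular clusters plus diffuse background
centres = ([500, 500], [1500, 800], [900, 1600])
clusters = [rng.normal(c, 25.0, size=(150, 2)) for c in centres]
background = rng.uniform(0, 2000, size=(500, 2))
locs = np.vstack(clusters + [background])

# eps chosen on the scale of the <30 nm localization precision quoted above
labels = DBSCAN(eps=50.0, min_samples=10).fit_predict(locs)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} clusters found; "
      f"{int(np.sum(labels == -1))} localizations flagged as background")
```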
Note
Any useful pre-requisite modules/knowledge? : N/A
Interpretable Machine Learning for Space Weather Forecasting
Project code: MLBD_38
Supervisors(s): Mike Heyns (Physics/Space, Plasma and Climate)
Note
This project can host up to 1 pair(s) of students.
This project aims to develop novel interpretable machine learning techniques to forecast space weather drivers at mid-latitudes, and to do so in an operational context. Space weather has become increasingly relevant with our dependence on modern infrastructure, in particular affecting the power grids and satellite operations we rely on daily. Banking on a legacy of high-latitude auroral region modelling is no longer enough, and models need to be extended to mid- and low-latitude regions. With a physical understanding of the driving near-Earth current systems involved and in-situ measurements of the solar wind, machine learning approaches provide a strong platform to work from. Using solar wind data from the ACE and DSCOVR satellites, we will focus on forecasting the SYM-H index, a ground-based geomagnetic index that is directly driven by the magnetospheric ring current. Emphasis will be placed on creating an interpretable framework that allows uncertainty in the model to be quantified and the output ultimately acted upon.
In performing this work you will have the opportunity to work closely with collaborators in South Africa and the Gorgon simulation group at Imperial College London.
Useful references:
1. Camporeale contextualises the current state of machine learning in space weather and highlights some of the challenges faced in producing actionable operational forecasts: link
2. Siciliano et al. 2020 present a first stab at predicting the SYM-H index with two different artificial neural network models, which may form a basis for thinking about the problem: link
3. Long et al. 2022 present a more recent attempt at predicting the SYM-H index, but include the interpretability that would be at the heart of an operational model: link
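As a sketch of the basic forecasting setup referenced above: lagged solar-wind quantities become features, SYM-H some steps ahead becomes the target, and permutation importance gives a first, crude layer of interpretability. Everything below is synthetic; the real inputs would be ACE/DSCOVR plasma and magnetic field data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(5)

# Synthetic stand-ins for solar-wind speed, density and IMF Bz time series
n = 5000
v = rng.normal(400, 50, n)
rho = rng.normal(5, 1, n)
bz = rng.normal(0, 3, n)
symh = -0.05 * v * np.clip(-bz, 0, None) + rng.normal(0, 5, n)  # toy coupling

# Lagged features: values at t-1 ... t-3 predict SYM-H at time t
lags = (1, 2, 3)
X = np.column_stack([s[3 - k:n - k] for s in (v, rho, bz) for k in lags])
y = symh[3:]

model = GradientBoostingRegressor().fit(X[:-1000], y[:-1000])
imp = permutation_importance(model, X[-1000:], y[-1000:], n_repeats=5, random_state=0)
print("permutation importances (v, rho, bz x lags 1-3):",
      np.round(imp.importances_mean, 3))
```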
Note
Any useful pre-requisite modules/knowledge? : PHYS70019: Space Physics
Optimal self-correcting partial covariance mass spectrometry for peptide sequencing
Project code: MLBD_39
Supervisors(s): Vitali Averbukh (Physics/Light)
Note
This project can host up to 1 pair(s) of students.
The goal of the proposed project is to optimise the recently developed 2D partial covariance mass spectrometry (see T. Driver et al., PRX 10, 041004 (2020); Physics Today, DOI:10.1063/PT.6.1.20201023a). This will be achieved by formulation of the optimal self-correcting partial covariance mapping using machine learning (ML) within a regularised linear model.
The principle of detecting collision fragments to study the structure of colliding projectiles and the mechanisms of their decomposition is applied generally across 12 orders of magnitude of collision energy, from organic and biochemical mass spectrometry (~10 eV) to particle physics (up to ~10 TeV). Especially valuable information is provided by the coincidence detection of two or more fragments, proving their origin to be in the same decomposition event. However, coincidence measurements in molecular physics, e.g. using the gold-standard COLTRIMS or "reaction microscope" setup, are only possible in the idealised conditions of a single decomposition detection, and a vast number (greater than 10^5) of such individual measurements is required to reach reliable conclusions. For atoms and small molecules, these stringent requirements can be circumvented using the statistical technique of covariance mapping which, instead of requiring true coincidence detection, focuses on the statistics of signal fluctuations in a regular, non-coincidence measurement. Application of covariance mapping to larger species had not been successful until recently, because of the large number of spurious signals.
In 2020, we introduced the method of self-correcting partial covariance, a conceptually new type of covariance mapping spectroscopy based on a single readily available parameter extracted from the measured spectrum itself (the total ion count, TIC; PRX 10, 041004 (2020)). We have constructed a new type of analytical mass spectrometric measurement based on the self-correcting TIC partial covariance: two-dimensional partial covariance mass spectrometry (2D-PC-MS). We have demonstrated that it successfully resolves correlations between fragments of macromolecules in the mass range up to and above 10^4 Da, enabling high-fidelity characterisation of a biopolymer sequence, e.g. of peptides, proteins and oligonucleotides.
The TIC-based partial covariance is an approximation that is in no way guaranteed to lead to the fastest convergence of the true correlations with the number of measured spectra, and/or to the maximal number of revealed correlations. Therefore, there is a strong motivation to seek an optimised single parameter, derived from the spectrum itself, that is different from the TIC. We assume that the optimal parameter, W(X), is a weighted sum of the fragment intensities X_i:

W(X) = \sum_i w_i X_i,

where the weights w_i are the optimisation parameters and W(X) = TIC if w_i = 1 for all i, with i spanning the physical range of possible fragment mass-to-charge ratios. Besides being a natural mathematical generalisation of the TIC, the above equation is physically motivated: the fragmentation-to-detection efficiency of any experimental apparatus depends on the mass-to-charge ratio of the fragment.
Optimisation of the w_i weights on the basis of the large amount of available measurements is a typical problem to be solved by ML algorithms; however, the criterion for the choice of the optimised single parameter W (the lowest number of scans needed to reveal N correlations) is not differentiable with respect to the weights w_i. We will address this issue by designing differentiable objective functions to facilitate optimisation. Using the labelled data in the learning database (i.e. true and false correlations), we will design the objective function based on the top-k mean average precision (MAP@K) and cost-sensitive conditional risk.
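A minimal sketch of the kind of optimisation loop this implies, using automatic differentiation and a simple differentiable margin between labelled true and false correlations as a stand-in for the MAP@K and cost-sensitive objectives (the spectra, labels and loss below are all toy placeholders):

```python
import torch

torch.manual_seed(0)
n_shots, n_frag = 4000, 12

# Toy spectra: a fluctuating shot "intensity" induces spurious correlations
# everywhere, while channels 0 and 1 share a genuine parent (anti-correlated)
intensity = torch.rand(n_shots, 1)
X = intensity * torch.rand(n_shots, n_frag)
split = torch.rand(n_shots)
X[:, 0] = intensity[:, 0] * split
X[:, 1] = intensity[:, 0] * (1 - split)

true_pairs, false_pairs = [(0, 1)], [(2, 3), (4, 5)]    # toy labels

w = torch.ones(n_frag, requires_grad=True)              # w_i = 1 recovers the TIC
opt = torch.optim.Adam([w], lr=0.05)

def partial_cov(X, w):
    # pCov(X_i, X_j; W) = Cov(X_i, X_j) - Cov(X_i, W) Cov(X_j, W) / Var(W)
    W = X @ w
    Xc, Wc = X - X.mean(0), W - W.mean()
    cov = Xc.T @ Xc / (n_shots - 1)
    cxw = Xc.T @ Wc / (n_shots - 1)
    return cov - torch.outer(cxw, cxw) / Wc.var()

for step in range(300):
    pc = partial_cov(X, w)
    s_true = torch.stack([pc[i, j].abs() for i, j in true_pairs])
    s_false = torch.stack([pc[i, j].abs() for i, j in false_pairs])
    loss = torch.relu(0.01 + s_false.mean() - s_true.mean())  # differentiable margin
    opt.zero_grad(); loss.backward(); opt.step()

print("optimised weights:", w.detach())
```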
Note
Any useful pre-requisite modules/knowledge? : No.
Photogrammetry/computer vision for upright radiotherapy patient positioning
Project code: MLBD_40
Supervisors(s): Kenneth Long (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
Radiotherapy patients need to be positioned reproducibly for a number of treatment sessions (typically 5-30 sessions over a period of several weeks). Each treatment session may last up to 20 minutes and the patient should stay as still as possible during this period.
This project will explore the use of optical imaging to quantify patient skin shifts, both between treatment sessions (so-called "inter-fraction motion") and during a treatment session ("intra-fraction motion"). The project will also investigate how the optical imaging data should best be analysed and presented to clinicians. Presenting the data directly to patients could empower them to manoeuvre their own bodies into the correct position prior to treatment.
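As one concrete example of such quantification: given matched skin-surface points from two optical scans, the inter-fraction shift can be summarised as the best-fit rigid transform between them. The sketch below applies the standard Kabsch algorithm to synthetic points; real correspondences would come from the photogrammetry step:

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic "skin surface" points (mm) and a slightly rotated, shifted re-scan
P = rng.uniform(-100, 100, size=(200, 3))
theta = np.deg2rad(2.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
Q = P @ R_true.T + np.array([3.0, -1.5, 0.5]) + rng.normal(0, 0.2, P.shape)

# Kabsch algorithm: optimal rotation between centred point sets via SVD
Pc, Qc = P - P.mean(0), Q - Q.mean(0)
U, _, Vt = np.linalg.svd(Pc.T @ Qc)
d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
t = Q.mean(0) - R @ P.mean(0)

print("estimated shift (mm):", np.round(t, 2))
print("estimated rotation about vertical (deg):",
      round(float(np.degrees(np.arctan2(R[1, 0], R[0, 0]))), 2))
```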
Note
Any useful pre-requisite modules/knowledge? : Physics in Medical Imaging and Radiotherapy would be beneficial.
Soft robotics/pressure sensing to help develop upright patient immobilisation strategies
Project code: MLBD_41
Supervisors(s): Kenneth Long (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
Radiotherapy patients need to be positioned reproducibly for a number of treatment sessions (typically 5-30 sessions over a period of several weeks). Each treatment session may last up to 20 minutes and the patient should stay as still as possible during this period.
Leo Cancer Care is designing various arm and body supports to keep patients comfortable and still on an upright treatment chair. This project will explore whether pressure sensing might play an important role in the design of new physical support structures: for example, can it provide valuable information on whether volunteers stay still, or on whether there are pressure points which we should try to relieve? The project will involve consideration of "soft robotics": how computer-controlled structures with soft surfaces might be used to support upright patients (e.g. programmable cushions which would return to the same shape for each treatment session).
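One simple, standard readout from a pressure-sensor array is the centre of pressure, whose drift over a session is a natural stillness metric, while a running maximum flags candidate pressure points. A toy sketch on an invented sensor grid (geometry, sampling rate and drift are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(11)

# Invented 16x16 pressure array sampled at 10 Hz over a 20-minute session
nx = ny = 16
n_frames = 20 * 60 * 10
xs, ys = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")

cop = np.empty((n_frames, 2))
peak = np.zeros((nx, ny))
for t in range(n_frames):
    cx = 7.5 + 0.0001 * t                      # slow simulated postural drift
    frame = np.exp(-((xs - cx) ** 2 + (ys - 7.5) ** 2) / 8.0)
    frame += 0.01 * rng.random((nx, ny))       # sensor noise
    total = frame.sum()
    cop[t] = [(frame * xs).sum() / total, (frame * ys).sum() / total]
    peak = np.maximum(peak, frame)             # running max flags hot-spots

drift = np.linalg.norm(cop[-1] - cop[0])
print(f"centre-of-pressure drift over session: {drift:.2f} sensor pitches")
```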
Note
Any useful pre-requisite modules/knowledge? : Physics in Medical Imaging and Radiotherapy would be beneficial
Photon beamline designs: expanding horizons
Project code: MLBD_42
Supervisors(s): Kenneth Long (Physics/Particles)
Note
This project can host up to 1 pair(s) of students.
Leo Cancer Care has developed a fixed radiotherapy beam (rather than a rotating gantry) to reduce the footprint and investment cost of a clinical system. The simplification of the beam delivery provides the scope for a re-evaluation of the elements of the radiotherapy linear accelerator and treatment head. The re-evaluation will be performed with a view to developing a cost-effective, simplified system with low maintenance requirements, potentially suitable for deployment in low- and middle-income countries.
In this project you would consider which elements of the photon beamline might be improved. The project will include consideration of the reduction of complexity, and therefore cost, and the evaluation of strategies by which maintenance requirements can be reduced. You will implement your designs in state-of-the-art software to determine the performance of your design. You will also consider emerging radiotherapy techniques such as high-dose-rate (or FLASH) treatments. As a possible extension, you might consider the design of a beamline for electron FLASH.
Note
Any useful pre-requisite modules/knowledge? : Physics in Medical Imaging and Radiotherapy would be beneficial