How Can Machine Learning Move Your Research Forward?

Wouter Deconinck, William & Mary

The excitement about machine learning, deep learning, and artificial intelligence is everywhere. What is behind these terms, and can you take advantage of their development to advance your career or research? Since we can only scratch the surface in this short and necessarily incomplete discussion, consider it a starting point for further exploration.

Let us start by defining artificial intelligence, the ultimate goal of developing systems that mimic “human intelligence,” i.e., learning and problem solving. Formally, an artificially intelligent system independently takes actions based on measured inputs to maximize the probability of achieving its goals. To do so, it relies on several subfields to derive meaning from inputs (computer vision, natural language processing) or to learn implicitly what to base decisions on (deep learning).

Machine learning is the subfield of artificial intelligence that uses statistical techniques to allow computer systems to progressively improve performance on specified tasks, i.e., to learn, without being explicitly programmed to perform those tasks. Perhaps the most common machine learning algorithm is the artificial neural network, which forms the basis for the field of deep learning.

The “hello world” example of machine learning is handwritten digit recognition based on the standard MNIST data set. A traditional rules-based algorithm to classify the 28x28 pixel images into one of 10 digits would need to take into account whether the ‘7’ has a horizontal cross bar through the middle, and whether the ‘1’ is a simple vertical bar or has an ear and bottom serif. For every exception the algorithm becomes more involved. Machine learning takes a different approach and avoids the need to enumerate all possible cases.
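
As a hedged illustration of this approach, the sketch below trains a small neural network on the MNIST digits using the Keras interface bundled with TensorFlow (one of the Python libraries mentioned later); the layer sizes, activation choices, and number of training passes are arbitrary example settings rather than recommendations.

```python
# Minimal MNIST classifier sketch (Keras API in TensorFlow).
# Layer sizes, activations, and epochs are illustrative choices only.
import tensorflow as tf

# Load the 28x28 pixel images and their digit labels (0-9).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),     # 784 input features
    tf.keras.layers.Dense(128, activation="sigmoid"),  # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),   # one output per digit
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Learn from labeled examples, holding out 10% of them for validation.
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
model.evaluate(x_test, y_test)
```

Note that no rule about cross bars or serifs appears anywhere in the code; the network infers the distinguishing features from the labeled examples.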

We should not think of machine learning as a magic black box. It is a field founded on mathematical and statistical techniques that are familiar to many physicists (from linear algebra and multidimensional calculus to Bayesian probability). It is not a solution to every problem, but it may help to automate certain classes of common problems in physics, leaving you with time to spend on problems not solved by artificial intelligence (yet).

Andrew Ng, a thought leader in the artificial intelligence community and the author of a popular online course on machine learning, summarized the set of problems where artificial intelligence can be useful as follows: If you can perform a mental task in less than a second, then an artificial intelligence algorithm can be used to automate the task now or in the near future. This turns out to be a useful test when assessing where you may be able to use machine learning algorithms. Start by focusing on the problems where you can quickly tell what the answer is but where it is hard to write down a traditional rules-based algorithm.

At the risk of oversimplifying, there are two broad classes of machine learning: supervised and unsupervised learning. In supervised learning, the data scientist has access to a representative training data set, perhaps painstakingly verified by hand. Using this training set of input values (features) and the corresponding output values (targets), the training process determines the parameters of the algorithm such that a loss function is minimized.

Two classes of problems lend themselves well to supervised machine learning. We already discussed classification of handwritten digits, which is analogous to the separation of signal from background based on detector signals in physics. In regression problems, we wish to determine the relationship between input values and the expected mean of the output value, even when that output value is affected by measurement noise, as is commonly the case in experimental physics.
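
As a minimal sketch of such a regression problem (with a made-up curve, noise level, and network size), scikit-learn can fit a small neural network to noisy samples and then predict the expected mean of the output for new inputs:

```python
# Regression sketch: recover a smooth relationship from noisy measurements.
# The underlying curve, noise level, and network size are arbitrary choices.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(seed=1)
x = rng.uniform(-1.0, 1.0, size=(500, 1))              # input feature
y = np.sin(3.0 * x[:, 0]) + rng.normal(0.0, 0.1, 500)  # target with noise

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)
model.fit(x, y)

# The trained model predicts the expected mean of y for new inputs.
print(model.predict([[0.0], [0.5]]))
```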

The loss function to be optimized is often the sum of quadratic differences between predicted and actual targets, similar to a least squares method. For classification problems, where the algorithm determines probabilities for each possible class, we use loss functions based on log likelihood or information entropy.
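
For concreteness, both kinds of loss function can be written in a few lines of NumPy; the targets and predicted probabilities below are made up for the example.

```python
# Quadratic (mean squared error) loss for regression, and a cross-entropy
# loss for predicted class probabilities. Example numbers are made up.
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(p_true, p_pred):
    # p_true: one-hot encoded true class; p_pred: predicted probabilities.
    # A small epsilon avoids taking log(0).
    eps = 1e-12
    return -np.sum(p_true * np.log(p_pred + eps))

print(mean_squared_error(np.array([1.0, 2.0]), np.array([1.1, 1.8])))
print(cross_entropy(np.array([0, 0, 1]), np.array([0.1, 0.2, 0.7])))
```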

The loss minimization during the training process often uses a variation of the gradient descent algorithm, repeatedly applied to subsets (mini-batches) of the potentially large training data set and with a specified learning rate. As with the fitting of a polynomial to a set of data points, overfitting can occur when the number of algorithm parameters is large, a common occurrence in large artificial neural networks. To detect and minimize overfitting, we set aside two parts of the data set: one for validation and one for testing. After training an algorithm on the training data set, we assess its performance on the validation data set. When overfitting sets in, the loss for the training set continues to decrease while the loss for the validation set starts to rise, indicating that we are now describing the training set at the cost of generality. After we settle on the network architecture and learning rate through repeated training and validation cycles, we can assess the quality of the trained algorithm with one final evaluation on the as-yet-unseen test data set.
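
As a sketch of these ideas on a deliberately simple model (a straight line rather than a neural network, with made-up data, learning rate, and batch size), the loop below performs mini-batch gradient descent while monitoring a held-out validation set:

```python
# Mini-batch gradient descent on a simple linear model, monitoring a
# held-out validation set. All settings here are illustrative.
import numpy as np

rng = np.random.default_rng(seed=2)
x = rng.uniform(-1, 1, size=1000)
y = 2.0 * x + rng.normal(0.0, 0.2, size=1000)   # noisy "measurements"

# Split: 70% training, 15% validation, 15% final test.
x_train, x_val, x_test = x[:700], x[700:850], x[850:]
y_train, y_val, y_test = y[:700], y[700:850], y[850:]

w, b = 0.0, 0.0          # model parameters to be learned
learning_rate = 0.1
batch_size = 50

def loss(w, b, x, y):
    return np.mean((w * x + b - y) ** 2)

for epoch in range(20):
    for i in range(0, len(x_train), batch_size):
        xb, yb = x_train[i:i + batch_size], y_train[i:i + batch_size]
        residual = w * xb + b - yb
        # Gradient of the quadratic loss with respect to w and b.
        w -= learning_rate * np.mean(2 * residual * xb)
        b -= learning_rate * np.mean(2 * residual)
    # Training and validation losses, printed once per pass over the data.
    print(epoch, loss(w, b, x_train, y_train), loss(w, b, x_val, y_val))

print("final test loss:", loss(w, b, x_test, y_test))
```

Watching the two printed loss values side by side is the training and validation cycle described above: if the validation loss starts rising while the training loss keeps falling, it is time to stop training or to simplify the model.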

The strength of machine learning often derives from non-linear elements, for example, the activation function f in each node of a neural network, which projects the node’s input vector x onto its output y = f(w·x + b). Training determines the weight vector w and the bias b for every node in the network. A common choice for f is the S-shaped sigmoid curve, which ensures that the projection is continuous, differentiable, and bounded to the unit interval.
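
A single such node takes only a few lines of NumPy; the weights, bias, and inputs below are placeholder values rather than trained ones.

```python
# One neural-network node: project an input vector onto a single output
# through a sigmoid activation. Weights and inputs are placeholder values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # S-shaped, bounded to (0, 1)

x = np.array([0.5, -1.2, 3.0])   # input vector
w = np.array([0.8, 0.1, -0.4])   # weights (determined by training)
b = 0.2                          # bias (determined by training)

y = sigmoid(np.dot(w, x) + b)
print(y)
```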

In unsupervised learning, we do not rely on a labeled training data set. Clustering algorithms can find structure in high-dimensional data sets based solely on the data set under study. Anomaly detection algorithms may use autoencoder networks to reduce the dimensionality of measurement data: the network learns to ignore noise, yet still recognizes when new data deviates from what it has seen before. Related ideas are exploited in generative adversarial networks to produce data sets that are superficially indistinguishable from an original data set, a fast and inexpensive way to produce additional simulated data.
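
As one example of unsupervised learning, the scikit-learn sketch below applies k-means clustering to unlabeled synthetic data; the data and the choice of three clusters are arbitrary illustrations.

```python
# Unsupervised clustering sketch: find structure in unlabeled data.
# The synthetic data and the choice of three clusters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=3)
# Three blobs of points in two dimensions, with no labels attached.
data = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(3, 3), scale=0.3, size=(100, 2)),
    rng.normal(loc=(0, 3), scale=0.3, size=(100, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10).fit(data)
print(kmeans.cluster_centers_)   # the algorithm recovers the blob centers
```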

If you are interested in exploring machine learning, you will find that nearly all common programming languages and physics analysis tools have support for some aspects of machine learning (including MATLAB and Mathematica). Nevertheless, the most widely used and feature-rich libraries (TensorFlow and PyTorch) have their best-supported interfaces in Python. Interactive notebooks on Google’s cloud-based Colaboratory platform come with several tutorials to get started. The interactivity and ease of visualization in these notebooks facilitate the frequent training and validation cycles in machine learning.

As with any new approach, starting with modest ambitions on a simplified model of your problem will help you develop an intuition for the effects of your choices of algorithm and parameters. You are entering an exciting world of applied mathematics and statistics!

After obtaining an M.S. in Engineering Physics from Ghent University (Belgium) and a Ph.D. in Physics from the University of Michigan, and after a postdoc at MIT, Wouter Deconinck has been on the physics faculty at William & Mary since 2010. In fall 2019, he will be joining the faculty at the University of Manitoba. His research interests are precision searches for physics beyond the Standard Model, in particular through parity-violating electron scattering. This article is based on a talk at the 2018 SESAPS Meeting in Knoxville, TN.