Behrooz Tahmasebi, an MIT Ph.D. student in the Department of Electrical Engineering and Computer Science (EECS) and an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL), was taking a mathematics course on differential equations in late 2021 when a glimmer of inspiration struck. It was in that course that he first encountered Weyl's law, formulated 110 years earlier by the German mathematician Hermann Weyl.
Tahmasebi realized that it might have some connection to the computer science problem he was then working on, even though the connection appeared, on the surface, to be tenuous at best. Weyl's law, he says, provides a formula that measures the complexity of the spectral information, or data, contained in the fundamental frequencies of a drum head or guitar string.
Tahmasebi was at the same time thinking about measuring the complexity of a neural network’s input data, wondering if that complexity could be reduced by taking into account some of the symmetries inherent in the data set. Such a reduction, in turn, could facilitate or even accelerate machine learning processes.
Weyl's law, conceived about a century before the rise of machine learning, was traditionally applied to very different physical situations, such as the vibrations of a string or the spectrum of electromagnetic (black body) radiation emitted by a heated object. Nonetheless, Tahmasebi believed that a customized version of this law could help with the machine learning problem he was pursuing. And if the approach succeeded, the payoff could be considerable.
He spoke with his advisor, Stefanie Jegelka, an associate professor in EECS affiliated with CSAIL and the MIT Institute for Data, Systems, and Society, who thought the idea was definitely worth exploring. Weyl's law, Tahmasebi notes, was about assessing the complexity of data, and so is this project. But Weyl's law, in its original form, says nothing about symmetry.
He and Jegelka managed to modify Weyl’s law so that symmetry could be taken into account when assessing the complexity of a data set. “To my knowledge,” says Tahmasebi, “this is the first time Weyl’s law has been used to determine how machine learning can be enhanced by symmetry.”
The paper he and Jegelka wrote earned the “Spotlight” designation when it was presented at the December 2023 Neural Information Processing Systems Conference, widely considered the world’s largest conference on machine learning. It is currently available on the arXiv preprint server.
This work, comments Soledad Villar, an applied mathematician at Johns Hopkins University, "shows that models that satisfy the symmetries of the problem are not only correct but can also produce predictions with smaller errors, using a small number of training points. [This] is particularly important in scientific fields, such as computational chemistry, where training data can be scarce."
In their paper, Tahmasebi and Jegelka explored how symmetries, or so-called "invariances," can benefit machine learning. Suppose, for example, that the goal of a particular computer run is to select every image containing the number 3. The task becomes much easier, and much faster, if the algorithm can identify the 3 regardless of where it is placed in the box, whether exactly in the center or off to the side, and whether it is right side up, upside down, or rotated at an arbitrary angle.
An algorithm with the latter capability can take advantage of translational and rotational symmetries: a 3, or any other object, is not changed in itself by shifting its position or rotating it around an arbitrary axis. It is said to be invariant to these changes. The same logic applies to algorithms charged with identifying dogs or cats. A dog is a dog, one might say, no matter how it appears in an image.
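The idea of invariance can be made concrete with a toy sketch. The array, the "digit," and the feature below are all hypothetical illustrations, not the paper's method: a feature that depends only on the multiset of pixel values is unchanged by any quarter-turn rotation of the image.

```python
import numpy as np

# Toy 3x3 "image" of a digit; purely illustrative.
digit = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [0, 1, 1]])

def feature(img):
    # The sorted multiset of pixel values is unchanged by any
    # rotation, which merely rearranges the pixels.
    return np.sort(img.ravel())

invariant = all(
    np.array_equal(feature(digit), feature(np.rot90(digit, k)))
    for k in range(4)
)
print(invariant)  # True
```

A model built from such invariant features never has to learn, from data, that a rotated 3 is still a 3.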
The goal of the whole exercise, the authors explain, is to exploit the intrinsic symmetries of a dataset to reduce the complexity of machine learning tasks. This in turn can lead to a reduction in the amount of data needed for learning. Concretely, the new work answers the question: how much less data is needed to train a machine learning model if the data contains symmetries?
There are two ways to achieve a gain, or advantage, by capitalizing on the symmetries present. The first concerns the size of the sample to be examined. Imagine, for example, that you are tasked with analyzing an image that has mirror symmetry, the right side being an exact replica, or mirror image, of the left. In that case, you don't have to examine every pixel; you can get all the information you need from half the image, a factor-of-two improvement. If, instead, the image can be divided into 10 identical parts, you obtain an improvement factor of 10. This type of boost is linear.
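The mirror-symmetry case can be sketched in a few lines. This is a hypothetical illustration (the image and sizes are made up): a mirror-symmetric image is fully determined by its left half, so only half the pixels carry independent information.

```python
import numpy as np

rng = np.random.default_rng(0)
half = rng.random((8, 4))                 # left half: 8x4 pixels
image = np.hstack([half, half[:, ::-1]])  # mirror-symmetric 8x8 image

# The right half can be reconstructed by reflecting the left half,
# so only the left half carries independent information.
left = image[:, :4]
reconstructed = np.hstack([left, left[:, ::-1]])

assert np.array_equal(image, reconstructed)
saving = image.size // left.size
print(saving)  # 2: the linear factor-of-two improvement
```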
To take another example, imagine that you are looking through a dataset and trying to find sequences of blocks that have seven different colors: black, blue, green, purple, red, white, and yellow. Your job becomes much easier if you don’t worry about the order in which the blocks are arranged. If order mattered, there would be 5,040 different combinations to search for. But if all you’re interested in are sequences of blocks in which all seven colors appear, then you’ve reduced the number of elements – or sequences – you’re looking for from 5,040 to just one.
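The arithmetic behind the block example can be checked directly: 7! = 5,040 ordered arrangements, but a single equivalence class once order is ignored. A minimal sketch:

```python
from itertools import permutations

colors = ["black", "blue", "green", "purple", "red", "white", "yellow"]

# If order matters, every arrangement is a distinct target to search for:
ordered = len(list(permutations(colors)))  # 7! arrangements

# Under permutation symmetry, all arrangements collapse into one class,
# since they all contain the same set of seven colors:
unordered = len({frozenset(p) for p in permutations(colors)})

print(ordered, unordered)  # 5040 1
```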
Tahmasebi and Jegelka discovered that a second kind of gain, an exponential one, can be reaped from symmetries that operate over many dimensions. This advantage is tied to the fact that the complexity of a learning task grows exponentially with the dimensionality of the data space, so exploiting a multidimensional symmetry yields a disproportionate return. "This is a new contribution that basically tells us that higher-dimensional symmetries are more important because they can give us an exponential gain," says Tahmasebi.
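A back-of-the-envelope sketch conveys why the gain is exponential. This is an illustrative heuristic, not the paper's formula, and the numbers (d = 6, k = 2, resolution 0.1) are made up: covering a d-dimensional space at a fixed resolution takes a number of points exponential in d, so a symmetry that removes k dimensions shrinks that count exponentially in k.

```python
# Covering a dim-dimensional unit cube at resolution eps takes roughly
# (1/eps)**dim grid points (exponential in the dimension).
def grid_points(dim, eps=0.1):
    return round((1 / eps) ** dim)

d, k = 6, 2                   # hypothetical data and symmetry dimensions
full = grid_points(d)         # without exploiting symmetry
reduced = grid_points(d - k)  # on the lower-dimensional quotient space

print(full, reduced, full // reduced)  # 1000000 10000 100
```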
The NeurIPS 2023 paper he wrote with Jegelka contains two mathematically proven theorems. "The first theorem shows that an improvement in sample complexity is achievable with the general algorithm we propose," explains Tahmasebi. The second theorem complements the first, he adds, "showing that this is the best possible gain; nothing better is achievable."
He and Jegelka provided a formula that predicts the gain that can be obtained from a particular symmetry in a given application. One of the advantages of this formula is its generality, notes Tahmasebi. “It works for any symmetry and any input space.” This not only works for symmetries known today, but could also be applied in the future to symmetries that have yet to be discovered. This latter perspective is not too far-fetched, given that the search for new symmetries has long been a major focus of physics. This suggests that as more symmetries are found, the methodology introduced by Tahmasebi and Jegelka should only improve over time.
According to Haggai Maron, a computer scientist at the Technion (the Israel Institute of Technology) and NVIDIA who was not involved in the work, the approach presented in the paper "diverges significantly from similar previous work by adopting a geometric perspective and employing tools from differential geometry. This theoretical contribution lends mathematical support to the emerging subfield of 'geometric deep learning,' which has applications in graph learning, 3D data, and more. The paper helps establish a theoretical foundation to guide further developments in this rapidly expanding area of research."
More information:
Behrooz Tahmasebi et al, The exact gain in sample complexity from invariances for kernel regression, arXiv (2023). DOI: 10.48550/arxiv.2303.14269
Provided by the Massachusetts Institute of Technology
This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and education.
Citation: How symmetry can help machine learning (February 5, 2024), retrieved February 5, 2024.