Coordinating complicated interactive systems, whether it’s the different modes of transportation in a city or the various components that must work together to make an effective and efficient robot, is an increasingly important subject for software designers to tackle. Now, researchers at MIT have developed an entirely new way of approaching these complex problems, using simple diagrams as a tool to reveal better approaches to software optimization in deep-learning models.

They say the new method makes addressing these complex tasks so simple that it can be reduced to a drawing that would fit on the back of a napkin.

The new approach is described in the journal Transactions on Machine Learning Research, in a paper by incoming doctoral student Vincent Abbott and Professor Gioele Zardini of MIT’s Laboratory for Information and Decision Systems (LIDS).

“We designed a new language to talk about these new systems,” Zardini says. This new diagram-based “language” is heavily based on something called category theory, he explains.

It all has to do with designing the underlying architecture of computer algorithms — the programs that will actually end up sensing and controlling the various parts of the system that’s being optimized. “The components are different pieces of an algorithm, and they have to talk to each other, exchange information, but also account for energy usage, memory consumption, and so on.” Such optimizations are notoriously difficult because a change in one part of the system can cause changes in other parts, which can in turn affect still others, and so on.

The researchers decided to focus on the class of deep-learning algorithms, currently a hot topic of research. Deep learning is the basis of large artificial-intelligence models, including large language models such as ChatGPT and image-generation models such as Midjourney. These models manipulate data through a “deep” series of matrix multiplications interspersed with other operations. The numbers within the matrices are parameters, updated during long training runs, which allows complex patterns to be found. Models consist of billions of parameters, which makes computation expensive, and hence makes improved resource usage and optimization invaluable.
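To make that description concrete, here is a minimal sketch, in plain NumPy, of a “deep” series of matrix multiplications interspersed with other operations. The layer sizes, the ReLU nonlinearity, and the random placeholder parameters are illustrative assumptions, not details from the paper:

import numpy as np

rng = np.random.default_rng(0)

# Parameters: one weight matrix per layer. In a real model these numbers
# are learned during long training runs; here they are random placeholders.
layer_sizes = [8, 16, 16, 4]                  # assumed sizes, for illustration
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    # A deep stack: matrix multiplications interspersed with
    # other operations (here, an elementwise ReLU).
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)            # matmul, then elementwise op
    return x @ weights[-1]                    # final linear layer

x = rng.standard_normal((2, layer_sizes[0]))  # a batch of two inputs
print(forward(x).shape)                       # -> (2, 4)

Each weight matrix plays the role of the parameters described above; production models simply have billions of them, which is what makes resource usage so costly.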
Diagrams can represent details of the parallelized operations that deep-learning models consist of, revealing the relationships between algorithms and the parallelized graphics processing unit (GPU) hardware they run on, supplied by companies such as NVIDIA. “I’m very excited about this,” says Zardini, because “we seem to have found a language that very nicely describes deep-learning algorithms, explicitly representing all the important things, which is the operators you use,” for example the energy consumption, the memory allocation, and any other parameter that you’re trying to optimize for.

Much of the progress within deep learning has stemmed from resource-efficiency optimizations. The latest DeepSeek model showed that a small team can compete with top models from OpenAI and other major labs by focusing on resource efficiency and the relationship between software and hardware. Typically, in deriving these optimizations, he says, “people need a lot of trial and error to discover new architectures.” For example, a widely used optimization program called FlashAttention took more than four years to develop.

But the methods that have been used to find these improvements so far “are very limited,” he says. “I think this shows that there’s a major gap, in that we don’t have a formal systematic method of relating an algorithm to either its optimal execution, or even really understanding how many resources it will take to run.” With the new diagram-based framework the team devised, such a system now exists: “we can really approach this problem in a more formal way,” with everything represented visually in a precisely defined graphical language.

Category theory, which underlies this approach, is a way of mathematically describing the different components of a system and how they interact, in a generalized, abstract manner. It allows different perspectives to be related: mathematical formulas, for example, can be related to the algorithms that implement them and use resources, and descriptions of systems can be related to robust “monoidal string diagrams.” These visualizations let you directly experiment with how the different parts connect and interact. What the researchers developed, he says, amounts to “string diagrams on steroids,” incorporating many more graphical conventions and many more properties.

“Category theory can be thought of as the mathematics of abstraction and composition,” Abbott says. “Any compositional system can be described using category theory, and the relationship between compositional systems can then also be studied.” Algebraic rules that are typically associated with functions can also be represented as diagrams, he says. “Then, a lot of the visual tricks we can do with diagrams, we can relate to algebraic tricks and functions. So, it creates this correspondence between these different systems.”

As a result, he says, “this solves a very important problem, which is that we have these deep-learning algorithms, but they’re not clearly understood as mathematical models.” By representing them as diagrams, it becomes possible to approach them formally and systematically.

One thing this enables is a clear visual understanding of the way parallel real-world processes can be represented by parallel processing in multicore computer GPUs. “In this way,” Abbott says, “diagrams can both represent a function, and then reveal how to optimally execute it on a GPU.”

The “attention” algorithm is used by deep-learning models that require general, contextual information, and it is a key phase of the serialized blocks that constitute large language models such as ChatGPT.
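To give a sense of what the attention algorithm computes, here is a minimal sketch of standard scaled dot-product attention, the baseline that FlashAttention accelerates; the sequence length and head size are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
N, d = 128, 64                         # assumed sequence length and head size
Q = rng.standard_normal((N, d))        # queries
K = rng.standard_normal((N, d))        # keys
V = rng.standard_normal((N, d))        # values

def attention(Q, K, V):
    # Naive attention: builds the full N x N score matrix, so memory
    # traffic grows quadratically with sequence length. FlashAttention
    # computes the same result in tiles without ever storing this matrix.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (N, d)

print(attention(Q, K, V).shape)        # -> (128, 64)

That quadratic score matrix is the main target of the optimization discussed next.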
FlashAttention is an optimization that took years to develop, but resulted in a sixfold improvement in the speed of attention algorithms. Applying their method to this well-established algorithm, Zardini says, “here we are able to derive it, literally, on a napkin.” He then adds, “OK, maybe it’s a large napkin.” To drive home the point about how much their new approach can simplify dealing with these complex algorithms, they titled their formal research paper on the work “FlashAttention on a Napkin.”

This method, Abbott says, “allows for optimization to be really quickly derived, in contrast to prevailing methods.” While they initially applied the approach to the already existing FlashAttention algorithm, thus verifying its effectiveness, “we hope to now use this language to automate the detection of improvements,” says Zardini, who, in addition to being a principal investigator in LIDS, is the Rudge and Nancy Allen Assistant Professor of Civil and Environmental Engineering and an affiliate faculty member with the Institute for Data, Systems, and Society.

Ultimately, he says, the plan is to develop the software to the point that “the researcher uploads their code, and with the new algorithm you automatically detect what can be improved, what can be optimized, and you return an optimized version of the algorithm to the user.”

In addition to automating algorithm optimization, Zardini notes that a robust analysis of how deep-learning algorithms relate to hardware resource usage allows for systematic co-design of hardware and software. This line of work integrates with Zardini’s focus on categorical co-design, which uses the tools of category theory to simultaneously optimize the various components of engineered systems.

Abbott says that “this whole field of optimized deep learning models, I believe, is quite critically unaddressed, and that’s why these diagrams are so exciting. They open the doors to a systematic approach to this problem.”

“I’m very impressed by the quality of this research. … The new approach to diagramming deep-learning algorithms used by this paper could be a very significant step,” says Jeremy Howard, founder and CEO of Answer.AI, who was not associated with this work. “This paper is the first time I’ve seen such a notation used to deeply analyze the performance of a deep-learning algorithm on real-world hardware. … The next step will be to see whether real-world performance gains can be achieved.”

“This is a beautifully executed piece of theoretical research, which also aims for high accessibility to uninitiated readers — a trait rarely seen in papers of this kind,” says Petar Veličković, a senior research scientist at Google DeepMind and a lecturer at Cambridge University, who was not associated with this work. These researchers, he says, “are clearly excellent communicators, and I cannot wait to see what they come up with next!”

The new diagram-based language, having been posted online, has already attracted great attention and interest from software developers. A reviewer of Abbott’s prior paper introducing the diagrams noted that “the proposed neural circuit diagrams look great from an artistic standpoint (as far as I am able to judge this).” “It’s technical research, but it’s also flashy!” Zardini says.
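For readers who want to connect the sixfold speedup discussed above to code, here is a heavily simplified sketch of the core idea behind FlashAttention: processing keys and values in tiles with an online softmax, so the full score matrix is never materialized. The tile size is an illustrative assumption, and this reflects the widely published FlashAttention idea, not the authors’ diagrammatic derivation:

import numpy as np

def flash_like_attention(Q, K, V, tile=32):
    # Process key/value tiles while maintaining running softmax
    # statistics (row-wise max m and denominator s), so only an
    # N x tile block of scores exists at any time.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((N, d))
    m = np.full((N, 1), -np.inf)       # running max of scores per row
    s = np.zeros((N, 1))               # running softmax denominator
    for j in range(0, N, tile):
        Kj, Vj = K[j:j + tile], V[j:j + tile]
        scores = (Q @ Kj.T) * scale                     # (N, tile) block
        m_new = np.maximum(m, scores.max(axis=-1, keepdims=True))
        correction = np.exp(m - m_new)                  # rescale old sums
        p = np.exp(scores - m_new)
        s = s * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ Vj
        m = m_new
    return out / s

# Sanity check against the naive attention sketched earlier:
rng = np.random.default_rng(0)
Q, K, V = [rng.standard_normal((128, 64)) for _ in range(3)]
S = Q @ K.T / np.sqrt(64)
W = np.exp(S - S.max(axis=-1, keepdims=True))
ref = (W / W.sum(axis=-1, keepdims=True)) @ V
print(np.allclose(flash_like_attention(Q, K, V), ref))  # -> True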