My name is Max Lee, and I’m a computer engineer who has extensive experience creating AI, Natural Language Processing, and Computer Vision applications and algorithms.
Problems with Current AI
Over the course of that experience, several things have bothered me about the field, including:
- Need for millions of data samples → Sometimes this requirement is non-negotiable.
- Need for many thousands (or more) passes through the data to train a model → This takes time and energy.
- Frequently, and especially for neural networks, a large amount of computation is required for each of those passes.
- To top it all off, even after all this effort has been spent, we AI practitioners are often just hoping and praying that something useful comes out, treating much of our work as something closer to alchemy than science. The resulting systems are often a collection of inscrutable black boxes whose failures are not easily introspected or explained.
- Only megacorporations can afford to build the largest models (LLMs).
Around the same time that I started working on AI applications, I also did contract work in embedded systems: microcontrollers, firmware, PCBs, single-board computers such as the Raspberry Pi, etc… In these compute-constrained systems, AI, and especially neural networks, often isn’t feasible… this made me sad, annoyed, and frustrated with the state of things. I want my smart devices to actually be able to be “smart”.
Inspirations from Neuroscience
Being the nerd that I am, I also frequently read neuroscience papers, and as a result I understand that the way we’re getting machines to learn things via backpropagation isn’t how brains learn. For example, a child doesn’t need to see a ball a million times to learn “This round thing that rolls around is a ball.” No, for a child, all of the information about the properties of the thing the child is seeing, the shape, the 3D curves (gradient depth), perhaps some motion of the ball, or maybe even the feel or smell of the ball, all of those signals are being absorbed by the brain simultaneously through different channels and regions, then integrated together into a final singular experience, and placed into a newly created spot for that unique and meaningful thing, which is then labeled “ball”... Instead of trying to manipulate millions or billions of neural connections, our brains filter for uniqueness or novelty, then apply known combinations of properties (round, smooth, small, smell, location, emotion, etc.) to the newly discovered object (which may be a higher-level concept or idea), so that this new idea becomes available to be used on its own or as a potential property to be associated with other new learnings in the future.

I realized that most neural networks fundamentally don’t learn this way, at least not at this time. But that did lead me to explore spiking neuron models (such as Hodgkin-Huxley, leaky integrate-and-fire, and similar) and the math behind them. Their behaviors often look analogous to frequency resonators or frequency dampeners, wherein a single biological neuron can act in both of these roles depending on the source, level, direction, and frequency of different received signals… if you’re an audio or signal processing enthusiast, you can think of it as different collections of high/mid/low-pass frequency filters and amplifiers. Or for the physics folks, maybe think of it as a combined system of tensioned springs with many potential input paths. In fact, the properties of these systems can frequently be modeled with electrical or physical analog systems.
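To make those dynamics concrete, here is a minimal sketch of the leaky integrate-and-fire model, the simplest of the ones above. The parameter values are illustrative, not tuned to biology; the point is just that stronger input drives a higher firing frequency:

```rust
/// Minimal leaky integrate-and-fire neuron. The parameter values below
/// are illustrative only, not fit to any biological data.
struct LifNeuron {
    v: f64,        // membrane potential
    v_rest: f64,   // resting potential the voltage leaks back toward
    v_thresh: f64, // firing threshold
    v_reset: f64,  // potential right after a spike
    tau: f64,      // membrane time constant (how fast the leak acts)
}

impl LifNeuron {
    /// Advance the neuron by `dt` under an input current; true on a spike.
    fn step(&mut self, input: f64, dt: f64) -> bool {
        // Euler step of dv/dt = (v_rest - v) / tau + input
        self.v += dt * ((self.v_rest - self.v) / self.tau + input);
        if self.v >= self.v_thresh {
            self.v = self.v_reset; // fire and reset
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut neuron = LifNeuron { v: 0.0, v_rest: 0.0, v_thresh: 1.0, v_reset: 0.0, tau: 20.0 };
    // Stronger input -> faster integration -> higher firing frequency,
    // the frequency-coding behavior described above.
    let mut spikes = 0;
    for _ in 0..10_000 {
        if neuron.step(0.08, 1.0) {
            spikes += 1;
        }
    }
    println!("spikes over 10k steps: {spikes}");
}
```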


Inspirations from Nature
I argue that we are all observers of filtering and classification systems every day; we just may not know it. For example, you may be sitting inside a quiet room with a window. How do you know whether the wind is blowing outside if you can’t hear it? Well, we might instinctively look out the window. When we watch trees and plants sway, we conclude that it's probably windy outside. Depending on how much swaying we see and to what degree, we get an understanding of the strength of the wind. Building on this, we also see that not all branches or leaves move at the same time, to the same degree, or in the same direction. These simultaneous observations are all encoding the current state of the wind; I argue that this collection of different observed simultaneous motions is acting in just the same way as biological neurons' firing frequencies within our brain... at the very least, the idea lent inspiration to me.
Or, for another plant-inspired example: I was once lying in the shade of a leafy tree on a sunny day when, looking at the leaves above me, I began to realize that the light which reached my eyes encoded the current state and the history of previous states all at the same time. What I mean is that the number of leaves, their orientation relative to the sun (potentially changing throughout the day), their reflectivity, refraction, and combined effects as rays of light pass through them, all of it leaves a mark on the rays that eventually fall down and get collected and bucketed by my eyes. Perhaps to astrophysicists or biologists this might seem obvious… but I think that many people working in fundamental AI research are not thinking in this way, much to our collective loss.
While on vacation, I began to see that information is also being encoded and represented in the waves and swirls that occur in lakes, oceans, and rivers. For example, how far up a shoreline a wave will break and flow, whether a wave’s energy is deflated or inflated by the colliding forces of incoming and outgoing flow, how objects such as underwater rocks and geography influence it, overall wave duration, etc., all these variables interact and are encoded into the wave, then represented by where and how the wave reveals itself.
So for a non-exhaustive set of examples of encoding and representation of information in nature, we have:
- Plants in the wind
- Tree leaves and light - clouds, time of day, color of multiple levels of leaves, leaf orientation(s), angle of observation
- Beaches and waves - underwater objects/geography, time of day, wind/geoforces
At any moment, the state of these systems is encoding the inputs which have led up to that state. Most of these systems have multiple points of potential observation whose meanings are potentially either equivalent or complementary to each other. We need not know or even remember what set of conditions led to the state, but simply observe the state and the potential transitions to and away from it. (That being said, remembering what went in and what the resulting state was, if storage isn't an issue, can be extremely helpful in multiple ways.)
In short, these are naturally occurring, dynamically self-encoding systems which, once you start to see them, are present throughout nature, from the very small to the very large scale.
Problems with Natural Examples
One of the main issues that I and others who have worked or experimented with spiking neurons have found is that they are still computationally expensive to simulate on digital computers, "noisy" if using analog computing, and not straightforward to train (backprop can’t be used in most designs). While their dynamics are inspiring, this computational requirement went against my desire for efficient execution.
Similar issues apply to simulating the flow of particle systems, such as those in wind and wave models. And adding simulated trees to a wind model makes the simulation much more mathematically expensive still, once again ruling it out. But I still took inspiration from these systems.
Exploring Possible Solutions
Creating any algorithm requires balancing compute (somewhat interchangeable with “time”) and space/memory. What neural networks do, in large part, is learn very "fuzzy" compression functions (I can already hear some of you yelling at your screens for this explanation... I know, I hear you... but this analogy seems to work better for more people than not).
It dawned on me that creating a time-efficient AI isn't a computation problem where we just need to keep getting bigger, faster, better GPU clusters (and nuclear plants to run them), but that instead what we really have on our hands is a search, storage, and compression problem. I asked myself:
- How can I efficiently compute these physical systems and their various states?
- How can I efficiently compress and store them?
- After storing these states, how can I efficiently and transparently recall them in order to put them to use for purposes such as for classification, anomaly detection, or generative output such as language models?
- Can these representations ALSO be "fuzzified" and still remain useful for memory-constrained situations?
Plinko and Pachinko: A Key Insight
A final piece of insight that I had… Take the games of “Plinko” and (at least in Japan) “Pachinko”. In both games, there is a round game piece (such as a ball or puck) and a game board on which various metal pins are placed in different layouts.


The game piece is placed into the top of the board along a continuous line of possible drop points. The piece is then pulled down by gravity, interacting with the game board through friction, collisions with the pins, bouncing, air resistance, etc., until it comes to rest in a certain bucket. What I realized is that, from a very simplified perspective, the final bucket where the game piece lands encodes all of the information about the path it traveled, and by extension, that path was influenced by the pins, friction, and so on. In short, that final resting spot represents everything that the falling object experienced along its journey from top to bottom. It represents the game’s system along which it traveled. One piece of information, the final bucket, could potentially represent the entire state of the system.
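As a toy sketch of the "landing spot summarizes the path" idea (and only that; real board physics is far richer), here is a simplified Galton-board version where each pin is reduced to a coin-flip deflection. The `rand` crate and the 50/50 pin model are assumptions of the sketch:

```rust
use rand::Rng; // assumes the `rand` crate (0.8-style API)

/// Drop one piece through `rows` of pins: at each pin it deflects left
/// or right. The final bucket index is simply the count of rightward
/// deflections -- a single number summarizing the entire path.
fn drop_piece(rows: u32, rng: &mut impl Rng) -> usize {
    (0..rows).filter(|_| rng.gen_bool(0.5)).count()
}

fn main() {
    let mut rng = rand::thread_rng();
    let rows = 10;
    let mut buckets = vec![0u32; rows as usize + 1];
    for _ in 0..10_000 {
        buckets[drop_piece(rows, &mut rng)] += 1;
    }
    // The bucket histogram encodes the board's statistics: unbiased pins
    // pile pieces into the familiar binomial bell shape, and any bias in
    // the pins would leave its mark on the distribution.
    println!("{buckets:?}");
}
```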
Using the concepts of natural systems and “landing spot” representing the state of that system, I got to work exploring possible solutions.
Building the Model: A Six-Month Journey
With the generous support of two friends and colleagues, I spent six months working on the problem, which was one of the most challenging technical and mathematical problems I’ve ever worked on… and probably one of the most intellectually rewarding too. What I was able to come up with was a promising candidate for an alternative to the most commonly used artificial neuron unit that makes up current neural networks.
The way I started was to simply explore and play with different ideas… aside from the constraints of keeping the number of computations down and inference speed up, everything else was open for exploration. I wrote various simulations using a simple game engine in the Rust programming language. Here are a couple of examples:
Being able to visually explore an idea was helpful for confirming my understanding and debugging where my assumptions fell flat.
This is a somewhat different approach from the one usually taken in the industry, where practitioners instead use Python and notebooks to carry out their experiments… but this approach not only worked for me, it also felt more intuitive, more closely aligning with how I imagine my way through the different problem spaces in which I work.
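For flavor, here is a minimal sketch of what such a visual exploration loop can look like. The macroquad crate is an illustrative choice of engine here, not necessarily the one I used:

```rust
// A minimal "watch the particles fall" loop, assuming the macroquad crate.
use macroquad::prelude::*;

#[macroquad::main("Falling particles")]
async fn main() {
    // Fifty particles, staggered vertically so they don't fall in lockstep.
    let mut ys: Vec<f32> = (0..50).map(|i| i as f32 * -10.0).collect();
    loop {
        clear_background(BLACK);
        for (i, y) in ys.iter_mut().enumerate() {
            *y += 2.0; // crude constant-velocity "gravity"
            if *y > screen_height() {
                *y = 0.0; // wrap back to the top
            }
            draw_circle(20.0 + i as f32 * 12.0, *y, 3.0, WHITE);
        }
        next_frame().await;
    }
}
```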
Key Algorithmic Breakthroughs
One of the really hard problems I had to solve was how to handle any number of dimensions without exponentially increasing the number of required calculations. While I can’t say in detail how I did this due to various patents in progress, what I can say is that the algorithmic solution I found is O(n), where n is the number of datapoints being passed into the model. I even made it so that I could additively stream new points into the model during training while caching previous states, so that the learning function itself requires almost no memory. This property is exceptionally useful in applications such as time-series problems, anomaly detection, or even language models… basically, many problems that AI attempts to solve involve understanding and making decisions based on sequences or patterns. This algorithm enables that to happen, and very quickly.
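To show the shape of that streaming property without revealing anything patented, here is a classic stand-in with the same flavor, Welford's online mean/variance update: each new point folds into a small cached state in O(1) time, so one pass over n points is O(n) with effectively constant memory:

```rust
/// NOT the patented algorithm -- just a classic stand-in (Welford's
/// online mean/variance) showing the same streaming "shape": each new
/// point folds into a small cached state in O(1) time and memory.
struct StreamState {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl StreamState {
    fn new() -> Self {
        Self { count: 0, mean: 0.0, m2: 0.0 }
    }

    /// Fold one new data point into the cached state; no buffering of
    /// past points is ever needed.
    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    fn variance(&self) -> f64 {
        if self.count < 2 { 0.0 } else { self.m2 / (self.count - 1) as f64 }
    }
}

fn main() {
    let mut state = StreamState::new();
    // Stream points one at a time: O(n) total work, constant memory.
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0] {
        state.update(x);
    }
    println!("mean = {}, variance = {}", state.mean, state.variance());
}
```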
Another property that I developed was the ability to learn additional data at a later date, or even selectively forget previously learned information, without damaging anything else within the model. Accomplishing similar behavior with a neural network is difficult and often has unpredictable outcomes. This also has the additional benefit of being able to associate any given state of the model with metadata, such as the source of the training material, or with potential external logic control.
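Purely as an illustration of that behavior (the store, keys, and fields below are stand-ins, not the real data structure), this is akin to a keyed store in which each learned state carries metadata and can be added or removed without touching its neighbors:

```rust
use std::collections::HashMap;

/// Illustrative sketch only: the key type, fields, and store here are
/// assumptions, not the real (patent-pending) data structure.
struct LearnedState {
    representation: Vec<f64>, // stand-in for whatever the model stores
    source: String,           // metadata: where this knowledge came from
}

struct Model {
    states: HashMap<u64, LearnedState>,
}

impl Model {
    /// Learn something new at any time; no other entry is touched.
    fn learn(&mut self, id: u64, representation: Vec<f64>, source: &str) {
        self.states.insert(id, LearnedState { representation, source: source.into() });
    }

    /// Selectively forget everything learned from one source, leaving
    /// every other learned state intact.
    fn forget_source(&mut self, source: &str) {
        self.states.retain(|_, s| s.source != source);
    }
}

fn main() {
    let mut model = Model { states: HashMap::new() };
    model.learn(1, vec![0.1, 0.9], "dataset_a");
    model.learn(2, vec![0.7, 0.2], "dataset_b");
    model.forget_source("dataset_a");
    println!("remaining ids: {:?}", model.states.keys().collect::<Vec<_>>());
}
```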
Finally, the algorithm has the property of being tunable so that it can function in an episodic manner, accurately retrieving learned information without hallucination, or function more stochastically to enable a more “creative” output.
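A familiar way to picture such a knob (an analogy only, not the actual mechanism) is temperature-controlled retrieval: at temperature zero, return exactly the best-scoring stored item; at higher temperatures, sample in proportion to the scores:

```rust
use rand::Rng; // assumes the `rand` crate (0.8-style API)

/// Analogy only: a retrieval knob in the spirit of the tunable behavior
/// described above. Temperature 0 returns the single best-scoring stored
/// item (episodic recall); higher temperatures sample items in proportion
/// to softmax(score / temperature) for more "creative" output.
fn retrieve(scores: &[f64], temperature: f64, rng: &mut impl Rng) -> usize {
    if temperature <= 0.0 {
        // Episodic mode: exact best match, nothing made up.
        return scores
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .expect("scores must be non-empty");
    }
    // Stochastic mode: softmax sampling (numerically naive; fine for a demo).
    let weights: Vec<f64> = scores.iter().map(|s| (s / temperature).exp()).collect();
    let total: f64 = weights.iter().sum();
    let mut pick = rng.gen_range(0.0..total);
    for (i, w) in weights.iter().enumerate() {
        if pick < *w {
            return i;
        }
        pick -= w;
    }
    weights.len() - 1 // floating-point edge-case fallback
}

fn main() {
    let mut rng = rand::thread_rng();
    let scores = [0.1, 2.5, 0.4];
    println!("episodic pick: {}", retrieve(&scores, 0.0, &mut rng));
    println!("creative pick: {}", retrieve(&scores, 1.0, &mut rng));
}
```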
Unlike neural networks, which require looking at each piece of information in the training data many hundreds or even hundreds of thousands of times, creating a model using this algorithm requires only a handful of passes through the training data, which makes development and experimentation a lot faster and cheaper. Finally, the trained models can happily run very fast on a single CPU thread, opening up operation on CPUs that are 20 years old or more. This potentially puts advanced model training into the hands of everyone, rather than only those organizations with billions of dollars to burn.
The results of all this work are a couple of algorithms that successfully meet the requirements I was looking for. They are fast to compute, fast at retrieving results, compressible, highly distributable, optionally fuzzifiable, and allow introspection into the generated results. Furthermore, these algorithms are usable on any computer hardware made in the last 20 years. Any individual person or small, medium, or large business generally already has all the computing power they need to build models around these algorithms. These models cover a wide range of AI approaches, including classification, anomaly detection, and output generation.
Big Claims need Big Proof, right?
I’d like to spend some time going over demonstrations of these different applications in action.
Todo / Potential Applications:
- Answer retrieval → As a complement or alternative to RAG
- Handwritten number classification with MNIST
- Object tracking
- Anomaly detection
- Text generation