By
Alain Blancquart
March 28, 2023
This article outlines the reasons why exponential AI growth on a global scale will soon become too costly and presents the solution we propose to increase model creation efficiency by a factor of a thousand, enabling either reduced electricity and cooling for equivalent performance, or increased performance for the same electricity and cooling. We will also present some technical details about EvoChip technology and how we will participate in this exponentially growing market.
Introduction
Artificial intelligence (AI) has captivated collective imagination for decades, fueling visions of spectacular progress and revolutionary upheavals. However, behind the rapid advancements and endless promises lie often overlooked realities. One of these realities is the impending collision of exponential demand with slower possible growth in power and cooling. Thus, the current exponential market growth cannot continue indefinitely due to the gap between AI progression and the technology required to sustain it.
The Technological Foundations of AI
To understand the limits of AI growth under current mainstream practice, it is crucial to grasp the technological foundations upon which this discipline rests. Recent advancements heavily rely on deep neural networks and machine learning, leveraging considerable computational power to process massive datasets. While papers are published daily on new training and optimization approaches, none of them are yielding more than incremental efficiency improvements.
The Moore's Law Wall and Limits of Computing Power
For decades, Moore's Law has been the driving force of technological innovation, promising a doubling of computing power at regular intervals. However, this law is reaching its physical limits as transistors approach atomic size. The pursuit of exponential AI growth is therefore hindered by the fact that the technology required to fuel this growth cannot maintain its historical pace. For example, according to a study by the International Data Corporation (IDC), global demand for AI computing capacity is expected to increase more than sixfold by 2025, while total computing capacity is projected to increase only threefold over the same period.
Closing that gap will require doubling computing efficiency: a 100% improvement, rather than the single-digit percentage improvements celebrated in many recent papers. Notably, the IDC estimate focuses on centralized AI, but edge devices (phones, automobiles, embedded devices, sensors, medical devices, etc.) will require far greater efficiency improvements so that a larger number of devices can feature embedded AI learning engines.
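The arithmetic behind this requirement can be checked in a few lines. This is a minimal sketch that uses only the two growth factors quoted above; the variable names are illustrative and not taken from the IDC study.

```python
# Growth factors quoted above from the IDC study (through 2025).
demand_growth = 6.0     # AI computing demand: "over 6 times"
capacity_growth = 3.0   # total computing capacity: "3 times"

# Useful work delivered per unit of capacity must rise by this factor
# for the projected capacity to cover the projected demand.
required_efficiency_factor = demand_growth / capacity_growth
improvement_percent = (required_efficiency_factor - 1.0) * 100.0

print(f"Required efficiency factor: {required_efficiency_factor:.1f}x")
print(f"Equivalent improvement: {improvement_percent:.0f}%")
```

Because the quoted demand figure is "over" sixfold, the factor of two should be read as a lower bound.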
The Environmental Consequences of AI Growth
The intensive use of computing resources by AI has significant environmental repercussions. Data centers, which power the computations necessary for AI, are among the largest electricity consumers globally. According to an estimate by the International Energy Agency, data centers will account for approximately 3% of global electricity consumption by 2025. Additionally, water extraction and cooling required to keep these data centers running at optimal temperatures can exert additional pressure on local water resources.
Alex de Vries, a member of the VU Amsterdam School of Business and Economics, warns that AI’s growth is poised to make it a significant contributor to global carbon emissions. He estimates that if Google switched its search business to AI, which Google would eventually do, it would end up using 29.3 terawatt-hours per year – equivalent to the electricity consumption of Ireland. (October 29, 2023)
The New York Times estimated that GPT-4 consumed between 51,773 MWh and 62,319 MWh. That’s over 40 times more than what its predecessor, GPT-3, consumed. This is equivalent to the energy consumption of 1,000 average US households over 5 to 6 years. (August 10, 2023)
Increasing Demands for Generative AI
In addition to the increasing demand for computing capacity, the evolution of AI towards more complex domains like generative AI also requires substantial resources. Generative AI, which encompasses techniques such as Generative Adversarial Networks (GANs), deep neural networks, and language models, requires massive amounts of data and computing power to generate accurate results. For example, OpenAI's GPT-3 language model required hundreds of millions of dollars in computing costs and thousands of GPUs to train.
Just how BIG is the AI opportunity we’re talking about here?
McKinsey just came out with a report pegging generative AI's financial impact at a minimum of $2.6 trillion annually, with skyrocketing demand for computer chips.
The computing power AI needs has recently been doubling every 3.4 months.
Georgetown University research confirms that “AI chips” are essential for cost-effectively implementing AI at scale, and that the success of modern AI techniques relies on computation on a scale unimaginable even a few years ago.
According to research firm Gartner, AI chip revenue is expected to grow from only $34 billion in 2021 to $86 billion by 2026.
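To put the 3.4-month doubling period quoted above in annual terms, the compounding can be worked out directly. This is a small sketch; the function name `growth_factor` is ours and purely illustrative.

```python
# Doubling period quoted above for the computing power AI needs.
DOUBLING_PERIOD_MONTHS = 3.4

def growth_factor(months: float, doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_period)

annual = growth_factor(12)
print(f"Growth over 12 months: about {annual:.1f}x")
```

Under this assumption of a constant doubling period, demand grows more than tenfold every year, which is the pace that hardware efficiency gains would have to match.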
What must be done?
It is no longer sufficient to gain a few percentage points of technological efficiency; rather, breakthrough technology is needed to push the looming barriers to AI adoption forward by at least a decade.
Hardware requirements must be reduced at least a thousand-fold for the same results. That means either using one server instead of a thousand to obtain the same results, or using the thousand servers to achieve results a thousand times greater. This requires a complete rethinking of the technology, and that is what we have done at EvoChip.
The only way to deploy AI massively and sustainably is to fundamentally change the way AI models are generated, with hardware that is specifically designed for far more efficient AI modeling approaches. Investment in more efficient cooling technologies for data centers, reducing resource consumption, and exploring renewable energy sources to power IT infrastructure are obviously still necessary, but will not be enough.
EvoChip's solution
Current semiconductor computing stacks contain many layers of abstraction, from transistors to high-level programming languages running on general-purpose operating systems and hardware, plus auxiliary computing devices such as GPUs and hardware accelerators. Each hardware or software layer requires more and more transistors to handle increasing abstraction from the core functionality, as well as translation between layers. While these layers are ideal for general-purpose use in unforeseen cases, they are not optimal when trying to extract every useful unit of work from a piece of semiconductor silicon.
EvoChip technology involves replacing the current computing stack of a semiconductor (FPGA, ASIC, or GPU) based on neural networks with a new stack developed from the combination of a new mathematical approach and evolutionary algorithms.
This new approach increases efficiency by eliminating many calculation cycles and successive layers of processing. Today it reaches a resource ratio of about 1 to 1,000 compared with current approaches.
EvoChip technology, intended to be integrated into hardware environments, has been developed to take full advantage of the features of this environment. It was necessary to develop a software version of the technology for development and debugging of the operating logic in a hardware environment. It turned out that the algorithms are so effective that they offer significant efficiency advantages (which could become another market for EvoChip). Our technology works equally well in a hardware-only, software, or software-accelerated hardware environment.
By using a fraction of the silicon footprint used by current methods, our technology computes at high speed with low energy consumption while delivering performance comparable to those methods, and therefore better performance per transistor.
For example, it is possible to process over a billion data lines per second on a $300 FPGA card with a 100 MHz processor powered by a 9-volt battery.
The following graph represents the amount of FPGA resources required by EvoChip technology compared with the resources required by a simple neural network for the University of Wisconsin breast cancer dataset. On the left is the amount of resources available in the Zynq-7000 series chip on the Digilent Arty Z7 board. As you can see, there are not enough FPGA resources for a neural network to compute even a single data line per clock cycle, whereas for EvoChip there are enough to compute about 20 data lines per clock cycle.
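As a rough consistency check, the throughput figure and the per-cycle figure quoted above can be related by simple arithmetic. This sketch assumes, for illustration only, a fully pipelined design that completes the stated number of data lines on every clock cycle with no I/O stalls; that assumption is ours and is not stated in the measurements.

```python
# Figures quoted above.
clock_hz = 100e6       # 100 MHz clock
lines_per_cycle = 20   # "about 20 data lines per clock cycle"

# Assumption: every clock cycle completes `lines_per_cycle` lines (no stalls).
lines_per_second = clock_hz * lines_per_cycle

# Smallest per-cycle parallelism that reaches one billion lines per second.
min_lines_per_cycle_for_1e9 = 1e9 / clock_hz

print(f"Peak throughput: {lines_per_second:.1e} lines/s")
print(f"Lines per cycle needed for 1e9 lines/s: {min_lines_per_cycle_for_1e9:.0f}")
```

Under these assumptions the peak is about two billion lines per second, so "over a billion" leaves headroom for real-world overheads.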
Our technology is designed to be available as a modular IP catalog, so that each chip or application can be optimized either for general use or for demanding and specialized use cases. It can thus be easily applied at all scales of AI: resource-constrained environments (IoT), compact quantitative modeling, small or large datasets, large language models, and generative AI.
Features
Some Technical details
We have reached enough accuracy to demonstrate that this technology is unique and creates very strong models an order of magnitude faster than other existing technologies.
The results shown in these graphs are generated with a very simple approach.
We take a model structure; its behavior is governed by a set of bits which controls things like determining which input variable to read at which point in the model calculation and the behavior of the math inside the model. These bits are then the DNA of the model. Much like manipulating base pairs in a strand of DNA to change the emergent phenotype of an organism, manipulating these bits changes the emergent math of the model.
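The idea of a bit string acting as the model's DNA can be illustrated with a toy encoding. The sketch below is not EvoChip's actual structure or math: the field widths (`VAR_BITS`, `OP_BITS`), the operation table `OPS`, and the accumulator design are all hypothetical choices made only to show how bits can select both the input variable read at each step and the math applied to it.

```python
from typing import Sequence

VAR_BITS = 3   # hypothetical: 3 bits select one of up to 8 input variables
OP_BITS = 2    # hypothetical: 2 bits select one of 4 operations
STEP_BITS = VAR_BITS + OP_BITS

# Hypothetical operation table, indexed by the decoded operation bits.
OPS = [
    lambda acc, x: acc + x,      # 00
    lambda acc, x: acc - x,      # 01
    lambda acc, x: acc * x,      # 10
    lambda acc, x: max(acc, x),  # 11
]

def bits_to_int(bits: Sequence[int]) -> int:
    """Interpret a sequence of 0/1 values as an unsigned integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def evaluate(genome: Sequence[int], row: Sequence[float]) -> float:
    """Run the model described by `genome` on one row of input data."""
    acc = 0.0
    for start in range(0, len(genome) - STEP_BITS + 1, STEP_BITS):
        var_index = bits_to_int(genome[start:start + VAR_BITS]) % len(row)
        op_index = bits_to_int(genome[start + VAR_BITS:start + STEP_BITS])
        acc = OPS[op_index](acc, row[var_index])
    return acc

row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
genome_a = [0, 0, 1, 0, 0,  0, 1, 0, 1, 0]  # add row[1], then multiply by row[2]
genome_b = [0, 0, 1, 0, 0,  0, 1, 0, 0, 1]  # add row[1], then subtract row[2]
print(evaluate(genome_a, row), evaluate(genome_b, row))
```

The two genomes differ only in their last two bits, yet they compute different functions of the same row, which is the sense in which changing bits changes the emergent math of the model.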
What about model generalization?
To randomly obtain a model, we just take a random sequence of bits, and assign them to the state of the model. We then evaluate the model for each row of data under consideration. You then can measure the effectiveness of the model.
Generally, with all previous algorithms we have worked with, you might get a marginally useful model with respect to the training data, but you won't get a model that is strong enough on test data. You must take the random model and develop it into a strong model through an iterative process of improvements. This is how the dozen-plus commercial implementations of such algorithms have behaved so far, and this is how existing technologies behave in general.
We are seeing something very different with the search space created by the combination of unique math and structure. Just by randomly generating models, we quickly find one with 70% or more accuracy on the training data. This happens within a few hundred attempts, which is nothing more sophisticated than rolling some dice 200 times. It also turns out that such models are as accurate on the unseen test data as on the training data. So, you get a model that reliably behaves as expected with new data.
We then take the best of the 200 models, as determined by accuracy on the training data, and randomly change pieces of it to see if it gets better or worse. If it gets better, we adopt the changed model and continue testing random mutations. Within a short period of time, we arrive at a model as "accurate" as the best neural network models of similar scope.
Currently, this generally takes fewer than 1,000 mutation attempts, although only about 15 to 20 of those mutations are actually adopted on the way from the original model to the improved one. Due to the efficiency of the algorithm, this takes only a few seconds or less (in software, much faster in hardware), even for a few hundred thousand rows of data.
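The two-phase procedure described above (pick the best of about 200 random models, then keep only the random mutations that improve it) is a form of random search followed by random-mutation hill climbing. The sketch below shows that generic loop on a toy problem. Everything specific in it is an assumption for illustration: the two-bits-per-feature encoding, the synthetic data from `make_rows`, and the function names. It does not implement EvoChip's math, and its accuracy figures say nothing about the results reported here.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int]  # (features, label)

def predict(genome: Sequence[int], features: Sequence[float]) -> int:
    # Toy encoding (an assumption): two bits per feature, (use?, negate?).
    total = 0.0
    for i, x in enumerate(features):
        if genome[2 * i]:
            total += -x if genome[2 * i + 1] else x
    return 1 if total > 0 else 0

def accuracy(genome: Sequence[int], rows: Sequence[Row]) -> float:
    return sum(predict(genome, f) == y for f, y in rows) / len(rows)

def search(train: Sequence[Row], n_bits: int, rng: random.Random,
           n_random: int = 200, n_mutations: int = 1000):
    # Phase 1: draw random genomes and keep the most accurate one.
    candidates = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(n_random)]
    best = max(candidates, key=lambda g: accuracy(g, train))
    initial_acc = best_acc = accuracy(best, train)
    # Phase 2: flip one random bit at a time; keep the change only if it helps.
    accepted = 0
    for _ in range(n_mutations):
        mutant = best[:]
        mutant[rng.randrange(n_bits)] ^= 1
        mutant_acc = accuracy(mutant, train)
        if mutant_acc > best_acc:
            best, best_acc = mutant, mutant_acc
            accepted += 1
    return best, initial_acc, best_acc, accepted

def make_rows(rng: random.Random, n: int, n_features: int = 6) -> List[Row]:
    # Synthetic data (an assumption): the label depends on three of the features.
    rows = []
    for _ in range(n):
        f = [rng.uniform(-1, 1) for _ in range(n_features)]
        rows.append((f, 1 if f[0] + f[1] - f[2] > 0 else 0))
    return rows

rng = random.Random(0)
rows = make_rows(rng, 800)
train, test = rows[:400], rows[400:]
best, initial_acc, best_acc, accepted = search(train, n_bits=12, rng=rng)
print(f"best of 200 random models: {initial_acc:.3f} (train)")
print(f"after hill climbing:       {best_acc:.3f} (train), "
      f"{accuracy(best, test):.3f} (test), {accepted} mutations adopted")
```

Counting `accepted` separately from the number of attempts mirrors the distinction made above between mutation attempts and the much smaller number of mutations that are actually adopted.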
There is another interesting feature of EvoChip technology. When modeling a dataset, it is standard practice to randomize the row order and split the data in half, 50/50. The first half is used for model training and the second half is the hold-out set for testing. This has always been an arbitrary split, chosen as a middle ground. As we are seeing parity between measured test and training accuracies, the natural next question is: do we really need a 50/50 split to obtain the best models, or do we need something different? Perhaps we can get by just fine with less training data. The fewer rows of data you look at, the fewer resources you need and the faster you obtain a model.
For example, it turns out that on the 150k-row machine failure dataset, model accuracy and other parameters do not start degrading until the split reaches around 15% training / 85% test. That essentially turns training a model on a 150k-row dataset into a problem of training on a 22.5k-row dataset. We have not yet performed comparative modeling with other algorithms, such as neural networks or decision trees, over different split ratios to see how they compare to EvoChip's behavior. We have started running 1,000 such tests.
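The row counts implied by different split ratios are easy to tabulate. This is a minimal sketch with a hypothetical helper, `split_rows`, that shuffles row indices and cuts them at a given training percentage; the 150,000-row figure is the one quoted above, and the other percentages are shown only for comparison.

```python
import random
from typing import List, Sequence, Tuple

def split_rows(rows: Sequence[int], train_percent: int,
               seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the rows, then split them into (train, test) partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100  # integer math avoids rounding issues
    return shuffled[:cut], shuffled[cut:]

total_rows = 150_000
for pct in (50, 30, 15):
    train_part, test_part = split_rows(range(total_rows), pct)
    print(f"{pct:>2}% train: {len(train_part):>6} training rows, "
          f"{len(test_part):>6} test rows")
```

At 15% the training partition holds 22,500 rows, less than a third of the 75,000 rows used by a 50/50 split.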
Providing measurable gains
To demonstrate efficiency gains without sacrificing quality, we have delivered a software version of our technology to small companies who have an urgent need to process their data more efficiently, not by just gaining a few percentage points, but by offering the ability to decrease their processing time by at least a factor of a thousand. These users provided evidence-based case studies.
Take the example of Liquid BioSciences, with whom we have recently started working. For many years, this company has been making diagnostic biomarker discoveries across more than 50 diseases, using data provided by government agencies, life sciences clients, and universities. Currently, identifying the most relevant biomarkers from large-scale RNA sequencing takes an average of two to three weeks on a parallel system of 32 servers. This limits the company to 10 to 15 studies per year. In initial tests with EvoChip technology, they reduced the processing time to a few minutes on a single server. Since this company has retained its complete processing history since its inception, we use that history to establish a solid baseline for comparison.
We now have the first result provided by Liquid BioSciences, using the software-only version of our technology.
Liquid BioSciences worked on a client dataset for colorectal cancer. It covers 690 patients, a mix of cancer and non-cancer cases, with nearly 50,000 variables per patient, and the task was to distinguish between these two groups.
They ran this dataset on both their existing process and technology and on the software-only version of EvoChip, which had been delivered to them that same day. EvoChip ran on a 12th Gen Intel(R) Core(TM) i9-12900H, 2900 MHz, with 14 Cores and 20 Logical Processors.
The result is that the software-only version of EvoChip is 3,386 times faster than their current solution, already well above the 1,000X improvement we expected for the hardware version of EvoChip. This suggests that the hardware version, to be delivered to them in the coming days, is likely to exceed by a wide margin the 1,000X improvement we claim in our company deck (but we want to remain conservative). Here are the details of the test:
As you can see, EvoChip finished 1,000 runs in 4 minutes and 19 seconds, versus one run in 14 minutes and 37 seconds.
This means that a direct comparison would be 14 minutes and 37 seconds for one run versus about 0.26 seconds per run on EvoChip (259 seconds divided by 1,000 runs).
It may not seem like much to save only 14 minutes and 36 seconds, but to complete an entire project Liquid BioSciences typically needs many thousands of runs to find the very best variables and optimize accuracy.
On this colorectal cancer dataset, the project needed around 60,000 runs and took 20 days on 30 servers. EvoChip would have done this in less than 5 hours on one server, or less than 10 minutes on 30 servers in parallel. Liquid BioSciences' client needed the project done quickly so that they could submit a study abstract for a major cancer conference. Liquid BioSciences did deliver in time, faster than any competitor could have. However, if Liquid BioSciences had used EvoChip to accelerate the identification of the best variables, and then spent two days finalizing, checking, and presenting the results, they would have cut 20 days down to 2 days, which would have given their client more than two weeks to prepare the abstract.
Liquid BioSciences plans to replace their current server infrastructure with a lighter configuration, using the in-hardware version of our technology.
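The speedup and project-time figures above follow from the two measured timings. The sketch below reproduces that arithmetic using only numbers quoted in this article. It assumes, for illustration, that the per-run time stays constant over the whole project and that work divides perfectly across servers, which real deployments only approximate.

```python
# Measured timings quoted above.
evochip_1000_runs_s = 4 * 60 + 19   # 1,000 EvoChip runs: 4 min 19 s
baseline_one_run_s = 14 * 60 + 37   # one run of the existing process: 14 min 37 s

evochip_run_s = evochip_1000_runs_s / 1000
speedup = baseline_one_run_s / evochip_run_s

# Project-scale extrapolation (assumes constant per-run time and ideal scaling).
project_runs = 60_000
one_server_hours = project_runs * evochip_run_s / 3600
thirty_servers_minutes = project_runs * evochip_run_s / 30 / 60

print(f"EvoChip time per run: {evochip_run_s:.3f} s")
print(f"Speedup: about {speedup:.0f}x")
print(f"60,000 runs on one server: {one_server_hours:.1f} hours")
print(f"60,000 runs on 30 servers: {thirty_servers_minutes:.1f} minutes")
```

The extrapolated times stay under the 5-hour and 10-minute bounds stated above, even though the comparison pits a single laptop-class processor against a multi-server system.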
This first version of our tech doesn't involve any hardware modification or redesign; it is showing tremendous capabilities well ahead of currently practiced technologies.