Seongryong Oh, Ph.D. candidate Yoonsung Kim, Ph.D. candidate Wonung Kim, Ph.D. candidate Yubin Lee, M.S. candidate Jiyong Jung, Professor Jongse Park, Professor Divya Mahajan, Professor Chang Hyun Park

As the latest Artificial Intelligence (AI) models’ ability to understand and process long, complex sentences grows, so does the need for new semiconductor technologies that can simultaneously boost computation speed and memory efficiency. Against this backdrop, a joint research team of KAIST researchers and international collaborators has developed a core AI semiconductor ‘brain’ technology based on a hybrid Transformer–Mamba architecture, implemented for the first time in the world in a form capable of computing directly inside memory, achieving a four-fold increase in the inference speed of Large Language Models (LLMs) and a 2.2-fold reduction in power consumption.

KAIST (President Kwang Hyung Lee) announced on the 17th of October that a research team led by Professor Jongse Park of the KAIST School of Computing, in collaboration with the Georgia Institute of Technology in the United States and Uppsala University in Sweden, has developed ‘PIMBA,’ a core technology built on ‘AI memory semiconductors (PIM, Processing-in-Memory)’ that acts as the brain for next-generation AI models.

Currently, LLMs such as ChatGPT, GPT-4, Claude, Gemini, and Llama are built on the ‘Transformer’ architecture, which attends to all words in a sentence simultaneously. Consequently, as models grow and the sentences they process become longer, computational load and memory requirements surge, making slowdowns and high energy consumption major problems.

To overcome these limitations of the Transformer, the recently proposed ‘Mamba’ architecture introduced a sequential, memory-based approach that processes information over time, improving efficiency. However, memory bottlenecks and power consumption limits still remained.

To maximize the performance of the ‘Transformer–Mamba hybrid model,’ which combines the advantages of both architectures, Professor Jongse Park’s research team designed ‘PIMBA,’ a new semiconductor architecture that performs computations directly inside memory.

While existing GPU-based systems move data out of memory to perform computations, PIMBA performs calculations directly inside the memory device without moving the data. This minimizes data-movement time and significantly reduces power consumption.

As a result, PIMBA achieved up to a 4.1-fold improvement in processing performance and an average 2.2-fold reduction in energy consumption compared with existing GPU-based systems.

The results are scheduled to be presented on October 20th at the ‘58th International Symposium on Microarchitecture (MICRO 2025),’ a globally renowned computer architecture conference to be held in Seoul. The work was previously recognized for its excellence with the Gold Prize at the ‘31st Samsung Humantech Paper Award.’ ※ Paper title: Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving, DOI: 10.1145/3725843.3756121

This research was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP), the AI Semiconductor Graduate School Support Project, and the ICT R&D Program of the Ministry of Science and ICT and the IITP, with support from the Electronics and Telecommunications Research Institute (ETRI). EDA tools were supported by IDEC (the IC Design Education Center).
