The race for the most accurate and best-behaved language model continues, and while OpenAI appears to be leading with GPT-4, alternatives—and not just open source ones—are advancing. This is demonstrated by Google’s PaLM 2 model, but also by Meta’s (Facebook) LLaMa model, which now has a promising variant called LIMA.
Two very different training phases. As a recent study from Meta explains, large language models are trained in two phases. In the first, unsupervised training is carried out on raw text, which gives the model its general-purpose knowledge. In the second, these models are refined: they are fine-tuned and reinforcement learning is applied to align the model with certain tasks or user preferences.
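To visualize these two phases, here is a minimal, hypothetical sketch in PyTorch (the toy TinyLM model and the random-token batches are placeholders, not Meta's actual code): both phases use the same next-token objective, but the first sees raw text at scale while the second sees curated examples of the desired behavior.

```python
# Hypothetical sketch of the two training phases, using a toy PyTorch model.
# TinyLM and the random-token batches are illustrative placeholders, not
# Meta's actual code; both phases use the same next-token prediction loss.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)

def next_token_loss(model, tokens):
    # Predict token t+1 from tokens up to t (the standard language-model loss).
    logits = model(tokens[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Phase 1: unsupervised pretraining on raw text (random tokens stand in here).
raw_text_batch = torch.randint(0, 256, (8, 128))
opt.zero_grad(); next_token_loss(model, raw_text_batch).backward(); opt.step()

# Phase 2: alignment / fine-tuning, with the same objective but applied only
# to curated examples of the desired behavior.
curated_batch = torch.randint(0, 256, (8, 128))
opt.zero_grad(); next_token_loss(model, curated_batch).backward(); opt.step()
```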
LIMA. The latter is precisely what Meta has done by training and launching LIMA (Less Is More for Alignment), a language model based on the 65-billion-parameter LLaMa that has been fine-tuned with just 1,000 prompts and responses specially curated so that it behaves properly. No reinforcement learning or modeling of human preferences was necessary, yet its behavior has still turned out to be outstanding.
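As a rough idea of what such a small, curated data set looks like (the separator token and field names below are assumptions for illustration, not the exact LIMA recipe), the preparation can be as simple as pairing each prompt with a hand-written response:

```python
# Hypothetical preparation of a LIMA-style fine-tuning set: about 1,000
# hand-curated prompt/response pairs, no reward model and no RLHF.
# The EOT separator and the dict layout are illustrative assumptions.
EOT = "<|eot|>"  # marks the boundary between prompt and response

curated_examples = [
    {"prompt": "Explain overfitting in one paragraph.",
     "response": "Overfitting happens when a model memorizes its training data..."},
    {"prompt": "Write a short, polite email declining a meeting.",
     "response": "Hi, thanks for the invitation. Unfortunately I can't attend..."},
    # ... up to roughly 1,000 carefully reviewed pairs
]

def to_training_text(example: dict) -> str:
    # Each pair becomes one sequence; the model is then trained on these
    # sequences with the ordinary supervised next-token loss.
    return example["prompt"] + EOT + example["response"]

train_texts = [to_training_text(e) for e in curated_examples]
print(train_texts[0][:60])
```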
Tests. The model has been developed by Meta in collaboration with Carnegie Mellon University, the University of Southern California and Tel Aviv University. According to the researchers’ tests, LIMA performs remarkably well and learns to follow specific response formats with just a handful of examples in its training data. It is even able to generalize well to new tasks that did not appear in the training data set.
As good or better than GPT-4 and Bard. In a controlled study run by these researchers, LIMA’s responses proved to be equivalent or preferable to those produced by GPT-4 in 43% of cases. Things got better against Bard (58%) and went even further against OpenAI’s DaVinci003, at 65%. All of this “suggests that almost all of the knowledge of large language models is learned during pretraining, and that only a limited set of tuning data is necessary to teach the models to produce high-quality results,” the authors of the study said.
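For context on how a figure like that 43% is obtained, the sketch below shows one common way to compute a win rate from pairwise human judgments; the data in it is invented, not the study’s.

```python
# Hypothetical win-rate calculation from pairwise human judgments. Each entry
# records whether LIMA's answer was better than, equivalent to, or worse than
# the baseline's answer for the same prompt. The data below is made up.
judgments = ["better", "worse", "equivalent", "worse", "better", "worse", "equivalent"]

wins_or_ties = sum(1 for j in judgments if j in ("better", "equivalent"))
win_rate = wins_or_ties / len(judgments)
print(f"LIMA preferred or equivalent in {win_rate:.0%} of comparisons")
```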
RLHF may not be such a big deal. One of the important conclusions of the study is that the Reinforcement Learning from Human Feedback (RLHF) technique does not bring as many improvements as previously believed. In this system, human evaluators rate the model’s outputs, and those ratings are used as rewards to optimize its behavior during training. It is an expensive process that OpenAI uses to refine its models and that was applied, for example, in GPT-4 to improve its performance.
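To make the contrast concrete, this is roughly what RLHF adds on top of supervised fine-tuning: a reward model trained from human preferences scores sampled answers, and a reinforcement-learning update pushes the model toward higher-scoring ones. The sketch below is a deliberately simplified, hypothetical illustration with toy tensors, not OpenAI’s actual pipeline.

```python
# Heavily simplified RLHF illustration with toy models (REINFORCE-style update,
# not the PPO used in real pipelines). A reward model, which in practice would
# first be trained on human preference rankings, scores the policy's sampled
# outputs, and the policy is updated to make high-reward outputs more likely.
import torch
import torch.nn as nn

vocab, dim, seq_len = 256, 64, 16

policy = nn.Sequential(nn.Embedding(vocab, dim), nn.Flatten(), nn.Linear(dim * seq_len, vocab))
reward_model = nn.Sequential(nn.Embedding(vocab, dim), nn.Flatten(), nn.Linear(dim * seq_len, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# One toy RLHF step for a batch of contexts.
contexts = torch.randint(0, vocab, (8, seq_len))
logits = policy(contexts)                                  # (batch, vocab)
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                                    # sampled continuation tokens
completed = torch.cat([contexts[:, 1:], actions.unsqueeze(1)], dim=1)
rewards = reward_model(completed).squeeze(-1).detach()     # stand-in preference score
loss = -(rewards * dist.log_prob(actions)).mean()          # REINFORCE objective
opt.zero_grad(); loss.backward(); opt.step()
```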
Superficial alignment hypothesis. According to Meta, this supports the hypothesis that the alignment phase that follows the initial pretraining should mainly teach the model a certain format or style that it can draw on when interacting with users. This “tuning” of the model is thus more about style than substance (and, as far as the data goes, more about quality than quantity, one might say).
But. Even so, the LIMA research team points out that building these data sets of high-quality examples is quite a challenge and is not always a scalable option. And even with these results, LIMA is still somewhat below GPT-4: it generates good answers, but a prompt designed to trip it up, or an unfortunate example in its tuning data, can lead to less accurate responses.
LLMs lose some relevance. For Meta’s Yann LeCun, LIMA’s behavior shows that investing in the development of new, large LLMs will be important in the short term, but not in the medium term, “at least not without some big changes,” he indicated in a recent tweet.
In Xataka | Meta was losing the AI race: it just did a 180-degree turn with the announcement of its specialized chip