Transformers

Large Language Models – the good, the bad and what’s next

Do you know Bert?

Created with dreamstudio.ai by Ingo Hoffmann

Not from Sesame Street. Google’s BERT.

BERT was one of the first Large Language Models (LLMs) and a breakthrough for the application of Artificial Intelligence (AI).

But first things first.

In 2017, Google scientists unveiled a new machine learning model: the Transformer.

Transformers are particularly good at processing text. They outperformed many of the models previously used for processing text and speech (Natural Language Processing).

The distinctive feature: Transformers can be trained with “Self-Supervised Learning”.

Why is this so interesting?

The breakthrough of machine learning is based on processing a lot of data. In most cases, “supervised learning” models are used. These need annotated data for training the AI models.

For example, many images of cats with the annotation “cat”. The AI system is then trained with many images with and without cats. It gets an image and makes a statement (probability) whether this contains a cat. Through the annotation, the AI system can check if the prediction was correct and adjust if necessary.
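The "check and adjust" step can be made concrete with a small sketch. It assumes the error is measured with binary cross-entropy, a common choice that the article itself does not name; the function and variable names are hypothetical.

```python
import math


def binary_cross_entropy(predicted_probability, label):
    """Penalty for one prediction, given the human annotation.

    label is 1 if the image was annotated "cat", 0 otherwise.
    Assumption: cross-entropy is used here only as an illustration.
    """
    # Clamp to avoid log(0) for predictions of exactly 0 or 1.
    p = min(max(predicted_probability, 1e-12), 1 - 1e-12)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


# The system says "90% cat" for two different images.
loss_when_right = binary_cross_entropy(0.9, 1)  # annotation: cat
loss_when_wrong = binary_cross_entropy(0.9, 0)  # annotation: no cat

print(loss_when_right < loss_when_wrong)
```

The wrong prediction receives the larger penalty, and training adjusts the model to reduce that penalty. Without the annotation there is nothing to compare the prediction against, which is why labelled data is so important here.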

For this kind of training a lot of data is needed. I mean A LOT. And annotations are done by humans most of the time.

It is of course costly (in time and money) if hundreds of thousands of images must be annotated by humans (the same is true for translations of texts and all other training data).

The good

Then came Transformers.

In simple terms, these use existing texts to train and control themselves. For example, parts of a sentence are omitted, and the AI system must predict the missing parts. Then the sentence will be used to check if this prediction was correct and to adjust the model if necessary. All without human annotation.
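A minimal sketch of this idea, assuming a single word per sentence is hidden and whole words are used as units (real models such as BERT work on sub-word tokens and mask several of them at once). The function name and the example sentence are hypothetical; the `[MASK]` placeholder follows BERT's convention.

```python
import random

MASK = "[MASK]"


def make_masked_example(sentence, rng):
    """Hide one word of the sentence; the hidden word becomes the label."""
    words = sentence.split()
    position = rng.randrange(len(words))
    label = words[position]
    masked_words = words.copy()
    masked_words[position] = MASK
    return " ".join(masked_words), label


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = "Transformers are particularly good at processing text"
masked_sentence, label = make_masked_example(original, rng)

print(masked_sentence)  # the model's input
print(label)            # what the model has to predict
```

The label comes from the text itself, so any sentence from Wikipedia, a book, or a web page yields a training example without a human annotator.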

This innovative approach allows for processing massive amounts of data.

Wikipedia, web pages, books, software code, anything can be used.

Because of this diversity, the new models, called Large Language Models, “learn” many things that they have not been explicitly trained to do.

Examples

They can answer questions on all kinds of topics
They can write or complete texts
They can complete programming code

Google quickly recognized the possibilities and in 2018 developed one of the first models based on Transformers: BERT.

BERT quickly became popular and has been used by Google for Google Search since 2019. Other companies have developed similar models.

OpenAI released the well-known GPT (2018) and GPT-3 (2020) models.

Microsoft, Meta and Chinese AI companies have developed their own models.

Stanford researchers, in an August 2021 paper, coined the term “foundation model” for models like these: models that are trained on broad data and can be adapted to a variety of downstream tasks and applications.

Source: Stanford

The term is still being debated in the community, though.

I propose that we adopt the term "Large Self-Supervised Models (LSSMs)" as a replacement for "Foundation Models" and "LLMs". "LLMs" don't capture non-linguistic data and "Foundation Models" is too grandiose. Thoughts? @percyliang

— Thomas G. Dietterich (@tdietterich) August 13, 2022

The Economist recently wrote:

“Earlier generations of AI systems were good for only one purpose, often a pretty specific one. The new models can be reassigned from one type of problem to another with relative ease by means of fine tuning. It is a measure of the importance of this trait that, within the industry, they are often called “foundation models”.

This ability to base a range of different tools on a single model is changing not just what AI can do but also how AI works as a business.”

The application areas are numerous, as a look at some examples shows.

Microsoft’s GitHub Copilot is one of the first commercially successful solutions based on GPT-3. It helps programmers write software.

Meanwhile, Transformers can also be trained on images.

The possibilities of DALL-E and other text-to-image systems like DreamStudio are amazing.

With a plain text input, images are created, in the style of photographs or artwork, just the way you want them.

The bad

All positive? It’s not that simple.

LLMs are trained on a lot of data and therefore need a lot of computing capacity.

Source: Economist

It is therefore expensive to train such models.

The training of the Megatron-Turing NLG 530B model, developed by Microsoft and NVIDIA, is estimated to cost up to US$100M in computing time.

In the future, even larger models may go as high as US$1 billion in training costs.

Only very large companies or countries can afford this. Most of them are in the US or China.

Training such models consumes a lot of energy. The carbon footprint can vary depending on the location, as a study shows.

Source: Nature

The more models are created and used, the greater the energy consumption and carbon footprint. Sustainability is an issue here.

The models are not transparent. It is often not clear how a system arrives at an answer. They do have a reasoning problem.

The models are trained on a lot of data, and that data often includes many unreliable sources. This means that the answers cannot be relied upon.

It is easy to mislead the models.

They are prone to bias and can reproduce extreme opinions, as Meta recently demonstrated unintentionally.

The large models are often optimized for specific languages, especially English and Chinese. Users and applications in other languages are at a disadvantage.

On the other hand, the models can give people the impression that they are dealing with a conscious being. This is especially true for chatbots, with which people can communicate directly.

Google engineer Blake Lemoine sparked a major debate when he told the Washington Post that Google’s LaMDA model had a personality of its own.

There are good reasons to doubt this.

There are also legal issues that need to be addressed.

For example, what about copyright when systems like DALL-E generate images that use company logos or other protected representations?

Similar issues exist when these models generate text (or video in the future).

What’s next?

Are transformers the future of machine learning?

Probably not – but certainly an important part of it.

The use cases for Transformer-based models continue to evolve.

For example, protein structures can be predicted using Transformer-based models, as DeepMind’s AlphaFold demonstrates. This can be the basis for many scientific breakthroughs.

There are approaches to address the existing problems mentioned above.

Bloom is one of the most important developments here. International scientists have come together, with support from organizations and government funding, to create an open Transformer model: Bloom.

Not only is it open source, but the training data has been made open, too.

Stability AI released Stable Diffusion, a text-to-image model like DALL-E, under an open source license.

These models can be used and further developed by anyone. This means that not only the big companies are able to use this technology.

New chips are being developed specifically for machine learning. This will make the training of Transformer models much more (energy) efficient in the future.

Legal issues are being addressed by the providers of the models, such as OpenAI.

Countries like the UK are addressing these issues to create new frameworks.

Developments such as Bloom, which supports almost 50 languages, or Leam.ai, which develops Large European AI models, address diversity and open access to these models.

Conclusion

Larger Models + More data = better models?

That has been the case so far and will be true for some time.

We continue to have the challenge that only a few organizations and states can develop these models.

But there is also a move toward smaller, more focused models. Open models like Bloom should be encouraged and used more.

Transformers are a fascinating AI technology with many potential applications that we are just beginning to see.

There are challenges in access, quality, transparency, and energy consumption that need to be addressed.

Transformers are an important part of AI development, but not the only approach to future artificial intelligence.

As Yann LeCun summarizes:

“A system trained on language alone will never approximate human intelligence, even if trained from now until the heat death of the universe.”

Try out

Stable Diffusion: highly recommended

They just released DreamStudio to the public, and you will get some credits to create your own artwork.

Try Stable Diffusion to create some stunning images.

About The Author

Ingo Hoffmann