A.I. can now use text prompts to design new proteins which don’t exist in nature

January 30, 2023   |   by Chris Kalaboukis

There is a lot of hype around artificial intelligence tools and what they can produce in creative fields.

The examples taking the world by storm right now use text prompts to generate new text or even new images.

But I just came across a research study, published at the end of January 2023, which may have a bigger impact on the future of the planet than any of those.

The study describes a new AI model called ProGen, developed by a research team at Salesforce.

In the paper, titled “Large language models generate functional protein sequences across diverse families”, the scientists describe training a language model on 280 million protein sequences. As a result, you can now give the model a request for a new type of protein with specific characteristics or behaviours, and it will produce a variety of new amino acid sequences for proteins it predicts will meet that request.

Or simply put:

You can ask this AI model to design a completely new protein for you, even if that protein does not exist in nature.

As a test of the system, the researchers asked the AI to create enzymes with the properties of the lysozyme protein families. The model quickly generated a million different sequences, of which the scientists chose 100 to test, and then made five of the proteins to test in vitro for their ability to defend against bacteria and fungi. Two of the artificial enzymes were able to break down the cell walls of bacteria with activity comparable to a known lysozyme, hen egg white lysozyme (HEWL), yet their sequences were only about 18% identical to each other. The two sequences were about 90% and 70% identical to any known ‘natural’ protein.
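To make figures like "18% identical" more concrete, here is a minimal Python sketch of how percent identity can be computed for two sequences that have already been aligned. It is only an illustration: the function name and the toy fragments are made up for this post, real studies use dedicated alignment tools, and there are several conventions for what to divide by (this one counts every alignment column except those where both sequences have a gap).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences have the same residue.

    Assumes the inputs are already aligned (same length, '-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    # A column counts as a match only if both residues are equal.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Made-up toy fragments, NOT real lysozyme sequences.
toy_a = "MKVLA-GTWQ"
toy_b = "MKILAAGS-Q"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")  # prints "60.0% identical"
```

Six of the ten columns match in this toy pair, hence 60%. By the same kind of measure, the two working enzymes in the study matched each other in only about 18% of positions.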

According to a summary of the research by UC San Francisco:

The AI-generated enzymes showed activity even when as little as 31.4% of their sequence resembled any known natural protein.

The AI was even able to learn how the enzymes should be shaped, simply from studying the raw sequence data. Measured with X-ray crystallography, the atomic structures of the artificial proteins looked just as they should, although the sequences were like nothing seen before.

This means that some of the proteins the system produced, and which the scientists found actually worked, were very different in sequence from anything found in nature. The AI had created new proteins based only on its training on the huge data set, and was not limited to the sequences biology has already produced.

James Fraser, one of the authors of the work, stated: “We now have the ability to tune the generation of these properties for specific effects. For example, an enzyme that’s incredibly thermostable or likes acidic environments or won’t interact with other proteins.”

Now, do not think that you can just use this tool to find the fountain of youth or a cure for cancer. The text prompt still needs to be in scientific language based on how proteins are described, so you cannot just type in wishes like “give me a protein to make me look 30 years younger and grow 20cm”.

But this is a huge step in the right direction.

By finding proteins which would never occur in nature, scientists will be able to tackle complex challenges in ways that were not previously possible through other forms of molecular biotechnology. These proteins could be used for almost anything, from therapeutics to degrading plastic.

They may even find proteins which would have been impossible to reach through natural biology, where even a single mutation in a protein can make it stop working.

The model does still suffer from all of the current issues with “creative A.I.s”, especially the fact that it cannot guarantee that what it produces will actually work. In fact, the majority of what it produces might not work at all.

But as a tool in the hands of scientists who can then select which of the proposed new proteins have potential, test those and see which work, it can be a powerful tool for the future.
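The "propose many, shortlist a few, test those" workflow can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: `propose_candidates` just produces random sequences in place of a real model, and `placeholder_score` is an arbitrary made-up ranking rule, not how the researchers picked their candidates.

```python
import heapq
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def propose_candidates(n, length, rng):
    """Hypothetical stand-in for a generative model: n random sequences."""
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in range(n)]


def placeholder_score(seq):
    """Made-up ranking score: fraction of charged residues (D, E, K, R)."""
    return sum(seq.count(r) for r in "DEKR") / len(seq)


def shortlist(candidates, k):
    """Keep the k highest-scoring candidates to send on for lab testing."""
    return heapq.nlargest(k, candidates, key=placeholder_score)


rng = random.Random(0)  # fixed seed so the run is repeatable
pool = propose_candidates(n=10_000, length=30, rng=rng)
picked = shortlist(pool, k=100)
print(len(pool), "proposed ->", len(picked), "shortlisted")
```

The point of the sketch is the shape of the funnel: the model is cheap to run, so it can propose far more candidates than anyone could test, and human judgement plus lab work decides which ones actually matter.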

Best of all, the source code for ProGen is publicly available, giving the next generation of scientists all around the world access to it.