Artificial intelligence, avocados and abstraction

Editor’s note: This is one of the occasional posts we do speaking to someone with something to say on topics that we find interesting. If you are seeing this and are not yet signed up to Deepnews, click the button below to start receiving our blog posts every week, and a Digest of quality news on an important subject every Friday.

By Christopher Brennan

What does it mean to “know” a concept for an AI? What does it mean when one puts different concepts together to create something it’s never seen before? How do AIs go beyond the data that they are fed?

The rapid growth of artificial intelligence has led many to wonder how much AIs “know” or “understand.” Whenever an impressive new tool comes out, the machine and its abilities are compared to human knowledge. This was definitely the case with the unveiling this past week of the cleverly named “DALL-E,” which creates sometimes very nuanced images based on text prompts. It and its companion CLIP were created by OpenAI, famous for GPT-3, which I have mentioned before and which can generate paragraphs of text that sound surprisingly human, if not always quite logical.

If you haven’t read them, the releases linked above are worth the time. If you think that they are too long, maybe I can entice you with the fact that you get to play around with prompts for “avocado armchairs” and look at some fun pictures, which DALL-E is not finding on the internet but is creating. 

If you look through the article from OpenAI you can see that DALL-E seems capable of generating images that look like real-world objects, but also more fantastical objects such as the armchairs, by combining different ideas. This of course brings up the idea of the AI “understanding” the concepts, which is of interest to Deepnews as a company whose model is looking for the abstract concept of “quality.” In OpenAI’s language, DALL-E has “geographic knowledge” that allows it to generate the “city streets of China” or the “food of Belgium” as well as “temporal knowledge,” as it can generate objects that appear to be from different decades.

This touches on the general idea of artificial intelligence and abstraction, though we’re not quite at the place where DALL-E can understand the abstract concept of time. This week I had the chance to speak to Ryan Khurana, Research Fellow at the Montreal AI Ethics Institute, who reminded me that the OpenAI model had ingested a huge amount of data, including images of things such as avocados and armchairs, that it could then recombine.

“It is the holy grail kind of target for AI to abstract. To be able to do new things and recombine things in that way, it’s a good kind of setup to abstraction. It’s very similar to what human beings [are] doing with abstraction. But abstraction is a much larger set of things, it’s not just being able to recombine text inputs,” he said.

“To me an abstraction is to have only seen a mango, or some other kind of seeded fruit, and be able to, upon seeing an avocado in your test — I’ve never seen it before — be able to see it and to understand its relationship with everything you have seen. It’s an understanding of what classes are. It’s an understanding of what the nature of a fruit is.”

Though DALL-E has only 12 billion machine learning parameters as opposed to GPT-3’s mammoth 175 billion, OpenAI continues to pursue sophisticated AIs through scale, spending large amounts of time and money on the compute power to train their massive models. Some have highlighted how this may lead to breakthrough research being dominated by large companies and the most prominent universities, a “de-democratization” of AI that Khurana covered in a recent piece for Scientific American.

Rather than despairing, however, he finds hope in “less than one-shot learning,” described by University of Waterloo researchers who focus on machine learning for vision. Whereas the big models we have talked about rely on more and more data, this approach, well-summarized in this piece from MIT Technology Review, is aimed at using less data, creating a synthetic dataset that allows a model to pick up features that it can use to identify classes beyond just what it’s been fed. An earlier paper from the Waterloo team created a model that was able to identify the written numbers 0 through 9 after only seeing five examples that blended together different features of the 10 numbers.
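
For readers who like to see the idea in code, here is a minimal toy sketch of how fewer examples than classes can still separate every class. It assumes each synthetic example carries a "soft" label (a blend of class memberships) and that a new point is classified by a distance-weighted vote over those blends. The function name `classify`, the two positions on a line and the label values are all made up for illustration; this is not the Waterloo team's code or their digit experiment.

```python
# Toy illustration: two synthetic examples that between them encode three classes.
# Positions and soft-label values are illustrative assumptions, not from any paper.
PROTOTYPES = [
    (0.0, [0.6, 0.4, 0.0]),  # mostly class 0, partly class 1
    (1.0, [0.0, 0.4, 0.6]),  # mostly class 2, partly class 1
]


def classify(x, prototypes=PROTOTYPES, eps=1e-9):
    """Return the class with the largest distance-weighted soft-label score."""
    n_classes = len(prototypes[0][1])
    scores = [0.0] * n_classes
    for position, soft_label in prototypes:
        # Closer examples get a larger say in the vote.
        weight = 1.0 / (abs(x - position) + eps)
        for cls, share in enumerate(soft_label):
            scores[cls] += weight * share
    return max(range(n_classes), key=scores.__getitem__)


if __name__ == "__main__":
    for x in (0.1, 0.5, 0.9):
        print(x, "-> class", classify(x))
```

Near either example the dominant class wins, but in the middle of the line the two partial votes for class 1 add up and beat both, so a third class appears that no single example represents on its own.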

Khurana thinks that this sort of approach, which he compares to the way that babies learn, will help researchers in places such as academia stay relevant despite lacking the resources of the biggest companies. It may help address one of the other criticisms of large language models beyond their cost, which is a lack of transparency when they are trained on terabytes and terabytes of data that is difficult to audit. 

“Whereas GPT-3 can perform all these really cool tasks, it’s somewhat opaque to us what relationships it’s picking out, and how it’s performing those tasks,” he said.

“The more you have control over the labels in the dataset, the more you can kind of say, ‘Oh, here is how the number three and the number seven are related based on their physical appearance, etc.’ And I think that helps, in some ways, understand how abstractions are occurring.”