We spoke to a Stanford prof on the tech and social impact of AI's powerful, emerging 'foundation models'
From single points of failure to training and policies, Percy Liang covers a wide range of topics in this Q&A
Interview Even if you haven't heard of "foundation models" in AI, you've probably encountered one or more of them in some way. They could be rather pivotal for the future of not only machine learning and computer science but also society as a whole.
Foundation models are called this because they are the base upon which myriad applications can be built, and issues at the foundation level could therefore have repercussions on the software and services we use.
Typically, these models are giant neural networks made up of millions and billions of parameters, trained on massive amounts of data and later fine-tuned for specific tasks. For example, OpenAI's enormous GPT-3 model is known for generating prose from prompts, though it can be adapted to translate between languages and output source code for developers.
These models – drawing from vast datasets – can therefore sit at the heart of powerful tools that may disrupt business and industries, life and work. Yet right now they're difficult to understand and control; they are imperfect; and they exhibit all sorts of biases that could harm us. And it has already been demonstrated that all of these problems can grow with model size.
What happens if these foundational models play an increasingly prominent role in society, and we can't be sure they're safe, fair, and reliable?
What if they can only be built and deployed by well-resourced corporate giants that prioritize profit above all else? Can this technological upheaval be of any good for us as a whole?
Seeking answers to these questions, The Register spoke with Percy Liang, an associate professor in computer science at Stanford University, about foundation models and what they portend.
Our interview, which has been edited for length and clarity, opens with Liang talking about why Stanford launched the Center for Research on Foundation Models (CRFM), an initiative involving hundreds of academics. (Liang led an academic paper written by those at CRFM that tackled the same sort of questions we had.)
Liang: We started this center a few months ago; it grew out of HAI [the Stanford Institute for Human-centered Artificial Intelligence].
The focus of this center is on these foundation models, which is this new paradigm for building AI systems. I think it's important to reiterate this paradigm shift, because normally when we think about all these models, like GPT-3, we're drawn to what they can do – like generate text, code, images – but I think the paper really highlights [they] can be useful for a lot of different tasks.
It doesn't solve all the tasks. I want to really emphasize that we call it a foundation because it's just something that can give rise to other applications through adaptation. I think one problem with this, though, is it does create one single point of failure, which means that any problems with this foundational model are going to be inherited downstream.
So I think we have to be very, very careful about building these foundations in a way that [are] robust, fair, safe, and secure. So that when we build things on top of it, the whole system doesn't kind of crumble.
El Reg: At what point did you realize that foundation models were a big paradigm shift? What convinced you that they're here to stay and deserve to be categorised in a class of their own?
Liang: I mean, for me personally, the tipping point was looking at GPT-3 and seeing the type of capabilities that were emerging.
I think it's important to stress that this is not about language; I think a lot of the energy around this happens to be in the field of natural language processing, but quickly we're seeing vision models and models in biology that kind of have these similar characteristics.
I think it's a general AI, which is why I think we chose the term foundation models: to emphasize, not their technical composition – for which you can use pre-trained models or self-supervised models – but to emphasize their ... role in almost a kind of sociological sense, and that it's not just about language but about AI in general.
El Reg: One of the things that struck me when I read through this paper is that there are a lot of unknowns with these systems. We don't know how they really work or if we can ever really understand them or even fix issues like bias. Some of the solutions that were mentioned in the paper all seem to boil down to having access to these foundation models. This rests on whether their creators are willing to reveal the training data, or the weights or parameters of their model. Foundation models are created by tech companies. How open do you think creators like Google or OpenAI are right now? Can researchers build their own foundation models?
Liang: I think openness is a big problem, right. I think the last decade we've seen all these advances because there's an incredibly open culture in deep learning.
But foundation models are another story. And, you know, there's multiple things that are being hidden. Sometimes the code is there, sometimes it's not. Sometimes the data is there, sometimes it's not. Sometimes the models are there, and sometimes they're protected behind an API or are internal. There's a lot of these models which are completely proprietary, that the public doesn't have access to. So, I think it is a problem from the point of view of doing science on these [systems].
I think it's really critical that academia is able to participate. And it's not just about academia, it's also startups. If you think about what happened with search engines, you know it's really hard to make any kind of new search engine now because of this kind of centralization and entrenchment that happens. And if we don't do something about this now, I think we'll end up in a similar position.
El Reg: There still seems to be a disparity. Even if you do build your own kind of foundation models, it doesn't necessarily mean that you can fix the issues in other different foundation models.
Liang: Yeah, that's a good point. One of the things that makes it hard about foundation models is this "emergent at scale" property. If you train a small model, you will just often draw the wrong conclusions about what these models are capable of. Because when you get to a certain size, only then can you see it can generate text that's actually really fluent. So this makes it hard to only work at small scales.
That said, I do think that there's a kind of a sweet spot, where you're big enough and you're in this regime where certain phenomena appear. GPT-3 is 175 billion parameters. But at around six to 10 billion parameters, you'll already see signs of some of its behavior, and that's something that we could do work on. Biases creep in even at the smallest model.
So, you know there's productive things you can do there. And then some of these other capabilities you need kind of medium-size models. And then for the biggest models, I really hope that there's a way for academia and industry to somehow collaborate together.
Industry obviously has resources but also more importantly they're the ones actually deploying these models into the real world and having social impact on people. I think together we could definitely work out, hopefully, a kind of a better future for these models.
El Reg: The paper says the gap between collectives of developers and companies building these foundation models will grow over time. There are efforts to try and replicate GPT-3, like EleutherAI's GPT-J, but the gap, you think, will grow over time as these foundation models get bigger and it'll be harder and harder to replicate them. Do you think that this is a space that academia can really fill, and if so, how? How can you be competitive with these companies when you have to rely on some of them for computational resources to even build these models?
Liang: Yeah, that's one of the reasons why we came up with the idea of a National Research Cloud. I mean that's exactly the question: how do we have computing infrastructure that is actually geared exclusively for research and public benefit?
El Reg: Well, again, those kinds of government-owned cloud initiatives would still rely on hardware companies that make these foundation models, so it seems like a chicken-and-egg problem.
Liang: Yeah, I agree it's a hard [problem], and I don't have a definitive answer here. I will say that the gap is large, that folks have been able to go up to six billion parameters, not even to the 175 billion parameters, which I guess is GPT-3. And clearly industry is racing off, we're going to hit one trillion parameters soon.
In the short term, there are things that you can do on smaller models that translate to relevant, meaningful implications for large models. For example, characterizing bias. We don't have good ways of capturing exactly what we want. You know we can think about evaluation and that's independent of scale.
Then there's the longer-term thinking, which is: how can you make volunteer computing a reality? So, for example, Folding@Home is this project that harnesses volunteer computing for simulating protein structures for drug discovery. Anyone can hook up your laptop and help out with this, and for a period of time during COVID, it was the world's largest supercomputer.
I think it's hard, but that gives us some hope that appropriately designed systems, if you think carefully about the kind of decentralisation, can succeed. In foundation models there are many more kinds of technical hurdles to overcome. But I think it's definitely worth a shot.
El Reg: The paper does mention that there are some people in the AI community who believe these foundation models should not be built in the first place because of the potential harms and the sheer amount of energy it takes to train them. Do you agree? Do you think that the benefits outweigh the risks? What kind of future do you see if we continue building these foundation models, and keep trying to study them when we can only know so little about them, compared to deciding we shouldn't build them at all?
Liang: So there's the data creation, curation, training, adaptation, and deployment [process]. I think the question of when to build [should be asked] at each stage. Should you be even collecting this data? I think a lot of the problems have nothing to do with foundation models in particular, but it's about collecting certain types of data from users. So it should be addressed there.
There's also the curation aspect. Each piece of data should be almost tagged with "what am I allowed to do with this?" Should my email data be used to power some machine translation system? I think the question when not to build is almost context free, so it's very hard to answer that question.
But I think a question of whether I can use this particular data for that downstream application, I think, is a meaningful one. And I think if we have standards or norms around what is allowable, then I think, to me, that is kind of the right way to answer these questions, not a kind of categorical no or yes.
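(An editorial illustration, not something from Liang or the CRFM paper: the idea of tagging each piece of data with what it may be used for could be sketched roughly as below. The `TaggedRecord` class, the `allowed_uses` field, the `usable_for` helper, and the purpose strings are all hypothetical names invented for this sketch; a real scheme would need agreed standards for what the tags mean.)

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaggedRecord:
    """A piece of data plus the uses its owner has agreed to (hypothetical schema)."""
    content: str
    allowed_uses: frozenset = field(default_factory=frozenset)


def usable_for(records, purpose):
    """Keep only the records whose tags explicitly permit `purpose`."""
    return [r for r in records if purpose in r.allowed_uses]


# Illustrative data only: the purpose labels are assumptions, not a standard.
records = [
    TaggedRecord("public forum post", frozenset({"machine_translation", "search"})),
    TaggedRecord("private email", frozenset()),  # no downstream use granted
]

translation_corpus = usable_for(records, "machine_translation")
print([r.content for r in translation_corpus])
```

The sketch defaults to "not allowed": a record with no tags is excluded from every downstream use, so permission has to be granted per application rather than assumed.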
There are definitely conditions in which we should not be building certain types of foundation models. And one of the things that's difficult is that you build a foundation model and it can be used for anything. In the paper we talk about this idea of function creep, which is that you build a foundation model, like... CLIP, for example, and actually, well, we can't do face recognition with this, it's not designed for that.
But because it's so general, one of our people at Stanford actually showed that you can adapt it to something the applications hadn't intended. So for that, I think you need for every foundation model there should be ... a specification of characterizing what the properties are but also what it's allowed to be used for.
I think that we're missing this kind of framework of thinking about at each part of the pipeline. You have assets: [at each step, we should answer the question] what are you allowed to do with it, and what [are] you not allowed to [do] with it? If we have that, then I think we can more meaningfully answer it: the when-not-to-code question. There are applications where there [are] benefits and there are other applications which are risky, but I think by having this more nuanced view, you can use the foundation models for certain purposes but not others.
As an analogy you can think about... well, face recognition is definitely very controversial and one of the arguments you can make is that we should just not build these systems. But what about image classification? ... I guess it would be a tough argument to say we shouldn't build image classifiers at all because clearly they have many applications and it's useful [to do so].
We don't have to ban outright image classifiers. We can have constraints on what this technology can be used for. I mean getting all this right is very tricky; I'm not claiming that this is an easy problem. It involves not just the technology, but thinking about policy and all these other things as well. ®