Episode 13

Measuring and Visualizing AI (grounding decisions in data with Nestor Maslej)

Technology is really a human problem more than a technical one.


Episode Description

AI is going to affect us all and everyone has opinions about it. But what does the data say?

In this episode of Policy Prompt, Vass and Paul welcome Nestor Maslej from Stanford University’s Institute for Human-Centered Artificial Intelligence, where he is the research manager of the AI Index and the Global AI Vibrancy Tool. In developing tools that track the advancement of AI, Nestor hopes to make the AI space more accessible to policy makers, business leaders and the lay public. Nestor discusses the excitement and fears surrounding this fast-moving technology and the importance of quantitative data in AI myth busting. “At the Index, we really feel that to make good decisions about this tech, whether you are in a boardroom, in a Parliament, or simply sitting in your living room, you need to have access to data and you have to actually understand what is going on with this technology.”


Credits:

Policy Prompt is produced by Vass Bednar and Paul Samson. Our technical producers are Tim Lewis and Melanie DeBonte. Fact-checking and background research provided by Reanne Cayenne. Marketing by Kahlan Thomson. Brand design by Abhilasha Dewan and creative direction by Som Tsoi.

Original music by Joshua Snethlage.

Sound mix and mastering by François Goudreault.

Special thanks to creative consultant Ken Ogasawara.

Be sure to follow us on social media.

Listen to new episodes of Policy Prompt biweekly on major podcast platforms. Questions, comments or suggestions? Reach out to CIGI’s Policy Prompt team at [email protected].


60 Minutes
Published March 24, 2025

Featuring


Nestor Maslej

Chapters

1 0:00:00 Welcome to CIGI's Policy Prompt
2 0:00:35 Introduction to Nestor Maslej, research manager of the AI Index and the Global AI Vibrancy Tool at Stanford's Institute for Human-Centered Artificial Intelligence
3 0:02:04 A peek behind the curtain at how Nestor got into this work
4 0:05:29 Diving into the AI Index: the work of grounding perceptions about AI in data
5 0:07:28 Are there notable shifts the AI Index researchers have seen since they began it?
6 0:10:54 How does DeepSeek shift the dynamics of AI competition?
7 0:16:03 Scoring and ranking AI: moving beyond "benchmarking"
8 0:21:00 Closed-source versus open-source AI? The gap is narrowing
9 0:25:17 Beyond infrastructure: what it might take for a country to lead in AI
10 0:34:40 Thinking of AI as an energy rather than as a product
11 0:40:53 AI technologies and the Jevons paradox
12 0:44:08 Is scaling going to continue leading to better progress?
13 0:47:03 The genius of the human brain
14 0:49:54 The dawn of a new golden age or the end of civilization? Nestor's work as "AI myth buster"
15 0:54:32 Final thoughts: technology is really a human problem more than a technical one
16 0:55:42 Debrief by Paul and Vass


Vass Bednar (host)

Honestly, I am getting a bit overwhelmed with all the reports you send me, specifically on AI. I feel like I'm always playing catch up and there's a huge fire hose, and I'm never kind of done learning. It's a very noisy space too.

Paul Samson (host)

There's nothing like an AI report in the morning, but I'm pretty sure it's not just me or are you sure it's me? Is it only me?

Vass Bednar (host)

You've got your own folder in my inbox, Paul. I'll put it that way.

Paul Samson (host)

Okay. Well, that last report I sent you is really worth checking out: the Stanford AI Index, which I know you have.

Vass Bednar (host)

Yes. It was an early email from you, but happily I read on, and this index is so packed with information, I mean, as you would expect from an index because it's integrating so much, but they've now introduced this AI Vibrancy Ranking tool. It's pretty fun. You can compare countries and you get to set your own weighting measures and parameters. You could decide that R&D is more significant to you than infrastructure, or even discount or upvote kind of responsible AI policies. So I appreciate the dynamism of that complementary tool, and the index covers all sorts of stuff. We'll get into it with Nestor, but technical advancements in AI, the public perceptions of the technology, government policy, and the geopolitical dynamics surrounding its development. So there's a lot there.

Paul Samson (host)

Yeah, so today we have the perfect guest for that, Nestor Maslej himself, who's the research manager at Stanford for the AI Index and the Global AI Vibrancy Tool, which just came out in its first iteration this year.

Vass Bednar (host)

Welcome to Policy Prompt, Nestor.

Nestor Maslej (guest)

Thanks for having me, guys. Super excited to be here. Very kind introduction to the AI Index. Of course, I have to get excited about it because it's my job, but it's nice to see other people in the space appreciating it almost as much as I do.

Vass Bednar (host)

Paul was telling us a little bit about you, but I'd like to hear more because you and Paul have met each other and you and I are just meeting now. I've met you through your work. You grew up in Toronto, you studied in Boston, and then you ended up in Silicon Valley where we're reaching you now working on AI issues. Could you just give us a peek behind the curtain and tell us a little more how that journey unfolded for you?

Nestor Maslej (guest)

Well, I think I'd always been interested in doing things in the academy. And I think when I was in undergrad, I think the topic that I found to be really interesting was economic development, and that's kind of what I studied when I was an undergrad. I did a thesis on economic development in Eastern Europe. I went to do my master's. And being from Canada of course, I had this Canadian perspective, wanted to understand some of the developmental issues that we faced back home with residential schools and things along those lines. But while I was in my master's, I think I started to read a lot more about transformative technologies. And it was, I think around this time that I found myself perhaps getting a bit dissatisfied with my own relationship to social media, which I then kind of saw as one of the main technologies of the time.

And kind of looking ahead, it dawned on me that even then AI was going to be this thing that was on the horizon, and I felt more and more inspired to do work in that space and to study what exactly was going to go on with that technology. So when I wrapped up my master's, I applied for this job at Stanford and it was kind of right place, right time in combination with, I think, the background that I brought because it was a year or so before ChatGPT. And the AI Index, which you guys referenced, was already, I think, established in 2017. So it kind of had been around for a while, had built a reputation and then ChatGPT came out and all of a sudden everybody was interested in learning about AI.

And it was funny for me because as part of this job, I speak to a lot of policymakers and business leaders, and I remember sending some kind of outreach emails before ChatGPT to some of these policymakers and they would always say, "We're happy to chat, but we have other things to worry about." And then as soon as ChatGPT came out, all of a sudden these guys were kind of back in the inbox saying, "Hey, actually, do you have time to do that briefing?" So it's been interesting to see how the broader landscape has reacted to the technology. And it seems kind of... Even though we're kind of two years out from the launch of ChatGPT, there's still a lot of interest and engagement in the tech, and the discussion around it has definitely evolved.

Paul Samson (host)

So that's great. So you're a Canadian living in the US. We won't ask you about tariffs even though it is January 31st, 2025 as we're recording this, but we're going to dive in more on the Index now. The Index, its objective is to really do quite a broad thing of tracking and collating and distilling, visualizing data. And there are a lot of cool visuals in there. It's theoretically completely unbiased. It is rigorously vetted. There are broad data sources that are all well documented, et cetera, and it's developing a bit of a nuanced understanding of what's going on. What was it that was missing before around AI that this index has filled? How would you describe what that gap was and what this index is doing to fill it?

Nestor Maslej (guest)

I mean I think just kind of fundamentally, AI is one of these things that touches a lot of different verticals in society. Whether you're a policymaker, business leader, maybe perhaps even an artist that's wondering what your career is going to look like in the future. AI is going to affect us all and everybody has opinions about how this technology should be managed, governed, and dealt with. And I think that really fundamentally speaks to the fact that in a lot of ways, technologies are less about technical developments and more about how people react to those technical developments.

And I think at the Index, we really feel that to make good decisions about this tech, whether you are in a boardroom, in a parliament, or simply sitting in your living room, you need to have access to data and you have to actually understand what is going on with this technology. And I think when the index was established, there was just this feeling that there were a lot of narratives about the tech, what it could do, how it could get better, but not necessarily a lot of quantitative data that could really help you understand how much more is the business world thinking about this? How much better is the technology actually getting?

Paul Samson (host)

Baselines?

Nestor Maslej (guest)

Exactly. Baselines that allow you to do that evaluation. And I mean, it's funny, we've been around for seven years, but I don't think that that kind of ability to dispel narratives has completely gone away, in that you look at what happened earlier this week. DeepSeek launched, that was a big piece of AI news. And it seemed it kind of affected the world in a very kind of narrative, emotion-driven way where there was a very aggressive market reaction, then perhaps a bit of a bounce back.

Paul Samson (host)

Big time. And we're going to have questions for you on DeepSeek for sure, because there's a lot to unpack there.

Nestor Maslej (guest)

And it just speaks to the fact that everybody has takes on this technology because it is going to be so important. And I think just we at the Index feel that we need to ground these takes in some data and in the kind of reality of what the statistics actually tell us.

Vass Bednar (host)

You mentioned you've been around, the Index has been around for seven years now, right? It's the seventh edition. I'd love to kind of ask you a two-part question, which does feel somewhat illegal as a podcast host, but one, I'm curious what shifts you and the team have seen that maybe really feel notable over time with the Index outside of geopolitics and governance? And then I'd also love to know was there anything that really surprised or charmed you where you sort of went back and really wanted to check the numbers and the inputs? Was there something that either felt counterintuitive as a finding or just maybe really delightful or intriguing?

Nestor Maslej (guest)

I think both really good questions. I mean, I don't know about podcast decorum. It seems legal to me to ask two questions. So I'll kind of... First question, I think for me, the big narrative if you kind of look at things in the last five years is AI has really moved from being an academic technical problem to more of a societal one, especially in the business world. And I think what I mean by this is that certainly if you went back a decade, even five years ago, it really seemed like AI was still something that existed in the labs of universities and a few kind of small technology companies, where there were these questions like, can AI classify images? To what degree can it understand text? To what degree can it understand speech? But it really seemed like we were kind of dealing with this as an idea removed from how it impacts the world.

I think fast-forward now of course we still want to know how much better is the technology getting? What kind of architectural improvements can we possibly make? But in a lot of ways, the question now is more about how are businesses going to use it? What are policymakers going to do to respond to it? How are people going to feel about the technology? So I think it's kind of the thing that I'm really seeing or that I've really noticed is that we've moved away from this purely kind of technical orientation on the problem and started to think about it a lot more in societal terms. And I think really that's probably for me going to be the next kind of big thing to look at because people very often ask me, when are we going to get agents?

How much better is the tech going to get? And to me, I kind of say, of course the tech will get better, but in some ways that's not really the point. The thing that I'm really interested in understanding now is whether businesses are going to use this. Is this going to be a tool that is going to drive a lot of productivity gains? That kind of shift in narrative is something that I'm definitely seeing.

Vass Bednar (host)

I mean for us in Canada, we know that our firms historically do not adopt these kinds of productivity-enhancing tools, and that AI adoption is low. So that's kind of an ongoing policy question here. But Paul, I jumped in on you.

Paul Samson (host)

No, just to add to that. When you look at the productivity gap right now between the US and the European large economies and Canada and Japan, it is so striking how the US productivity has really taken off in the last five plus years, and the other ones are stagnating and people are trying to figure out how much of that is tech driven or data driven or AI driven, and it's really hard to unpack those numbers, but there seems to be something going on there. But let's go, you talked already about DeepSeek.

Clip

What took Google and OpenAI years and hundreds of millions of dollars to build, DeepSeek says took it just two months and less than $6 million.

Paul Samson (host)

And so I think there's a whole bunch of issues we could unpack there that are perennials in a way, and yet there's the DeepSeek immediate reaction of the market and all kinds of things, but there are some deep issues with DeepSeek about how much of a disruptor this is to the system as a whole. The AI system that had a number of assumptions embedded into the way we frame things about energy use and how much compute power you need and all that. So how does it shift the dynamics of AI competition? Maybe we can start with that. Does it shift things? Do we know... There are some things we're still trying to figure out about what exactly happened, for sure.

Nestor Maslej (guest)

Well, for me, I think I wasn't as surprised by the DeepSeek news maybe because I had been thinking about where the AI space had been going before and it seemed to me in a lot of ways a continuation of a lot of trends. I mean, in fact, we're kind of in the process of putting together next year's report. So literally last week before this all happened, I was reading the very paper that people commented on and it struck me as an impressive result and a noteworthy one, but I didn't think it would lead to a massive stock market sell-off. And I guess kind of what I mean is... I think what DeepSeek represents is this kind of threat that now this AI LLM capability is diffusible in some way, that really a Chinese startup lab is able to create a model with less hardware that is just as good as some of the things that American labs can do.

I think really at its core, that's what the issue is about. Of course there's broader questions about export controls and things along those lines, but when you look at the result, you really kind of see that this kind of moat that perhaps some of these Silicon Valley companies had around AI doesn't really seem to exist. I mean that's something if you're following the space that you would've noticed even six months ago, a year ago. To me, one of the more interesting and noteworthy AI results in 2024 was when Elon Musk's xAI company, I think they launched their LLM Grok-2 and it scored almost as well as some of the leading language models like those from OpenAI and Anthropic. And I think for me, the reason that that was really significant was because Grok had been around for a year, or xAI had been around for a year.

And basically in that time, they were able to develop a model that scored as well as those of OpenAI and Anthropic. Already I think we were seeing that this kind of lead that some of these established companies had wasn't as prominent as perhaps we would've imagined. Now there's this kind of second question of hardware efficiency, I think that was another big story with DeepSeek. They needed a lot fewer GPUs and a lot less hardware to get their results, but also we had been seeing a lot of evidence that algorithms have become increasingly more efficient. And I think what you're seeing in AI is kind of two things where at the frontier, these developers are still trying to scale these models with as much data as possible, but also you have a lot of distilled small models that are rising as well.

But the last thing I'll kind of say about DeepSeek is that to me, we're kind of past the point of, "Oh, this model is very technically good, therefore we should think that it's going to disrupt the market." There are many technically good models that are out there, and I think really what you see on these leaderboards is five or six models that are pretty neck and neck, but the question for me isn't really about which one is the best, but rather which ones businesses are going to start using. And that isn't really as much of a technical question as it is perhaps a question that relates to other matters.

Paul Samson (host)

And what about trust? Which ones do they trust the most?

Nestor Maslej (guest)

For me, that was a really big thing with DeepSeek. That it's coming from a Chinese based company and you saw some of the prompts where you would ask it about some of these controversial issues-

Paul Samson (host)

Tiananmen Square.

Nestor Maslej (guest)

Related to the history of the Chinese party, and it wouldn't kind of give you the answers that you would want. And it's good if you have a model that's high performing, but if it kind of operates in these different sociopolitical spaces and according to different norms, that could really impact the degree to which other companies feel willing to actually use this technology.

Vass Bednar (host)

Policy Prompt is produced by the Centre for International Governance Innovation. CIGI is a non-partisan think tank based in Waterloo, Canada with an international network of fellows, experts, and contributors. CIGI tackles the governance challenges and opportunities of data and digital technologies, including AI, and their impact on the economy, security, democracy, and ultimately our societies. Learn more at cigionline.org. We have our videos on, but this isn't a video podcast, so I think you saw my face when you mentioned Grok-2 being a model that sort of brought that element of surprise.

Maybe you could speak a little bit more about the scoring and ranking system and also why it's so outstanding that Grok in that sort of period of time is able to be a contender. I think for our listeners that would be useful, but I'm pretending it's for the listeners, but it's also for me.

Nestor Maslej (guest)

No, it's a great question and I think there's a broader thing about... This is also I think one thing that I like to emphasize when I talk about AI. I think the way in which we evaluate AI systems to a degree is not broken, but it needs to be thought of in a different way. So when I talk about scoring, the way in which a lot of these models are evaluated now, and you saw this with Grok, you saw this with DeepSeek, is they're tested on different benchmarks. So benchmarks are, for all intents and purposes, tests of AI model capabilities. And the reason we benchmark AI models is because 10 years ago when the deep learning revolution started becoming a thing and you wanted to know just functionally if a model could classify an image, could it tell me this is a cat versus this is a dog?

You needed some kind of basic level test to do that. And that's how we got ImageNet, one of these foundational benchmarks created by Dr. Fei-Fei Li. And since then we've had benchmarks introduced in a lot of other domains and they're very useful for actually gauging what AI is able to do from an intellectual perspective. And now the kind of reality is when a new company launches a model, they'll typically launch it and they'll see how well it scores on all these benchmarks and they'll say, "Our model does 2% better than the current or the previous state of the art on this benchmark of grade-A mathematics." And they won't say this explicitly, but there's this implication because we score super well on all these benchmarks.

Your company should be building with us or you guys should be using our model. Historically, benchmarks have been one of the main ways in which we've evaluated these systems. Increasingly we've also seen more human or let's say democratic modes of evaluation, things like the Chatbot Arena leaderboard where people can go on these online platforms and rank the outputs of different models. But the reason I say I think we need to move beyond benchmarking and think about a different way to look at the models is that we're kind of evaluating what models can do in very narrow academic settings. As in, when a company tells you that a model scores 95% on a benchmark of grade-A mathematics, I mean, how many businesses are doing grade-A math? Not that many.

Vass Bednar (host)

I don't want to know the answer, but...

Nestor Maslej (guest)

Exactly. And if you've actually used these models, you'll find that they can behave very differently for very different tasks. And there is a lot of what humans do in business and economic settings, whether it's being persuasive, knowing how to craft a good email, knowing when to send an email, knowing how to negotiate. Those are things that are very difficult to capture in a lot of these academic benchmarks. So I think on the one hand we were getting models that are smarter and smarter and smarter on a lot of these academic tests. And you could probably say now that the best AI systems could beat most humans, if not all humans, on competition level mathematics or advanced reading comprehension. But there's a lot of intelligence in, for example, cracking an egg or just even eating a sandwich.

Or working with one of my favorite examples, there are these construction companies that kind of would advertise on their scaffolding. They would say, "Hey, ChatGPT, finish this." We don't think of construction workers as intelligent workers. We think of that as lawyers, doctors, whatnot. But there is an intelligence that goes into even building a building and we haven't captured that intelligence in AI. I think to summarize this rant, it's just me saying that the models have gotten a lot better when it comes to these kinds of intellectual assessments, but I think we need to ask ourselves, what does it really mean to be intelligent and what kind of intelligence are we capturing when we talk about what these systems are able to do?

Because there is still a lot of intelligence that is really only human and that it seems AI systems are very far away from actually achieving.

Paul Samson (host)

So the measurements might need to go a bit broader sometimes than even what the indexes are really focusing on, like some of those softer or less clearly identifiable forms of intelligence. I wanted to stick with this theme a little bit of DeepSeek and other companies. They weren't alone with this idea of we can do cheap, good, fast, low energy. The other issue that came out was open source versus closed source. And you mentioned it briefly, but that feels like one of the big strategic issues that's going to be playing out in the US with Meta saying, "It's all about open source," and OpenAI saying, "No, it's got to be closed."

And what are the implications of that? Do you feel like that is coming up? Maybe you could say a word or two about open source, but also do you feel like that's going to come to a head in the US as a big strategic question?

Nestor Maslej (guest)

Definitely. And I think this kind of also speaks to an earlier question that was asked about what's kind of counterintuitive about AI. And I think to me it's just how uncertain the business dynamics are around this technology. As in I'd be willing to make a bet that in five or 10 years AI is going to be ubiquitous, but whether it's going to be OpenAI's AI or Meta's or DeepSeek's, I really don't know. And this kind of rolls in nicely to this question of open versus closed. For the listeners that are perhaps unfamiliar, typically you can launch models according to different releases or different gradients of release. So when you launch a model in an open weight way, you fully release the weights. The weights are these kinds of parameters that models learn when they train.

And because you fully release the weights, you can kind of control the model locally, you can modify it fully. You could do what you want to. It's so much better from a control perspective, and you could also release it in closed ways, which means that you typically access the model through an API. As an example, OpenAI's GPT-4o or Anthropic's Claude 3.5. You would call them limited models because you have some access to them, but you don't actually get the weights. You only access them through APIs. And there are kind of debates as to which approach is better. The proponents of open weights say that it's better for transparency, it's better for the innovation ecosystem. And the proponents of the kind of more limited or closed space, they seem to say that because AI is a very dangerous technology, we can't be simply letting it into the hands of a bunch of other people.

Now, when we wrote the Index in 2024, one of the big takeaways was that open weight models were fairly behind closed weight models. So if you look at all these benchmarks, MMLU, MMMU, GPQA, Chatbot Arena Leaderboard, on virtually all of them, the best open model, which was Meta's Llama, was pretty far behind the best closed models. That gap has really narrowed quite profoundly this year. And one of the big reasons it's narrowed is you've had these model launches like DeepSeek, like Llama 3.3, where these open systems are now very close in terms of performance to these closed weight systems. And I mean that raises a lot of very interesting questions about where the landscape of AI business is going.

Because what you functionally have is you have a guy like Mark Zuckerberg who's giving out this model for free, very high quality, and I think it's a very interesting business bet because he's basically saying, "We don't need AI revenues right now. We think AI is going to be a ubiquitous tech in a few decades. So we're almost willing to go scorched earth and give out this technology for free." Perhaps hoping that businesses and institutions build their AI infrastructure on Meta and perhaps in the process put OpenAI, Anthropic and some of these other competitors out of business. We're actually going to see if it works or not. But I guess it's been surprising to me how much better these open weight models have gotten.

And I think it really puts a lot of questions into, what is the landscape of AI from a business perspective going to look like in a year, two, three? I mean, we already saw some of these... Even a year ago, there were some of these kind of marginal, let's say AI LLM developers like Inflection or Aleph Alpha from Germany that tried to get in the foundation model race, but they dropped out because they just couldn't keep up with the big dogs. And you wonder if perhaps this phenomenon might at some point hit some of the other startups that are also in this space.

Paul Samson (host)

Totally.

Vass Bednar (host)

We've been talking about models and geographies. Maybe we can also talk a little bit about the infrastructure that helps kind of underpin them. Earlier this month, the month of recording, on January 21st, 2025, President Donald Trump announced the launch of the Stargate project, which does make me think of the movie and I suppose it's intended to. It's a joint venture involving OpenAI, SoftBank, Oracle and MGX, and this initiative plans to invest up to $500 billion over the next four years to develop AI infrastructure in the United States. With the ambition, and this is sort of from the announcement itself, to bolster the nation's leadership in AI and create over 100,000 jobs. Why is the US government investing so heavily now in this kind of infrastructure? And did that announcement sort of surprise you?

Nestor Maslej (guest)

No, I think it didn't surprise me in that I had a hunch or a sense that the Trump administration would take matters of AI quite seriously. I think the question was what kind of elements was it going to prioritize? But I think what it kind of more broadly reflects is that we're kind of now at a moment where a lot of countries are very aware that AI is important geopolitically, and they're almost kind of trying to plant their flag and stake claims as to where exactly they're going to take AI. And it's a funny landscape to juxtapose with even a decade ago or so, Canada was actually the first country that launched an AI strategy in 2017, but back then it just seemed like a nice thing to say, "Oh, we have a national AI strategy." Now it seems to be much more-

Vass Bednar (host)

We repeat that fact a lot though in Canada. We're always like, "Canada was the first, did you know?" It's a fun fact.

Nestor Maslej (guest)

Well, now it's a concern. I mean, according to our rankings, it's dropping a little bit and I don't think that's necessarily because Canada has fallen behind as much as it is because other countries have really put their foot on the gas. And I think that's interesting to me. All these countries are kind of acknowledging this is an important thing and they're approaching it in somewhat different ways. I mean, there are some commonalities. I think a lot of countries like the UK, Canada, they've all made infrastructure investments partially because you need the infrastructure to power the creation and the building of these models, but countries have tried to position themselves in different ways sometimes with varying degrees of success.

I think interestingly in the United Kingdom under the Sunak administration, it seemed like they were trying to really lean into this kind of safety existential risk angle. And it seems that since the Labour Party has taken over, they've de-prioritized some of that AI angle and they're still looking at AI, but from a different perspective. So I think it's interesting to see how countries are acknowledging this is important, but looking at it in different ways. And of course you need the infrastructure to do good things with artificial intelligence, but I think getting your country to be kind of dynamic in this space isn't just about having a lot of GPUs.

There's a lot of other things that you need to do to put your country in a good position and ensure that it could have leadership of this technology.

Paul Samson (host)

So I have to say on the Stargate project announcement, Nestor was not surprised, but Elon Musk was because it didn't mention xAI and it mentioned the rival OpenAI, for example. But the point I want to make is about allies of the US like Canada, France, UK, Japan, take your pick. Is this another IRA, Inflation Reduction Act, kind of America-first situation, where all these investments are internal?

There's a fear out there a little bit that there's a very insular view to all this. So I think the reaction to that has been mixed from outside of the US and some head-scratching from within as to how these partners were picked. So it's more of a comment than a question. Are you hearing any of that? Is there kind of... Elon Musk came out to be clear and said they don't have the money.

Nestor Maslej (guest)

I think when I said I wasn't surprised, it was more so that I still thought that the Trump administration was going to invest in AI. I think the kind of details of the partners that they picked is interesting to see who that actually was. I mean it seems like now for a while there's kind of been this consensus that there is a bit of a race with AI. You can kind of go as far back as the CHIPS Act in 2022.

Clip

The CHIPS for America semiconductor bill right now on Capitol Hill. Legislation which has been several months in the making, would set aside more than $50 billion in subsidies for chip makers in the US.

Nestor Maslej (guest)

There was this sense that we need to safeguard this resource that we have. And it is tricky because there are certain people in the AI community that feel that when you think about AI, the biggest risk is this kind of existential question of this superhuman technology being developed and they would really emphasize the kind of need for international collaboration. And I suppose we've had some of that. I think the Paris AI Action Summit is going to be literally in a few weeks. So there's been a bit of that, but it does really seem like countries are trying to advance some of their own capabilities, and maybe that just kind of speaks more to the uncertain geopolitical environment that we live in right now.

Paul, you had joked about tariffs at the beginning, but when you're kind of in a world where tariffs like that can kind of come from whenever, wherever, you're, as a country, I think, much more prone to be thinking about how you can do things yourself first rather than perhaps working with others. And you can discuss the degree to which that's kind of positive or negative, but the vibe has certainly shifted in that way. And I think countries are starting to react like that as well.

Paul Samson (host)

It's a fair point for sure. There's a bit of a panic everywhere about are we in the action or are we where we need to be? And we haven't even talked about the poor countries out there that are just like, "Wow, we have no access or control whatsoever on what's going on."

Vass Bednar (host)

To that end, just to sneak in a question, picking up on different political environments, I was curious about how the index factors in very closed systems like North Korea, Russia, Israel, maybe others where these countries have very small but highly sophisticated technological development or maybe their AI work is more for militarization rather than being commercial in nature. I wondered if you had maybe a quicker answer for us on that.

Nestor Maslej (guest)

It's a good question. I mean, it's always a challenge for us because we try to make our report as global as possible, and a lot of the data sources that we have, the coverage isn't going to be as good for a North Korea or perhaps Russia as it will be for the United States. So I think where appropriate, we do try to caveat that, but we do have a variety of different data sources that could look at things in different ways. So for instance, when you talk about what countries are leading with AI research, one of the metrics that we look at is where are these foundation models coming from? And we tend to collect information from media reports, academic reports.

Of course, if you're looking mostly at Western sources, which we try to cast a wide net, you might miss some of these discussions, but we also have figures on how many AI publications different countries are releasing. And there I think the data tends to be a bit more internationally representative. So we do try to cast a wide net, but one of the things that I would also note with AI, and this is kind of one of the key takeaways that we had in the report, it really is becoming very expensive to build these AI systems, whether or not the hardware is getting efficient, and it is getting more efficient. At the frontier, these systems are still costing in the kind of millions, tens of millions, hundreds of millions of dollars to train.

And that really means that to be at the cutting edge of this research, you need to be in an environment where you have access to that capital, which means that you need to have a commercial ecosystem that can support the development of those kinds of models. And I feel like we have a pretty good handle on those kinds of ecosystems and tracking those as well. But really if you look at where a lot of these models are coming from, and this is a big thing that we talk about at the index, a big thing that [inaudible 00:35:53] also does in terms of its academic advocacy work. The majority of these systems come from industrial players and that's just because they cost a lot of money to make.

That kind of opens an entirely different can of worms as to where the discussion is and what we need to do from a policymaking perspective. But we do try to look at the landscape as broadly as we possibly can.

Paul Samson (host)

Nestor, the next question we wanted to get into a little bit is one that all institutions, certainly ones that are think tank kind of things or research institutions, are struggling with. And that is how do you use AI tools now in the appropriate way? And I don't know if there's a Stanford kind of guidebook or if your institute has taken one on, but how are you using AI now to help build some of these things or to compare or check and how might that be evolving? Is business practice going to drive us there or are you going to be at the cutting edge of use of AI and these kinds of indices? What's going on there?

Nestor Maslej (guest)

I'll kind of talk in the abstract first, then I'll get more specific. I think that it's definitely going to have to be business practice that is going to drive adoption rates. As different as AI is technologically, I really subscribe to the belief that in some ways things aren't that different and we just see kind of new iterations of things. And there are a lot of examples in history when transformative new technologies come to the fore. And typically there tends to be a delta of 10, 15, 20 years between when the technology is introduced and when it starts driving positive productivity impacts across the board. And that's because typically in that interim period, businesses need to figure out how to use the technology well and design processes to ensure that they get the most out of it.

And again, you could imagine a manufacturing facility in the 1850s that was using steam power. When electricity was introduced, they couldn't just snap their fingers and retrofit their production to be electric. They probably needed to think of designing the production process in an entirely different way to actually leverage what this new technology could do. And I think it's a similar thing with AI from a variety of perspectives. I think to me the analogy that I like to use is AI is not a product in the way that you have a refrigerator or an oven or something like that. It's more an energy, it's more the kind of electricity again. And what we need right now is better wiring, better ways for this technology to funnel towards purposes that we're trying to achieve.

And that wiring perhaps is going to be achieved by businesses that are going to build custom applications. Perhaps businesses now are going to try to experiment with the technology to use it a little bit differently. And on some level we might even need to explore perhaps different modalities of engaging with the technology because a lot of these LLMs, they exist in the form of a chatbot. And while the chatbot is useful, you could imagine that that's not really the best interface for using this technology. Presumably in any kind of given business there's a lot of custom things that you need to do and in an ideal world, you would love to have an application that does a lot of those things.

So that's kind of my abstract response. I think on the more practical level, I don't really think there is a standard playbook. I think that if I was in any organization now, the first thing you need to do is, it starts with education. It starts with understanding what this technology is, what are some of the trade-offs, what are some of the different risks and advantages if you build with certain models or not. And I think really accepting, and I would argue with anybody on this point, I do think either you decide today to embrace this technology on your own terms or the decision to embrace it is going to be made for you in 10 years on substantially worse terms. And in that period now I think it's a matter-

Paul Samson (host)

Sounds like a trade deal between Canada and the US.

Nestor Maslej (guest)

Exactly, exactly. And I think that in that period now it's more about doing that experimentation and that's something that I try to do every single day. I try to see which models I like, I use them for copy editing, for generating ideas, but there's a lot of work that I do, and I'm sure you guys find this as well, that an LLM could probably do if I had some kind of app, but it just can't completely be done with a chatbot and that interface isn't kind of working quite well. So this kind of harkens back to what I said at the beginning of the conversation that I think the big question for me moving forward is really going to be how are businesses and regular people going to figure out how to use that technology?

And certainly the technology can get better, but I think already now we have a very good technology. It's about figuring out how to use it and if you're a business who wants to use it, I don't think you can kind of hope that there is a magical ten-step solution. Every business has different needs. There are going to be different things that matter. If you're a law firm where you need highly accurate responses and highly accurate citations, you need to go with a model that is top of class in terms of accuracy. Whereas if you're perhaps a business that is developing a consumer-facing customer chatbot, you might not necessarily need high accuracy, but you need something that is relatively accurate at a pretty cheap cost because you're going to be running a high number of queries.

So thinking about where you stand and what you want to do is very important there.

Paul Samson (host)

So marketing pizzas, you can fail quite a bit; if you're providing legal advice, you better not fail. But if you're doing public policy or these kinds of things, you probably don't want to be failing either. So there's still a pretty high bar here and the human in the loop is going to be there for some time if not forever.

Vass Bednar (host)

And speaking of things people want to do, at one point I was like, "Oh, I wish I could be in charge of what the provinces' chatbots are." Because I really don't think chatbots should be able to even pretend that they're humans. I think they have to be so labeled and so clear. I think especially when governments are deploying them, it can really seem like you spoke to someone from Immigration Canada, but you spoke to a bot. So it's balancing those efficiencies anyway. I guess my mandate would be pretty short. There was a lot of chatter recently, I guess at the time of our recording, but on social media, and as you know, Paul's a big social media guy, around the Jevons paradox. I hadn't heard of it before.

Clip

For example, as coal burning technology became more efficient, the consumption of coal actually increased because it became cheaper and more accessible rather than saving the amount of coal used.

Vass Bednar (host)

It's an economic theory that suggests that as tech improvements increase the efficiency of resource use, the overall consumption of that resource may increase rather than decrease. I think cell phones are a decent example. Greater efficiency lowers costs, making the resource more attractive, leading to higher demand. We've already sort of talked about the cost of production and other inputs, but do you anticipate that the paradox will be reflected in some way through future iterations of the index?

Nestor Maslej (guest)

I definitely think so. I think that beyond just tracking the degree to which this technology is diffusing more, and I think you definitely see that. One of the highlights that we're going to be reporting on in this year's report is how, I think for the first year in a while, the number of businesses that report using AI has really kind of shot up. So it does seem to be that the tech is being used a lot more. And I mean part of that is probably just marketing and kind of AI being the zeitgeist, but the technology is genuinely a lot cheaper and it's easier to use and more effective. So I definitely think you're going to have something like that, but I think there's going to be two things that are going to be going on and you're seeing this already.

Where on the frontier, a lot of these model developers, they still have an incentive to build the best model, to build the most powerful model, to make the strongest systems. And they know that scaling these systems has worked so far. So they're going to continue doing that, continue feeding more and more data into these systems, and that's going to lead to models that are going to cost more and more on the frontier because it's a bit of an arms race. If you're a company like Google, you can't afford to even lose five to 10% of your advertising revenue to perhaps a search chatbot rival. That could be very devastating. So these companies have an incentive to play this kind of game and to be in this arms race.

But what you're also seeing is that for these models that are kind of in the middle of the pack, that are good but maybe not frontier, the cost to train these systems and to deploy them is decreasing quite sharply. And that's actually some new analysis that we're pioneering in this year's index, looking at the inference costs of these AI systems, and the inference costs are going down very rapidly and very massively. So you are starting to see cheaper costs that I think are very likely going to drive greater degrees of diffusion. But I think we... It doesn't seem to me like we've figured out yet, or businesses have figured out, how to use this technology in this kind of reliable, consistent way.

I think there's a lot of kind of micro experiments that suggest that AI does drive productivity forward, but it doesn't seem like we've cracked the barrier there, so to speak, yet.

Paul Samson (host)

Let me pull on this Jevons paradox from another angle here a little bit because I think it is something that is kind of... These are sleeper issues. Technology is really successful, it gets made more efficient, it gets made more efficient again and again, but then suddenly everyone's using five of them and the overall footprint, energy or environmental, of that technology becomes really, really big. And so everyone is assuming that AI will keep going forward, may become much more efficient. Let's assume it does, still requiring massive data centers and interfaces.

So is that still very much like the frontier constraint that a lot of the AI people think like we're going to need these small modular nuclear reactors and we're going to need all this stuff because we're running out of energy. We're okay on the data, we're okay on the algorithm improvements and latency of training and stuff, but energy is the problem here. And then of course there's the environmental impact, but even just energy itself, is that going to hold true? We don't know of course, but is that still the kind of the top issue at Silicon Valley?

Nestor Maslej (guest)

Well, it's an issue. I think that it really depends on who you ask. I think broad perspective, of course it's always good to have more energy and if you funnel it towards a productive cause that is positive. But I think this kind of energy hypothesis assumes that the way forward for AI is going to be, at least on the training side, scaling these systems more and more and doing these massive training runs. And basically, to fill you in, the transformer model, which is the kind of architectural backbone of a lot of these AI systems, was discovered in 2017. Not long after that, researchers realized that if they just pump more data into this kind of AI recipe or way of building an AI system, they get better performing models.

And that's pretty much what all these companies have been doing in the last five or six years. I mean there's been some kind of, let's say around the edges improvements in how these models are designed, but it's really a story of scaling. So the big question moving forward, is scaling going to continue leading to better progress? And there are some people that really adamantly believe the answer is yes, that we get more data into these systems and these systems are going to get better because they have gotten better at scale. Other people point to the fact that there's a lot of other examples of technologies rapidly rising and then suddenly flatlining.

I think an example that I really liked was looking at the kind of speed of commercial airlines where kind of in the forties, fifties and sixties, the speed of a lot of these airliners was going up. People started pre-booking flights to the moon because there was that kind of excitement about this technology. And of course since the seventies, it has flatlined.

Paul Samson (host)

Pre-booking flights to the moon.

Nestor Maslej (guest)

I think with AI there's a similar question, can we continue scaling? And if the answer is yes, then there's going to be a need for more energy. But I would also kind of posit that there's probably a better way to build AI systems. I don't know how to do that, or I'm not going to be the one who's going to think of that way. But what I mean by this is that if you think of the brain, and I think really one of the big things I would say is learning about AI has really made me appreciate the genius of the human brain. I mean these AI systems, they're seeing billions of times more textual data than the three of us will have seen in our entire life. And still at times they kind of make errors with logic and reasoning that a ten-year-old wouldn't make.

It's not to say these systems are bad, but if you kind of look at the performance of a human versus the performance of an AI system, there's just much more efficiency in the operation of a brain. The amount of calories you need to power a brain is substantially less than the amount of energy you need to train an AI system. So clearly there's some magic going on in our minds that isn't being captured in these AI models, but it's really hard to know when someone's going to think of the next great AI architectural revolution. There's a great quote from a Stanford computer scientist, one of the godfathers of AI, where someone asked him once, when are we going to get AGI? And he said, "Anywhere from five to 500 years." And it just-

Paul Samson (host)

Artificial general intelligence.

Nestor Maslej (guest)

It kind of reflects the fact that you never know when you're going to have that next big model breakthrough. The guys who did the transformer, I don't think they had any idea this was going to be the kind of dominant paradigm of the next decade. And if you look at a lot of these big model breakthroughs like AlexNet, which came from the University of Toronto, which was one of the first to use GPUs to scale up AI systems, the transformer. They were all open source in some kind of way or the results were being shared. We're not living in that world anymore. OpenAI and Google, they're not openly publishing what they're doing to make these systems better because it's a competitive problem now.

And you wonder to what degree is that potentially also going to slow down innovation. So the energy question really kind of depends on where you think the future of AI is going to go in terms of how you get systems that are better and stronger.

Vass Bednar (host)

Nestor, every time I have a cookie or something, now I'm going to think about how I need fuel for the magic of my mind.

Nestor Maslej (guest)

Exactly. There you go.

Vass Bednar (host)

You heard me kind of teasing Paul for all his links and his AI excitement. It does feel like every time I blink or refresh my feed, there's another AI breakthrough or noise, or someone's fighting about something. And when people sort of debate, it's either... It's so polarizing, right? It's either the dawn of a new golden age or the end of civilization depending on the article that you click. Given that you're somebody who actually crunches the numbers and sort of decodes things, do you ever feel like you're playing AI myth buster?

Nestor Maslej (guest)

I think sometimes. I think it depends on who you speak to. The AI myth busting that I need to do with my dad and family is different than with policymakers or business leaders. But-

Vass Bednar (host)

How so? How's it different? They're not the same kind of audience.

Nestor Maslej (guest)

Definitely not the same. Different kinds of questions. I think when you talk to business leaders, there's this kind of question of how big a deal is this technology going to be? I think a lot of them have heard about it and it's like, is this a crypto thing where there's a lot of hype, but then it might go down and then potentially go up again? Or is it actually a game changer? And I think there, if you look at the data, the data now suggests this technology will make a big difference. And I don't think we've even started to see it get off the track. So I think that's the kind of myth busting that you need to do there. I think with policymaking audiences, it's very often there's this acceptance that you need this technology, but how do you get it right?

How do you ensure that it's safe, reliable, effective? And I honestly think when you talk to... I don't know if my dad will listen to this, but if you categorize them as the kind of regular Joe, the kind of regular civilian, what are they worried about? I honestly think in a lot of places like Canada and the United States, there is a fear of AI. If you look at public opinion data, which we also track in the index, some of the more pessimistic countries when it comes to artificial intelligence are the United States or Germany or Canada or France. And I haven't seen good hypotheses as to why this is the case, but I think part of this has to do with the fact that in a lot of these countries you have a lot of knowledge or service workers that see this technology that could potentially threaten them.

And if you look at other countries like Indonesia, Mexico, Malaysia, there is a lot of bullishness around AI. There's excitement about this technology. Maybe there they feel they have less to lose, that AI is going to help them climb the ladder, whereas perhaps people here feel that AI is going to take them down a few pegs. So I think when you speak to that audience, the kind of myth busting is really around, is this technology something to be afraid of? Is it something to be excited about? What can it actually do? What can it actually not do? Because the technology is impressive, but it still has a lot of kind of downsides as well. And there is a lot of kind of narrative in the world of AI.

Again, this comes back to what we were saying at the beginning, but you all remember, I think when ChatGPT launched, a few months afterwards, there were these kinds of calls to stop AI research, a six-month moratorium, [inaudible 00:54:37] AI is going to be an existential risk. And I think in hindsight now, you probably... That seems almost a bit... It seems like that was a call that perhaps came too soon. And I think even some of the signatories of those kinds of declarations have said, "Maybe we were kind of a bit hasty in how we made claims for those kinds of things." And I think this is all to say that when it comes to a lot of these technologies and our kind of prognostications about them, we need to humble ourselves a little bit because we obviously want to know where the future's going to go and how things might change.

But we're not always the best at predicting the future. And of course if we had a crystal ball, things would be great, but that's not how it works. And I think the best thing that you could do is just try to make the best guess that you can, but also have a flexible perspective that reacts to what is actually happening rather than what you think might happen. And a lot of these takes about this or that, it's fun when you just do it on Twitter, X, or on social media, but when it actually comes to imposing legislation, like in California, you had SB 1047, where the stakes can be very massive for businesses, then it becomes a bit of a different calculus. And then that myth busting has to be informed a bit more by fact and reason.

Paul Samson (host)

That feels like a great place to end the conversation because I think it was a great insightful summary, but maybe we should still say, is there anything you really wanted to say that we didn't ask or give you a chance to?

Nestor Maslej (guest)

I think the final thing I'd leave the audience with is that, as I said, technology is really a human problem more than a technical one. AI is something that is built in laboratories, it runs on computers, but the way in which it is going to be rolled out in society is going to be a reflection of what we want of it and what we ask of it. And whether you're a policymaker, a business leader, or just a regular person, you can't really afford to put your head in the sand and pretend that the technology doesn't exist. It's going to be here whether you like it or not.

And I think the reaction to that and the appropriate thing to say is that people have the ability to shape how this technology is going to go to inform the course of its development. And it's time now to kind of have some of those conversations and have some of those dialogues.

Paul Samson (host)

Great. Thanks so much, Nestor, for your time and back to your workday on the West Coast.

Vass Bednar (host)

Thanks, Nestor.

Nestor Maslej (guest)

Thank you guys.

Paul Samson (host)

Hey, Vass, did you just get the email I sent you with the new report? There's a draft coming out. No, I'm joking, but there's actually new-

Vass Bednar (host)

[inaudible 00:57:30] date one.

Paul Samson (host)

There is a new index coming out in a few weeks, right. So we'll stay tuned for that.

Vass Bednar (host)

I heard how excited you are about it. So I know your calendar is marked.

Paul Samson (host)

Pre-send email.

Vass Bednar (host)

Look, I loved getting to chat with Nestor. He brought the Index to life in a way that I was so pleasantly surprised by. That wasn't a numbers-heavy conversation, it was a context conversation. And I learned a bit more about the technology. And just looking at all the notes I was writing down, I really liked when he used the term energy. You do sometimes hear that line about, "Oh, AI is like the new electricity." I liked the thought of it as an energy, and I'm for some reason quite bothered that Grok is a good AI.

Paul Samson (host)

That Grok's moving up the rankings.

Vass Bednar (host)

I think I just don't want it to be liked, but that deserves some introspection for me. What about you?

Paul Samson (host)

Well, just on exactly that line then, the thing that seems like it's going to separate out, beyond just the technology and efficiency element, is the data quality and quantity to some degree. And one of the things that Grok has, Grok-2 of course, is a lot of data from X.

Vass Bednar (host)

From my tweets.

Paul Samson (host)

From your tweets and from Tesla vehicles and all kinds of things. And so that really could be an advantage in some ways for Grok. And then when you think about China having unique data sets, those could be very interesting too. But they've got those limits of, if there are certain things you can't use, that kind of blows up the model, and that came out in the discussion. The penny definitely kind of dropped for me when you said a cookie goes a long way for brain power, but not too far for an AI large language model.

So our brains are pretty powerful and they're creative in ways that AI will... I'm skeptical that AI will get there in the same way. AI can be super creative and powerful, but is it going to have those little misfires, like Nestor said, and you said it, when you read a report that AI has written beautifully.

Vass Bednar (host)

So you mean when you delve into a report?

Paul Samson (host)

When you delve into a report or you're reading, I was thinking more of an essay.

Vass Bednar (host)

That AI wrote and used the word delve too much.

Paul Samson (host)

Or a student essay, right? And they've used AI, "This paragraph was written by AI," and it's missing that kind of human touch. The human touch is partly a result of flaws in the way that we think. Right? But it turns out to be creative.

Vass Bednar (host)

I mean, not me personally, but no, I know what you mean. What it means to be human and the voices that we look for. And maybe that's part of the-

Paul Samson (host)

So AI is going to become super powerful, but it's not the same thing. It's not totally the same thing. On that one I still feel confident despite those that say, "We'll be doing everything better." And Nestor wasn't saying this, but some do. It will do everything better. Everything? You really mean everything? I don't buy it.

Vass Bednar (host)

The productivity-enhancing potential, I think everyone is quite intoxicated by: the way it can complement your work, give you more capacity for the most human elements of your job, and maybe minimize some of the rote ones. Great, but even writing your own email is kind of joyful in your own voice. And I feel like we are getting a little autocorrect and chatbot heavy in life right now.

Paul Samson (host)

That's true. Handloom weavers weaving pants and shirts and stuff like that were amazing craftspeople and it was really hard to let go of that, but it ultimately made sense to move on and have that kind of mechanized and do other things. And we'll face more of that. So the transition can be a good one. But you don't want to be replaced on things you like doing and that you don't need to be replaced on, like emails.

Vass Bednar (host)

Like this podcast. Well, hopefully you won't replace me for the next episode, and I adored this conversation and I'm looking forward to the next one. Policy Prompt is produced by me, Vass Bednar and Paul Samson. Tim Lewis and Mel Wiersma are our technical producers. Background research is contributed by Reanne Cayenne. Brand design by Abhilasha Dewan. And creative direction from Som Tsoi. The original theme music is by Josh Snethlage. Sound mixing by Francois Goudreau. And special thanks to Creative Consultant Ken Ogasawara. Please subscribe and rate Policy Prompt wherever you listen to podcasts and stay tuned for future episodes.