The VideoVerse

TVV EP 24 - Justin Ridge: Bringing Codec Standards to The World

June 04, 2024 Visionular Season 1 Episode 24
In this episode, we talk to Justin Ridge, President of the Media Coding Industry Forum (MC-IF). A veteran of video coding standards, he shares his perspective on the video codec landscape and the work of MC-IF in promoting MPEG standards. We discuss the adoption of VVC and how it compares with HEVC, the route map for future standards, the work being done for next-generation video coding, and the influence of the AI revolution on these developments.


Welcome to The VideoVerse.

Zoe: Hi everyone, this is Zoe. We are here for the VideoVerse podcast again. For this episode, we've invited Justin Ridge from Nokia. I have to say that I got to know Justin all the way back, almost 20 years ago, when I started my first job at Nokia. I'd like Justin to introduce himself first. Hi Justin. Good to have you.

Sorry, I forgot to mention that Thomas is from our team, so everybody is already familiar with him if you've watched our previous episodes. Thomas joins me from London as the co-host for this episode. All right, Justin. Now is your time to introduce yourself.

Justin: Thanks Zoe. It's good to be with you and with Thomas. As Zoe said, I first met her quite a long time ago, which makes me feel very old, when we were both working at Nokia in Dallas. I've been with Nokia for over 20 years, and for most of that time I've been working on video coding research. So going back to the H.264 days, then H.265, then H.266 or VVC, but also working on the codec promotion side, the adoption in application standards. So I do a little bit at DVB, and going back to ATSC in the old ATSC-M/H days, but it's always been video codec-related in some way. I'm involved on the ITU side with Study Group 16, which is one of the parent bodies of JVET, which produces those standards. And in the promotion sense, I'm also involved with the Media Coding Industry Forum, which we call MCIF. That's a current initiative we're working on.

Zoe: All right. So at this moment you're still based in Dallas, right?

Justin: Yeah. Correct. I've been based in Dallas all that time. People have sort of come and gone around me, and people who were in Dallas have moved to different places, but I've always been based in Dallas.

Zoe: Right. So of course that was the starting point of my own professional life, from Nokia Dallas to right now. Seeing that you're still there, you've experienced all these changes over these 20 years, and you're still in the field of video codecs. So I just wonder how you feel, because on the standards side, I think you were working on H.264/AVC at the time, and then we experienced HEVC and VVC, and now there's a new codec standard effort.

So we'd like to know, as many people ask, how much difference these codec standards make from one generation to the next, and why new codec standards keep being developed. Because at this moment, even though we're talking about a lot of new standards, AVC/H.264 is still the one that's been unanimously supported; no other codec yet has an ecosystem like AVC's. So what about the new standards? Why are there still new standardization activities going on?


[00:04:30 There's always a desire for more efficiency]

Justin: Yeah, well, I think there are a few parts to that question. The first part is that there's always a desire for more efficiency. I mean, in many areas, efficiency is money, whether that's in a broadcast sense, where spectrum is valuable, or in a storage sense, where storage is not free, and people need to store vast amounts of video, so the efficiency with which they can do that matters. And the amount of video is still increasing, the amount of video that's used on the internet, so there's always this desire for better efficiency. But the other reason you see new codecs is because there are new applications and higher resolutions, and these sorts of things. And the new codecs are designed to work in these areas. I mean, when H.264 was being developed, people were testing with CIF- and QCIF-type images. I mean-

Zoe: And even running that, I still remember, was really slow.

Justin: Yeah, yeah. I mean, we used to do the simulations on, like, a desktop PC or something that was sitting under the desk, and you'd go away overnight and leave the simulations running on this QCIF sequence. But obviously that reflected what phones were capable of in those days. These days, of course, that's totally out of date. And so as you change the test conditions to reflect current situations, you find there are new tools which start to become useful. And things that might have been too complex 20 years ago can now be done in current software and current computing architectures. So it opens up new technological possibilities as well, of tools that can be used.

So that's why the new standards keep coming, of course. They need to keep pace with the demand. Sometimes they come out a bit fast, maybe. That might've been the situation with some codecs: they came out a little bit misaligned with the demand in some areas. And H.264, I think you asked why that was so popular; I think that was something that came out at the right time. I mean, it was the right time for mobile, it was the right time for some of the new broadcast standards, it was just the right time for a lot of things. And so it was picked up. And they're all sort of the same family. We've still been building on top of this hybrid, let's say, block-based codec, and all of these new ones are still a variation on that basic skeleton. Probably at some point we'll reach a situation where that's no longer the case, but currently that's the direction things are going in.
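For readers who want to see the skeleton Justin is describing, here is a minimal, purely illustrative sketch of the hybrid block-based loop (predict, transform, quantize, and the decoder's mirror image). It is a toy model, not any particular standard's transform or quantizer:

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(x):
    # 2-D orthonormal DCT, applied along both axes
    return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(x):
    return idct(idct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

def encode_block(block, prediction, q_step=8.0):
    # One block through the hybrid loop: predict, transform, quantize.
    # Entropy coding of the quantized levels would follow in a real codec.
    residual = block.astype(np.float64) - prediction
    return np.round(dct2(residual) / q_step)

def decode_block(levels, prediction, q_step=8.0):
    # Decoder mirror image: dequantize, inverse transform, add prediction.
    return prediction + idct2(levels * q_step)

# Round trip on a random 16x16 block with a flat prediction
rng = np.random.default_rng(1)
block = rng.integers(0, 256, (16, 16)).astype(np.float64)
pred = np.full((16, 16), 128.0)
rec = decode_block(encode_block(block, pred), pred)
print(np.abs(rec - block).max())  # small residual quantization error
```

Every generation from H.264 to VVC refines these stages (better prediction, more transform sizes, smarter quantization and entropy coding) rather than replacing the skeleton.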

Zoe: So you are saying that-

Thomas: So what is-

Zoe: Okay, go ahead.

Thomas: I just wanted to ask, I mean, what is driving that continued use of that model? Is it risk aversion, or is it, you know, making sure you will always get some gains? Because what you see at the moment is more and more tools being added to standards. Do you think there's maybe a risk that it becomes difficult to actually deliver the gains the standard promises, because of the complexity of just adding more and more tools?

Justin: Yeah, I think that's very true, Thomas. The existing framework gives you certain constraints. You're locked into a certain way of doing things, and I don't know if it's necessarily the complexity, but just the cost of changing something means that you potentially have to change a lot of other things. And so why do people do it like that? I think you're right, the process does incentivize these sorts of low-risk steps. It would be a big risk for someone to go away and design a completely new codec. They probably could do it with enough time and resources, but then it's got to be accepted by everyone, because we work in this open standards environment. And of course, if the majority don't like what's been done, then you've potentially wasted all that time, and that's the risk.

But you're right, it does make it more challenging to do something radically different. And when you've got that sort of structure, at some point something disruptive happens that will break it. At some point you just don't get the gain anymore, or someone comes along with something so radical and so much better that people can't ignore it. Probably some combination of those things is what will end up happening, I would guess, just looking at the crystal ball.

Zoe: Do you think right now may be the time these things happen? Because you two mentioned the framework we're talking about is block-based: motion estimation plus a two-dimensional transform. That's the fundamental framework all these codec standards have followed, and everybody is talking about AI right now, and how to leverage a neural network for video compression. Do you think now may be the time the framework potentially changes?

Justin: Probably a little early. I think at the moment what you'll see is those kinds of innovations, AI and so on, mostly happening in a non-normative way, in other words, outside the standard, for example in an encoder or in some type of post-processing, because they're still fairly complicated things to do on all platforms. And if you want a codec that works across all platforms, you can't make it so complex that many platforms are completely eliminated. Plus, if it's non-normative, that also simplifies the standardization process. I don't think we're actually there yet, where we'll get some radical change in structure. People are investigating it, but it's probably a little too soon to expect that change.

Thomas: I wanted to also ask you about this business of post-processing. One thing we've seen recently is the standardization of film grain synthesis, originally from AV1, but now creating a specification for other codecs. And in the current viewing environment, the codec is actually only one part of visual quality. Televisions now do a huge amount of processing, including upscaling and so on, and talking to content providers, I get the impression that they're a bit frustrated that they don't have a common language to address all the disparate devices with their display capabilities. Do you have any thoughts about how the standardization community could help address that? You know, you can have all these AI tools to do super resolution, but if lots of clients don't do them, then you are targeting the lowest common denominator.

[00:13:23 Network-Based Media Processing - NBMP; Supplemental Enhancement Information - SEI]

Justin: Yeah, it's a good point, and this is true. There are a lot of things that can be done in post-processing, and there is an actual effort in MPEG called network-based media processing, or NBMP, that looks at how some of these things can be done in a uniform way. And there's this idea of guided operations, where you don't necessarily make them a normative process and say, well, this is how you should do upscaling, or whatever the process is, but you give some sort of guidance on the parameters so that things might look, you know, within some range of tolerance of what they're intended to look like.

But it's a good point. I mean, part of the reason you end up in this situation is that the standardization part of the video codec side is chopped off into certain blocks. When you draw up requirements for a video codec, you typically don't take into account things like the display or the capture or the transmission channel. And that's partly for tractability, because the design problem starts to become almost infinite once you start to consider all of these challenges. It's partly also because it requires some type of agreement on what those test conditions would be. And then, of course, everyone's use case is different, and they don't want tools which kill the efficiency in their use case.

Then there are use cases which haven't even been thought of yet, which we see as a continual theme with video codecs; a lot of the time, the way they get used is not what they were originally thought of being for. And so all of those things fight against designing these solutions in. So I think people are aware of it, and they're working on these guided approaches and these kinds of things, and perhaps that's one of the areas that these SEI messages, or supplemental enhancement information messages, are intended to address. For example, you get a film grain SEI; they're intended to address these kinds of things. But it's still an open problem, and it's still something we see people actively working on.
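To make the "guided" idea concrete, here is a deliberately simplified sketch: the encoder strips the grain and transmits a couple of parameters, and the decoder re-synthesizes grain instead of spending bits coding the noise itself. This is a toy additive Gaussian model; the real film grain characteristics SEI carries a much richer parameterization (intensity intervals, autoregressive coefficients, and so on):

```python
import numpy as np

def apply_film_grain(decoded, strength=4.0, seed=1234):
    # Toy guided post-process: 'strength' and 'seed' stand in for
    # transmitted film grain parameters; the noise itself is never coded.
    rng = np.random.default_rng(seed)
    grain = rng.normal(0.0, strength, size=decoded.shape)
    return np.clip(decoded + grain, 0, 255).astype(np.uint8)

frame = np.full((4, 4), 120.0)  # stand-in for a decoded luma block
print(apply_film_grain(frame))
```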

Thomas: Is this something that maybe, you know, an industry forum could help drive? You could imagine some kind of Kitemark for what a high-quality implementation does: it should obey the codec standard, but maybe also these other SEI messages for particular application scenarios. Is there a role for industry to gather together and produce more consistent standards?

Justin: There is. And the question is perhaps more: where? Some of these industry bodies, well, that's why we call them application standards. If you look at the broadcast ones, you know, the DVBs and ATSCs of the world, they might say, you need to use this SEI message, you have to be able to understand this one. Some of the streaming specifications will say you have to be able to adapt the resolution, or you have to be able to handle these certain types of features. That practice goes back even to Blu-ray, where certain SEIs were known to be required.

So it's been done on a per-industry basis, or a per-vertical basis. But there hasn't been a cross-industry trend, at least so far, to do that. Maybe, as media becomes a bit more uniform, it's something which will extend across industries. In MCIF we had a sub-profiling working group. The idea is that in VVC, you've got these flags where you can turn certain tool sets on and off. So you might say that for a particular application, I don't need the entire profile, whichever profile that is, but only a certain set of tools.

Those can be signaled in some type of sub-profile indicator, basically, and MCIF is actually one of the ways in which those can be defined. But that also sort of fights against scale. If people are going to produce a chip in silicon, they want it to do everything, because they typically don't want to save a few dollars on an implementation which they want to reuse over and over again. So scale argues against doing those sorts of things. But it's a possibility.
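A sketch of the sub-profiling idea as described: a sub-profile is essentially a named subset of a profile's tool set that a bitstream promises not to exceed. The tool names below are illustrative shorthand, not VVC's actual constraint-flag syntax:

```python
# Hypothetical tool names for illustration; the real mechanism uses
# VVC's general constraint flags, with sub-profiles registered
# through a body such as MCIF.
MAIN10_TOOLS = {"alf", "lmcs", "mip", "affine", "dmvr", "ibc"}

SUB_PROFILES = {
    # e.g. a low-delay sub-profile that disallows tools an application
    # found problematic in its environment
    "low_delay_conferencing": MAIN10_TOOLS - {"affine", "dmvr"},
}

def conforms(enabled_tools, sub_profile):
    # A bitstream conforms if it enables only tools the sub-profile allows.
    return enabled_tools <= SUB_PROFILES[sub_profile]

print(conforms({"alf", "mip"}, "low_delay_conferencing"))  # True
print(conforms({"dmvr"}, "low_delay_conferencing"))        # False
```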

Thomas: Yeah, profiling is particularly interesting. I remember some of the discussions over profiles have been, let's say, some of the most difficult standardization discussions, taking many meetings and many hours in each meeting. It seems to me that the lesson is that profiles are kind of bad for adoption, especially now that there are so many existing codecs: if you fragment your market with multiple profiles, it's harder for your new codec to get adopted. This is something that AV1 did by having basically a single profile, and it seems that VVC followed a similar approach. Is that a fair characterization?

Justin: Yeah, I think that's a fair characterization. A lot of that is driven by the increasing hardware capabilities. I mean, if you go back 20 years, you had this vast difference in codec capability, from these phones which did CIF or QCIF; if people threw something like interlaced content at them, they'd just have a meltdown. And so you needed to have these simple profiles, and the idea in those days was that you wanted the more capable devices, like a TV or a set-top box or whatever, to be able to play the video someone generated on their phone. You stick in a USB stick or a memory card or whatever the story was, and you could play those on those vastly more capable devices.

Whereas now, smartphones can do a whole lot of stuff, as everyone knows. And the difference in capabilities is not so much the resolution or the frame rate or these sorts of things; maybe it's some other selling points that are in the higher-end electronics. And so this is one of the things, I think, that drives the reduced need for profiles. And I don't think anyone wants an excessive number of profiles; I mean, no one wants to segment things, as you've accurately described. So perhaps the smaller number reflects the realization that it is possible to get away with fewer.

Zoe: Yeah. Talking about this, because you just mentioned MCIF, I just want to get a sense of what its focus is, especially as we see the proliferation of so many codecs. You mentioned that the underlying computational resources have become more and more powerful, and at the same time the use cases have become wider and deeper. Right now we're also talking about spatial video, 3D, emerging kinds of experiences, but at this moment there's no dominant codec. For example, AVC/H.264 is still the one that has, I believe, the largest market share, but there are so many codecs available at this moment. And you mentioned you're not only involved in codec standardization, you're also involved at the application level. So from the standpoint of MCIF, as well as the application activities you're involved in, what's your view of the current state of the art across all these codecs, and what would be the next step that can be influenced?

[00:23:28 Next step that can be influenced]

Justin: Well, I think if you look at performance, at least in terms of the codecs that are currently available, there's a general acceptance that VVC is currently the most efficient in terms of coding efficiency. But of course, when you come along with a codec where people have existing deployments, that's a challenging economic thing. I mean, when H.264 came along and people were transitioning from analog to digital, they really had to replace a whole bunch of stuff and they had no choice. But once you come along with HEVC or VVC or AV1 or any of these other codecs that are around, you have this legacy digital system that you have to deal with.

And it's not a trivial thing to just rip that out, or even rip a component out, and change it. So you do have this expansion of the number of codecs, as you've mentioned, Zoe. For that reason you still have a lot of H.264 deployment, and you still have a lot of people using HEVC, and in some cases that deployment is growing. We've been told that the main focus of some companies for the next couple of years will be on improving their HEVC offerings. And so-

Zoe: Well, what kind of particular use cases, for example, do they adopt it for-

Justin: So, for example, if you look at encoder manufacturers, that's one area where they still see a lot of juice can be squeezed, you know, like the lemon, out of their current HEVC products. In some cases they're still making improvements to old MPEG-2 encoders. So where the opportunity lies for some of these industries is perhaps in HEVC, whereas VVC is in this initial deployment phase, where you start to see chipsets coming, mobile chipsets and TV chipsets that support VVC in hardware, but you're still missing some of the content, for example.

And so these things tend to roll out in this non-synchronized way, and it's different globally as well. I think, you know, you see more interest in VVC in China, perhaps, than you do in some other regions. So there's a region-based difference as well, and things are happening at different speeds in different places. But VVC is probably on about the same track as HEVC was: if you look at three or four years after HEVC was finished, VVC is probably in about the same place. Whether it continues that way, who can predict, but it's on track, let's say.

Zoe: Yeah. So because you put quite some effort, as mentioned, into MCIF, the Media Coding Industry Forum, how does this forum actually play a role in all this codec driving and adoption, from the application perspective?

Justin: Yeah. If you look at the standards bodies, I know you've had some involvement with AOM and AOM codecs, and that's a bit of a different body, because they can do some of the marketing and development and those sorts of things within the organization. Whereas if you look at JVET or MPEG, these standards bodies actually have no mandate and no resources to do those kinds of activities. Their basic job is to push the standard to publication. And once it's, you know, up on ITU's website, or wherever the standard's made available, then, apart from maintenance and debugging and those sorts of things, that's the end of the standardization body's job.

And so MCIF is really intended to pick up where that leaves off. For example, we want to educate people about the new codec's capabilities: this is what you can do, and this is how you can use it. These tools that are baked into the spec, SEI messages and all of these sorts of things, have quite a lot of power, but if you just leave it at that point, they stay in the minds of the researchers. Do people actually know how to take this and make it work, or use it to reduce memory usage, or fix some pipeline problem it's intended to solve? People just aren't necessarily aware of, or thinking about, how they can do those things. So part of the role of MCIF is to explain how these new parts of the codec can be used.

For example, we've got what are called broadcast and streaming guidelines, which say: if you are in a broadcast environment, when you take this big VVC codec, this is how you should configure it and use it in a broadcast-type setting. So that's the basic role of MCIF: to take MPEG technologies, not just VVC, because there's a bunch of other stuff that goes on in MPEG, not only video but audio, systems, all of these sorts of things, and to engage with the community, with the people who would use them, and to be of benefit or assistance to the people who would like to deploy those technologies. That's the MCIF role in a nutshell.

Zoe: Yeah. So as you just mentioned, for the standard, we all know that the standard only standardizes the syntax and the decoder behavior, right? After that, of course, along the way there's always reference software that implements the encoder side to manifest and showcase the new coding tools' capabilities. But then, beyond that, you mentioned there's the driving. For example, I remember at IBC, in the MCIF booth, there were quite a few demos, not just showcasing the encoder side but also the decoder, because hardware decoder deployment takes time, especially right after a new codec is standardized, and then they need to go from there.

But usually the chip side needs about a one-to-two-year cycle to go to market. So you need the software decoder to actually be out there, starting the deployments, so that at least you have the initial point. Once you have decoders, then you have content, and encoders come out. So I did see the software decoder demo in the booth. That's also part of driving the deployment of new standards, right?


Justin: Yeah, absolutely. I mean, that's where companies like yours come in, in doing some of these products. I think one of the most common questions we get at the booth, whether it's at IBC or NAB or whatever the show is, is: what chipsets are out yet? People are looking for some sort of trigger or gauge, and once they start hearing that this is in X percent of TVs or X percent of phones, or whatever the story is, they start to say, well, this is serious. And so part of what we do at IBC, you're right, we had encoder demos, we had decoder demos, we had demos showing it over 5G, for example. And at NAB we had a 5G mobile gaming demo from Ericsson, which was pretty cool: ultra-low-latency, you know, VVC gaming in software.

And so showing people that this stuff is not some distant dream in the future, or something that's still in a theoretical realm, but something people can actually deploy on a phone, or that's actually in a chipset, is an important role of MCIF. But you're right, you do see these things coming in software first. And that's perhaps one of the interesting things with some of the newer codecs, VVC being an example: you find that the gap between software and hardware is not what it was. You can actually take a codec and run it in software and it's not outrageous. And as you say, that's a quick way into the market, because of the lower cycle time on those things.

Thomas: One of the things that comes up when we talk to customers about deploying a new codec is that they're doing comparisons between the old codec and the new codec at a point of overlap. So they're not necessarily going to deploy VVC at the absolute bleeding edge of its capabilities; they want to deploy it at a similar complexity to maybe their HEVC deployment. And in some cases they want to deploy HEVC in the footprint of a low-complexity H.264.

So there are these big design constraints: each generation of codec has to kind of contain the previous generations, so that you can do that efficiently, so that you can produce really low-complexity versions of the technology. I was wondering how much these kinds of encoder complexity questions come up in standardization. In my experience, people talk an awful lot about decoders and their complexity, 'cause hardware is the critical path. But do people talk about encoders and the market, and how you could make something work along those lines?


Justin: Probably not as much as we should, is the honest answer. I would say it's a bit lacking. Of course, with these standards groups of whatever nature, you are constrained by the knowledge of the people in the room. And so if the encoder vendors do not participate in the discussion, then you lose the wealth of their knowledge of what will be challenging for them. We do have some encoder designers that come, and people who are familiar with encoder design, so there are not zero voices, but probably not at the level there should be.

What we've seen, though, is perhaps a growing expertise in software encoder design. You see people doing faster versions of the reference software, and not only one or two; there are people interested in doing somewhat faster versions of what we might have already used. Now, it probably also leads into the actual design process, the standardization process. I mean, it's getting really ridiculous how long it takes to run some of these simulations. The way standardization works is that you have this set of common conditions that you test against, and the principle of that is fine. You want everyone to be comparing apples with apples, so that no one's pulling tricks and cherry-picking certain results or sequences.

You want everyone to test on the same basis of conditions. So that's all fine, but the problem is that with the current software, it takes weeks. It takes ages to run these simulations, and I don't know how much warmer the planet is getting due to these data center simulations that people are running to try and squeeze a bit more out of their codec. That's a non-trivial issue, especially as you start moving into AI and some of the machine learning things, which make it even worse in some ways. And so there is probably some sort of decision to be made about how we get a tractable design process. How do we get something that's sustainable and allows people to innovate, to spend more time on the smarter thinking rather than just on the process itself?

And I think some of that thinking would be relevant to what you are saying about the encoder. As people start to think about what we can do here, it forces them to maybe be a bit more realistic, rather than just throwing an extra grid at something, or, you know, whatever. But I think that's an area for improvement, I'd say. I agree with what you are saying, and it would probably be nice to improve a bit there.

Thomas: Yeah, I've heard that, well, essentially you have to write SIMD for many of these tools in order to get a reasonable runtime, especially if there's other SIMD in the reference encoder and you don't want your tool to stand out. And that's probably a good thing in some sense, in that you have some real idea of what an implementation costs in software if you do at least some rudimentary level of SIMD. But I was thinking of these tool flags that you mentioned earlier with VVC, and what the thinking was there, in terms of how you imagined they might be used by different encoder applications, and whether you'd thought about this issue of having to compete with prior standards.

Justin: Yeah, I think the real intention there was the ability to turn off particular tools in an implementation. It wasn't so much starting from the ground up, saying, well, I'll start with this basic thing and then I'll pick and choose the tools that I want to use. It was more a little bit of future-proofing, I would say. Because when you design the standard, you don't want to design something that's unrealistic or not implementable, but you also never know what's going to be useful in real life, and you don't know what's going to be a real problem in real life. You can't predict it with 100% accuracy.

So the idea was more: you start with this profile, with this set of tools, and then for a particular application scenario, for example video conferencing, you want super low delay. And if one of these tools is being a bit challenging for low-delay implementations, then maybe you want to turn off that tool. For video conferencing you can negotiate that and say, you know, I'm sending this flag and so on. But for other applications you can't necessarily do that negotiation, and so it's something you might want to define in some sort of sub-profile, in some structured way. So the thinking really was that you can start with a set and then turn off particular things which might turn out to be problematic in some environments, without thinking they were problematic at the time, because, you know, they wouldn't have been included if that were the case.

Thomas: Yeah, I think the thing that we found, you know, writing encoders, was that when you are asked in advance, at the start of a standardization process, to define a profile, you don't know in advance what's going to be difficult. Actually, the things that people think are going to be difficult might not be that difficult in practice, or you can use them sometimes. So we've always wanted to have the richest tool set available to us, but as you say, sometimes you still find there are things that you can't use or don't want to use, that you might want to turn off. So profiles were kind of difficult for getting the best out of even very low-complexity implementations. You would find you could still have used things in a higher profile, but you can't, because you're constrained to working within that range.

[00:43:57 Codecs don't just come and go in five years]

Justin: Yeah, I think these technologies are littered with things which people thought would be useful and then turned out not to be so much in demand, and things which people thought would be useful for one situation or one setting, and then people realized, I can use this for that as well. And I think this is one of those areas where it's very difficult to know exactly how technology will be used. 'Cause codecs last a while, you know? Codecs don't just come and go in five years, and so it's very difficult to know how video usage will even evolve in five years' time.

Zoe: Yeah. Just to follow that topic, because just now we also touched on how machine learning can be used in codec standards. From another perspective, I just wonder, in your view, how will the next codec technology address, let's say, gen AI videos? Because they were created by a neural network, a lot of people say: hey, then maybe you don't need to transmit much; maybe only a small amount of bits is good enough to guide the reconstruction of the video entirely on the player side. So how do you think about that?

Justin: I think these are all interesting topics. You know, we don't know. Perhaps what I can say is that currently there's no codec past VVC on the MPEG side. There's an exploration activity; in other words, people are testing out ideas, they're seeing what they can do. And as part of that exploration activity, people are looking at neural networks, integrating these things into the codec. There's what we would call an end-to-end learned codec, which is being explored, and then there are neural networks being used in certain tool designs, like loop filters, for example.

And so that sort of exploration is going on. I would say it's still in fairly early stages, but the other thing that's happening is that we've really just started thinking about requirements for the next codec. And when I say just, I mean a couple of weeks ago. It's one of those things that takes a little while, because in all of these settings the codec is not just people charging in, writing software and bringing some idea; there are requirements underpinning what's done. And that's where some of these topics will be discussed. So: what type of video do we want to encode?

I mean, we've seen a push in recent years for coding screen content. That's been one of the growing areas; it started off in HEVC and then was a bigger deal in VVC. And it suits these eSports and similar uses, where people are sending something over, you know, Twitch or whatever platform. So you'll start to see requirements for these AI-based images, or videos as well, I'm sure, some type of synthetic content. And I think part of the question will be: what part of that is a function of the actual video codec, and what part belongs somewhere else? Thomas mentioned film grain, where you have an SEI or something guiding you on how to reconstruct or reapply the film grain, and you could have something similar which guides you on the parameters of how to regenerate this content.

But there's also a realistic limit on that. At what point are you embedding another whole algorithm, another whole codec, into that? And at what point does that become some sort of separate technology that you want to consider on its own merits? So I think that's an open question. Certainly these new types of content will drive things, but whether it's part of the codec, whether it's something separate, whether it's something that's synthesized or post-processed, I think that still remains to be seen.

Thomas: So is the major focus of the requirements still human viewing of human-originated content? We've heard a lot about machine consumption of video. Is MPEG involved in that? Is it part of the requirements for the next codec, or is that a kind of separate thing?

Justin: Ah, that's a really good question. So within JVET, they're preparing a technical report on what's called VCM, or video coding for machines, which is currently being developed. As you know, the JVET documents are public; they're not behind any sort of restriction. So you can actually look at some of the details of this video coding for machines TR, as they call it, technical report. So it's certainly a topic that's been considered in JVET, and then in MPEG there's also work on a more dedicated video coding for machines project.

Now, to what degree that will appear in the requirements: my own personal guess is that it will appear to some degree. There will be something in there for it to operate in some mode which doesn't, you know, destroy the video in a human visual system sense, so that it can be effectively used by a machine. I would expect a requirement like that; I don't think that's a brave prediction. But usually the question with all of these novel things is to what extent people are willing to make the basic 2D video codec more complex, or to degrade its performance, to provide for this expanded versatility or scope. And typically the answer in the past has been: these things are cool, no worries, they're fine, so long as they don't cause any major degradation for the basic 2D video use case. We'll probably end up with something similar for all of these other new things, would be my guess.

Thomas: Yeah, it sounds a bit like having a lossless mode: you want to do a decent job of a lossless video codec, but it's not going to be the major use case, that kind of thing.

Justin: Yeah, yeah. And it's been the same with screen content and some of these other things. Of course the line's a bit fuzzy, because something becomes more popular and more in demand, and people think, you know, this is... Immersive video is another topic, you know, 360-degree video. As these things become the latest buzzwords and the latest interesting thing, people think, "Oh wow, a lot of content's going to be in that." So the line is not clear, but generally speaking, most of the content is still, you know, movies and user content done in 2D, or with some type of depth, maybe light depth, and you don't want to degrade that scenario.

Zoe: So basically, from our discussion, I can feel that your views have been consistent with your own journey. You have a main theme in your views on video codecs, and you're aware of almost every single aspect of them; you keep a conservative view of the next step while staying on top of all these potential new things. And it's very hard to predict the future. Even gen AI, right? It just came out, and it seems right now it's everywhere. I'm using ChatGPT on more than a daily basis; a lot of the tools we create actually rely on it to give us hints. But it came out only, let's say, a year and a half ago. It seems like it's been there for a long time, and so it's very hard.

But still, we'd like some views to end this episode: not just your views on what's next, but also for yourself. You have been working in this field, almost in the same domain, all along; of course there's the codec standardization migrating to the application layers, but it's still the codec theme. So what do you think will happen in the next one year? Or two?


[00:55:07 What do you think is happening in the next one or two years]

Justin: In the next one year, probably not much change, to be honest. We'll start to see VVC getting deployed more, is what I would say. It's already in people's roadmaps, and so I think you'll start to see product announcements for VVC. But in terms of next generation codecs, that's what I was referring to: probably not a whole lot on the JVET side. You might see some preparatory work, a bit more about these requirements. Maybe they'll start to draft some type of call for evidence, to try and get some sort of checkpoint on where we stand with the technologies that have been floating around.

And we might start to have a few more of these discussions about how we go about codecs and what their limits are, some of the things Thomas was saying. Maybe we'll have some of those conversations in the next year. But for codec design, that's a pretty short-term outlook.

Zoe: It seems like not much, but that's how codecs are, even a new codec standard. Right now, maybe you get a tool that's still not too complicated but gains in BD-rate, meaning that you achieve the same quality but save 0.5% to 1% of the bitrate. That is already a lot at this point. So it's going to be, I would say, evolutionary changes potentially leading to revolutionary change. That's what we see a lot on the codec side. So what about yourself? Are you continuing along this journey?
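The BD-rate figure Zoe cites is the standard yardstick for such gains. A compact sketch of the Bjøntegaard delta-rate calculation, the usual cubic fit of log-rate against PSNR integrated over the overlapping quality range, with made-up rate/PSNR points for illustration:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    # Average bitrate change (%) of the test codec versus the reference
    # at equal quality: fit log-rate as a cubic in PSNR, integrate the
    # difference over the overlapping PSNR range. Negative = bit savings.
    p_ref = np.polyfit(psnr_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    return (np.exp((int_test - int_ref) / (hi - lo)) - 1) * 100

# Made-up RD points (kbps, dB PSNR): the test codec spends 10% fewer bits
print(bd_rate([1000, 2000, 4000, 8000], [34, 37, 40, 43],
              [900, 1800, 3600, 7200], [34, 37, 40, 43]))  # about -10.0
```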

Justin: Probably so, at least for the next few years. I mean, I'm old, but I'm not old enough to retire or anything like that. So I'll probably keep going in the codec area. I think it does have a level at which it will mature; we're probably a generation of codecs away from that at the moment. But it's nice to be in an area where there is new research and where it's a topic of interest. I think you probably find that in your own company and your own work: this is right on the edge, and you can do new stuff. Of course, as it matures, maybe it's time to start looking at other areas, but I think that's a little further off yet.

Zoe: Great. Basically, I am really grateful to have you on this podcast, because while you were speaking, I saw the same, or maybe even more abundant, experience. It reminded me of our first discussions: you were presenting some review about a codec, just a discussion, but I still remember the way you presented, at a not very fast pace, but with good thinking behind it. It's just like the old times, but we're still here, and not only you; Thomas as well, who is also on this recording. I remember Marta Karczewicz; of course, she has been continually leading codec standardization at Qualcomm. And I still remember Xianglin; we actually joined the Nokia team almost at the same time. Xianglin has also been working on codecs all this time; right now he's with .

So I believe you are one of those who will continue working in this field. This field is an accumulation of many people's efforts and many technologies, and there's new content coming, so we'll see what that brings. And machine learning, of course, everybody talks about it; we even get that a lot from our investors: will your product technology be suddenly disrupted by these new things?

We all keep an eye on that, as we discussed, and we believe something may happen. But as you mentioned, along the codec side, within the major framework there's still something for us to explore; at least one more generation can be anticipated.

Justin: Yeah, I think it's not a question of whether that will change, but when, you know? What's the trigger, or the tipping point, or the level of maturity, for something else to come in? But it's something to definitely keep aware of. Look, Zoe and Thomas, I think video coding is a pretty small community, really. You sort of cycle around and meet some of the same people over and over again. So I certainly appreciate the chance to catch up with you, but I'd also say to any of the younger people who watch your podcast that it's still an interesting area to get into. There's a bit of a learning curve, and you sort of need to break into it; it's not all something you learn in school. There's a lot of black art to the design of these things, which you have to get your hands dirty to learn. But it's still an area where we need not just the same old community, but also new people coming in and bringing fresh ideas and new thoughts. And that'll drive some of these changes in the future, too.

Zoe:
Oh, thank you for bringing up that point. We hope, of course, that more and more people will pay attention to the field of video compression, at least by watching this episode. All right, thanks everyone, and thanks to Justin for joining this episode. Yes.

Justin:
No worries. Thanks a lot, guys. It's been really good talking to you.

Zoe:
Thank you. And thanks to everyone listening to this episode of our podcast.
