The VideoVerse

TVV EP 02 - Thomas Daede Pt 1 - AV1 has its roots in RC planes???

September 02, 2022 Season 1 Episode 2

If you've been around the video codec world at all, then Thomas Daede is a familiar name. From his early days trying to find a way to stream digital video for FPV RC planes to his cutting-edge work at Vimeo, Thomas is a legend.

In this two-part series, we dive into how he got started, shedding light on the early days of VP9, HEVC, and the development of the AV1 standard. Thomas contributed significantly to the open-source AV1 codec and continues to, sharing where the codec is headed.

Watch the full video version.
Learn more about Visionular and get more information on AV1.

Nathan: All right, everyone. Welcome to the VideoVerse. I am here with my esteemed co-host Zoe Liu. And our guest today, if you’ve been in the world of I’m gonna say video codec as the broad term, I’m guessing you’ve heard of him before. Thomas, why don’t you tell us really quick who you are, what it is you do kind of on a big picture?

Thomas: Hey, I’m Thomas Daede. I work at Vimeo on a video transcoding team there. I previously also worked at Mozilla on the AV1 video coding standard as well as the testing framework for it.

Nathan: Awesome. Yeah, you have a very interesting history with codecs in general. One of the things I wanted to ask you about because you shared a little bit about this off camera and I kind of geeked out with you on this. Tell me a little bit about the really early days how you got into I’m gonna say codec. I’m using that as my broad term. You had a project that you were working on that forced you into it. Is that fair to say?

Thomas: Yeah. So back in like 2013, 2014, I had a hobby of flying radio-controlled airplanes. This was right before quadcopters and drones started to take off. At that time, there wasn’t a whole lot of really good video transmission from the airplane to the ground available. They had just started coming out with some digital systems, but mostly everyone used analog systems. So they had these little cameras. I actually have a little demo here. This is an analog NTSC transmitting camera. It sends a very wide bandwidth signal at a couple gigahertz to a receiver.

Nathan: And that’s the antenna hanging off the end that we see?

Thomas: Exactly, yup. So we’ve got a little short gigahertz-level antenna, a little whip antenna on here, and power. And these things were not so great. The range was pretty poor. Of course, usually line of sight only. That’s okay for an airplane usually, but it got noisy really quick. These are all analog and they tend to drift all over the place. You have to retune them. And I really wanted to have something better than that. So I actually wanted to do digital video transmission.

Nathan: Which at the time, was anybody doing that?

Thomas: There were a couple very expensive systems that did it. But the problem is that video bandwidth is very high, so you need very expensive, fancy radios that have a large amount of bandwidth. A lot of people would do things like use five-gigahertz Wi-Fi gear, but Wi-Fi is also not so good for long range. So they would do things like modified Wi-Fi antennas, or patch antennas that could automatically rotate and tilt toward the airplane.

There were some systems out there, but they were very specialized and expensive. And so I was kind of rolling my own stuff there. I’d already rolled my own radio for the airplane controller. So I wanted to try something myself and see if I could outperform the systems out there.

Zoe: And what’s the result?

Nathan: Yeah, what happened, right? Good question.

03:28 How Thomas deployed Daala, an AV1 predecessor

Thomas: What I did is I decided I wanted to basically cram my video as small as possible. I wanted the very smallest bandwidth so I didn’t have to resort to using these very expensive, complicated radios. So instead I looked around for basically the state-of-the-art video compression that I could possibly use to send it. There’s H.264 and the like, but I wasn’t super excited about H.264 because, A, it had been around a while, so it was no longer state of the art. It was my own personal project, but I also didn’t like the fact that for a lot of these newer codecs, I’d have to pay licensing fees for a codec I’d be implementing myself anyway. It felt a little bit backwards. So I ended up looking for new research video codecs, and I found that the team at Mozilla and Xiph was working on this video codec called Daala, an experimental codec. I looked at it, it looked good, and I went and implemented part of it, in particular the transform and lapping step, on an FPGA. The idea was that the FPGA would be fast enough to do real-time encoding in the air.

Zoe: So, finally, you deployed the FPGA version of Daala.

Thomas: Yeah, I did the FPGA version first, because of the power budget I had on the plane. It wasn’t terribly low; it was more that if I put a ton of battery in there, that would add weight. The idea was that the FPGA version would be able to vastly outperform any software implementation, especially at that time.

Zoe: Yeah. Can you talk a little bit about Daala? I’m pretty familiar with it in some sense, because it was basically Mozilla’s initiative for having a royalty-free, open-source video codec, but you may want to talk a little bit more about it for our audience.

05:31 Daala tools later adopted by AV1

Thomas: Yeah. So that is a research codec. It still exists if you want to go find the code base. It was basically designed around several unique principles. It had a lapped transform. So instead of a normal video codec that has plain discrete cosine transform blocks that are then blurred together with some sort of post filter, this one used an invertible filter that ran in between the blocks. The idea is it would basically mix the blocks together on the edges. Then you’d do your normal square transforms, encode those, and do the inverse transform on both the square transforms and the overlapping part. And the idea was that this would avoid the blocking artifacts that normally appear in video codecs, but it would also avoid the blurring problems that appear when you use a traditional video codec’s deblocking filter.
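The lapped-transform idea Thomas describes can be sketched in a few lines. This is a hedged illustration, not Daala’s actual filter: Daala’s lapping spans more samples with carefully designed coefficients, while here a single invertible 2-point rotation (with a made-up angle `theta`) mixes the samples on either side of each block edge before a per-block DCT, and is undone after the inverse DCT.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

def lap(x, block, theta=np.pi / 8, inverse=False):
    """Apply an invertible 2-point rotation across each block boundary."""
    y = x.copy()
    c, s = np.cos(theta), np.sin(theta)
    r = np.array([[c, -s], [s, c]])
    if inverse:
        r = r.T  # a rotation's inverse is its transpose
    for b in range(block, len(x), block):
        y[b - 1:b + 1] = r @ y[b - 1:b + 1]  # mix the two boundary samples
    return y

block = 4
x = np.linspace(0.0, 1.0, 16)

pre = lap(x, block)                          # mix samples across block edges
D = dct_matrix(block)
coeffs = (D @ pre.reshape(-1, block).T).T    # per-block forward DCT
recon_pre = (D.T @ coeffs.T).T.reshape(-1)   # per-block inverse DCT
recon = lap(recon_pre, block, inverse=True)  # undo the lapping

assert np.allclose(recon, x)  # perfectly invertible, no blocking seams
```

Because the lapping filter is exactly invertible, quantization error spreads smoothly across block edges instead of snapping to them, which is the anti-blocking property described above.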

It also had a couple other things. It had a more perceptual quantizer called PVQ. It had a range coder that was designed to be multi-symbol and basically worked quite a bit faster than other range arithmetic coders, cramming as many possibilities into a single symbol as possible to keep the symbol count down for the overall video. And then it also had a post filter called CDEF that was basically there to clean up the edges further. Actually, the original idea was that we wanted to make really nice vector art look good.
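The multi-symbol idea, where one coded symbol selects one of several values at once rather than a chain of binary decisions, can be sketched with an exact-arithmetic toy coder. This is emphatically not the Daala/AV1 entropy coder, which works on 15-bit integer CDFs with renormalization; the three-letter alphabet and its probabilities here are invented for illustration.

```python
from fractions import Fraction

# Toy alphabet with probabilities summing to 1; each coded symbol picks
# one of several values at once, as in a multi-symbol range coder.
probs = {"a": Fraction(5, 8), "b": Fraction(2, 8), "c": Fraction(1, 8)}

def encode(symbols):
    """Narrow the interval [low, low+width) once per symbol."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        cum = Fraction(0)
        for sym, p in probs.items():
            if sym == s:
                low += cum * width   # move to this symbol's sub-interval
                width *= p
                break
            cum += p
    return low  # any value in [low, low+width) identifies the sequence

def decode(code, n):
    """Invert encode() by finding which sub-interval contains the code."""
    out, low, width = [], Fraction(0), Fraction(1)
    for _ in range(n):
        target = (code - low) / width
        cum = Fraction(0)
        for sym, p in probs.items():
            if cum <= target < cum + p:
                out.append(sym)
                low += cum * width
                width *= p
                break
            cum += p
    return "".join(out)

msg = "aababca"
assert decode(encode(msg), len(msg)) == msg
```

Using exact `Fraction` arithmetic sidesteps the renormalization machinery a real coder needs, so the sketch stays short while still showing how one interval-narrowing step consumes a whole multi-valued symbol.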

So we had this filter; the idea was that we’d code a bunch of edges and kind of do this painting. We would code the top and left edge and basically draw a line through each block, and that would be a predictor. That didn’t work super well, but what did work is using that same predictor as a post filter. So we basically find an edge in an image and then clean it up with the same method. So, Daala became a part of the AV1 video standard once the Alliance for Open Media was formed.

We basically combined several pieces of different video codecs. The base video codec for AV1 was VP9, but we did add several of the Daala features. Sadly, the lapped transforms didn’t get in there. Maybe in the future. But we did get the range coder, so the range coder with some modifications is what’s in AV1. And the CDEF post filter, also with some modifications, is one of the post-filters available in AV1 as well.

Nathan: I’m glad you said that ’cause as you were talking, I was thinking I’m hearing echoes here of AV1 stuff potentially. So that explains why some of that stuff transitioned over. Did you basically get pulled into the AV1 world at that point because of your involvement with Daala, or how did you end up getting involved with AV1?

Thomas: Yeah. Well, I mean, the goals were aligned, right? We saw at Mozilla that the licensing terms that had come out for competing codecs like HEVC were basically not shippable for us, with Firefox being a free product. Like, we couldn’t charge that amount for a thing that’s a free download that anyone can download and install, with zillions of installs. So that licensing model didn’t work for us. So we were already working on Daala as our potential replacement, hopefully with other adopters as well.

But it turned out this was a problem for other people as well. Eventually, we got several other people on board; there’s a long list on the Alliance for Open Media site. That was actually kind of an independent, separate effort, but eventually everyone with the same sort of problem came together. Obviously, we didn’t want to make both a Daala codec and an AV1 codec, so we decided to combine forces: if we all have the same goal in mind, we might as well make one really good video codec with everyone’s efforts.

Nathan: Yeah. And I’ll ask this question to both of you, ’cause this is a world that you’ve been, I’m gonna say, eyeballs deep in, and I have been a little bit involved in, but not as much. Is that a big deal amongst engineers, to see all these different standards come together and turn into a single standard? ’Cause I’m imagining each contributor is going to have to contribute something but then also forfeit something. Am I thinking the right way about what that process might have been like?

Thomas: Yes, absolutely. There’s a lot of compromises. It’s sometimes difficult as an engineer to not see your stuff get adopted or put in. Because AV1 is a royalty-free codec, there’s no financial incentive. With some other codecs, there’s a big financial incentive to get your stuff in, because maybe you get more licensing fees. AV1 doesn’t have that, but you still have quite a bit of pride as an engineer, right?

There’s still some emotion if your stuff doesn’t get in or does get in. Certainly, there was stuff I’d have loved to get in that didn’t happen. Maybe for a future codec. But it’s a big debate and process. And that was actually also something I was involved in, because I worked on AV1 testing. I was one of the people responsible for how we test all these new additions to AV1 and how we decide which ones really improve the video quality and which ones don’t.

Nathan: Fascinating. That’s what I was imagining, ’cause engineers are engineers, but they’re creative people at heart, you know? They’re creatively solving problems. And so I can see where you put your heart and soul into something. But in some ways, it almost feels like AV1 is kind of a democratized codec. ’Cause, like you said, there’s no financial incentives. There’s the pride factor, but now everyone can contribute, right?

Thomas: Yeah. Basically, you do have to sign up for the AOM to get inside a lot of the meetings. However, the standard code base we work on is open source, and so there are ways even for people outside of that to contribute as well.

Nathan: Yeah. That’s awesome. Not really shifting gears, but diving a little deeper into your involvement with AV1: AV1 has so many tools and capabilities, but I know you’ve worked pretty closely with the HDR support specifically, the HDR factor when it comes to AV1. It seems like HDR video is one of the things that often brings AV1 up in conversation. Can you talk to us a little bit about the role that AV1’s HDR capabilities play as far as encoding video, and how it helps?

12:26 The role of AV1’s HDR capabilities

Thomas: Yep. This is actually something that I’m currently working on at Vimeo, so that’s one of our big pushes. HDR has been doable for a long time, even on older codecs. But it brings a lot of challenges that newer codecs like AV1 handle a lot better. Probably the first big benefit is that AV1 is basically 10-bit by default everywhere.

So the very baseline profile of AV1 has 10-bit support, so you’re pretty much guaranteed that if something has a hardware AV1 decoder, it’s gonna be able to decode 10-bit. And 10-bit is important for HDR because HDR’s wider dynamic range means that banding artifacts and the like are gonna be much more visible than they were with SDR. So having those extra bits to reduce any visible banding artifacts is much more important for HDR. AV1’s not the first video codec to mandate 10-bit. However, in practice, it is the first video codec in a browser that we can rely on having 10-bit support for.

Nathan: Interesting. So essentially, and I’m asking, as AV1 becomes more popular and more standard, it’s making 10-bit the de facto, if you will. 8-bit’s always been the de facto, but now 10-bit eventually will become the de facto? Is that fair to assume?

Thomas: Yeah, that’s fair to assume. I think a lot of users of AV1 are just jumping straight to 10-bit. There is 8-bit AV1 if you want. There are some advantages to using 8-bit AV1. For example, if you’re using a pure software decoder, you can still decode 8-bit AV1 faster than 10-bit on a CPU.

So, for example, if you really want to get really high quality, low file size AV1 video and you’re targeting lower power computing devices that don’t have a hardware decoder, then you might still want to consider generating AV1 8-bit video, usually for SDR. You wouldn’t wanna do that for HDR. But for a lot of these HDR cases and the like, 10-bit is the most sensible option.

Nathan: Right, absolutely. Yeah. I know it a lot from the capture side, the camera side. I wanna upgrade to a 10-bit camera, but of course, it doesn’t help if the whole pathway doesn’t support 10-bit. I guess that’s where AV1 gets exciting for someone like me, who’s a content creator capturing video. If I capture 10-bit and I know that it can get delivered at 10-bit, it just means that much more of my original picture has been retained. Is that right?

Thomas: Yep. In particular, one thing that’s not always obvious about the 8 versus 10-bit difference: most computer monitors, for example, are 8-bit RGB.

So you’d think the difference between 8 and 10-bit’s not gonna be visible, ’cause if I only have an 8-bit monitor, how am I gonna see the advantage that 10-bit gives? But there are actually a couple reasons you’ll see an advantage even on an 8-bit monitor. One is that video is not stored in 8-bit RGB. It’s stored in YUV, and it’s limited range: 16 to 235 are the values used for 8-bit luma. That, plus the color conversion, means that for most 8-bit video you actually get less than 8 bits of RGB quantization. So you can get banding from that, and 10-bit will basically avoid it.
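The arithmetic behind “less than 8 bits of RGB” is easy to check: limited-range 8-bit luma spans codes 16–235, so only 220 levels have to cover the range that full-range 8-bit RGB covers with 256. A small sketch (the helper name is ours, and it handles only the luma scaling, ignoring the full YUV-to-RGB matrix):

```python
# 8-bit limited-range ("studio swing") luma uses codes 16..235.
Y_MIN, Y_MAX = 16, 235
levels_8bit_limited = Y_MAX - Y_MIN + 1      # 220 usable luma levels
levels_8bit_full = 256                       # full-range 8-bit RGB levels

# 10-bit limited range scales the same way: codes 64..940.
levels_10bit_limited = 940 - 64 + 1          # 877 usable levels

def luma_to_full(y):
    """Stretch a limited-range luma code to full-range 8-bit (0..255)."""
    return round((y - Y_MIN) * 255 / (Y_MAX - Y_MIN))

# Stretching 220 codes over 256 outputs leaves some RGB values
# unreachable -- the source of quantization banding on gradients.
reachable = {luma_to_full(y) for y in range(Y_MIN, Y_MAX + 1)}
print(len(reachable))   # 220: only 220 of the 256 RGB values are reachable
```

With 10-bit limited range there are 877 codes, several per 8-bit RGB step, so the stretch no longer skips output values.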

The other reason is that a lot of natural video, straight off the camera, has lots of noise that causes dithering. And that dithering is gonna mask any sort of banding artifacts, so you won’t see them directly off the camera. Unfortunately, video codecs are really, really bad at encoding noise and dithering, because they take a lot of bits.

And so most video encoders will just smooth them out. But the downside is that once you remove the noise, it’s no longer masking the banding, and that makes the banding that is there much more visible. Instead of seeing a smooth gradient that’s been dithered, with more and more pixels becoming the next color, you see a sharp line between the two quantization steps.
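This masking effect is simple to reproduce. A hedged sketch: a shallow synthetic gradient stands in for the sky, Gaussian noise for camera grain, and rounding for both quantization and an encoder’s denoising; the numbers are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A shallow gradient that spans only five 8-bit code values.
gradient = np.linspace(100.0, 104.0, 1024)

# Camera noise dithers the signal: quantized pixels flip between
# neighboring codes, hiding the quantization steps.
dithered = np.round(gradient + rng.normal(0.0, 0.5, gradient.size))

# An encoder that smooths the noise away (modeled here as quantizing
# the clean gradient) leaves a staircase of hard, visible steps.
smoothed = np.round(gradient)

print(np.unique(smoothed).size)          # 5: five flat bands
print((np.diff(smoothed) != 0).sum())    # 4: four sharp band edges
print(np.unique(dithered).size > 5)      # True: dithering masks the steps
```

The denoised signal collapses to a handful of flat bands separated by hard edges, which is exactly the banding that becomes visible once the masking noise is encoded away.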

Zoe: Yeah, especially with quite a few customers’ videos we see that, for example in sky areas or flat backgrounds like walls. If you get banding, you’ll see an artificial pattern there; it’s very visible and quite annoying. So, like you point out, Thomas, getting to 10-bit is actually a great tool to naturally avoid that kind of artifact being manifested in the final decoded video. I’m also interested in something: you particularly mentioned that if you’re using software, decoding 10-bit will be a challenge, right? Because it imposes some extra complexity.

17:31 Challenges decoding 10-bit video

Thomas: Yep. So the problem with software encoding and decoding is that we have to operate on word sizes that the CPU supports.

A CPU supports, for example, 8 bits and 16 bits, so we actually can’t generally use 10-bit storage in the CPU. If we use 10-bit, we have to jump all the way to 16-bit, which basically means a lot of data paths become twice as slow. It’s not as bad as it sounds, though.

Because once the data is out of memory, into registers, and being operated on, it’s not nearly so bad. So in theory, you might expect going from 8 to 10-bit, because we’re moving from 8-bit to 16-bit registers, to be twice as slow in software.

It’s not actually quite twice as slow. It is slower, but you don’t take quite as bad a hit. There’s also been some work on doing other things. There have been attempts at using packed formats in memory to reduce memory bandwidth, even in software encoders and decoders: when the frame is not being touched as much, it’s packed so the extra high bits are used for another sample’s data, and we don’t have to have those wasted bits in memory.
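As an illustration of that packing idea (a generic sketch, not Dav1d’s actual memory layout), four 10-bit samples fit in 5 bytes instead of the 8 bytes that four 16-bit words would take:

```python
def pack_10bit(samples):
    """Pack groups of four 10-bit samples (0..1023) into 5 bytes each."""
    assert len(samples) % 4 == 0
    out = bytearray()
    for i in range(0, len(samples), 4):
        bits = 0
        for j, s in enumerate(samples[i:i + 4]):
            bits |= (s & 0x3FF) << (10 * j)   # 4 x 10 = 40 bits total
        out += bits.to_bytes(5, "little")
    return bytes(out)

def unpack_10bit(data):
    """Invert pack_10bit(), recovering the original sample list."""
    samples = []
    for i in range(0, len(data), 5):
        bits = int.from_bytes(data[i:i + 5], "little")
        samples += [(bits >> (10 * j)) & 0x3FF for j in range(4)]
    return samples

samples = [0, 1023, 512, 700, 3, 999, 64, 940]
packed = pack_10bit(samples)
assert unpack_10bit(packed) == samples
assert len(packed) == len(samples) * 10 // 8   # 10 bytes, vs 16 as uint16
```

The trade-off is exactly the one described: packed frames use 37.5% less memory bandwidth, but the samples must be unpacked into 16-bit registers before the CPU can actually operate on them.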

Zoe: Yeah. Do you have an idea of how much it adds in complexity? It’s not double; it’s definitely less than double, because usually the encoder is more tolerant of complexity than the decoder side. So I just wonder, as you mentioned, it’s not as bad if we have a 10-bit software decoder. And furthermore, since you mention that Vimeo is pushing this effort: in the current market, have you observed the ecosystem support, for example, for AV1 10-bit hardware-wise?

Thomas: So if we’re relying on hardware decoders, yes. Basically, the ecosystem seems to be that if there’s an AV1 hardware decoder, there’s AV1 10-bit hardware decoding. Because we’re looking at doing this in web browsers and the like, on desktops we’re gonna be using software decoders quite a bit of the time, because only very recent GPUs have AV1 hardware decode. But the ones that do usually have 10-bit, and they seem to work pretty well. As we work on this, we’ll probably have more numbers in the future that tell exactly what percentage of devices we can support with this. And I imagine it’s only gonna increase over time.

Zoe: Yeah, you mean the hardware support, right?

Thomas: Exactly, yup. And even with software decoding, 10-bit is totally okay for quite a lot of devices. If you wanna do 4K decoding, it gets a little bit challenging, especially with laptops. 1080p is usually totally fine.

Zoe: 1080p at 10-bit is usually fine. And what about on mobile?

Thomas: I actually haven’t run the numbers on mobile recently, because there are a lot of ARM improvements that have been going into Dav1d. So I can’t give you the exact numbers on that one; we’ll see. The good news is that on mobile, you usually don’t actually have to decode 4K, because the screen is usually not a 4K screen anyway. The bad news is, of course, that mobile has a much slower CPU, and power consumption matters more. But it is surprisingly effective. I know on my current phone, I can decode 8-bit 1080p AV1 in software just fine. I haven’t actually tried 10-bit on it yet, so I’m not sure what the results are on that phone.

Zoe: Yeah, you mentioned Dav1d here, so you may want to also give some intro on what Dav1d is. Yeah, Nathan, Dav1d is not the name of a person. It’s the name of the decoder.

Nathan: The decoder, yeah.

Zoe: An open-source community decoder, yeah.

Nathan: And actually, we have Jean-Baptiste Kempf joining us on one of our upcoming episodes, who is one of the founders of VideoLAN, which developed it. So if that’s interesting, make sure you catch that episode as well, ’cause he dives into the college days when they first developed that. But yeah, Dav1d is the decoder.

Thomas: Yeah. It’s what enables us to do this fast software decode. It has handwritten assembly for both: a whole set of 8-bit assembly and a separate set of 10-bit assembly. As the 10-bit assembly has matured, it’s allowed us to really embrace 10-bit in software much more than we could before.

Nathan: Interesting. Tell me if I’m correct; I’m asking the experts here. When HEVC got rolled out, I remember a similar conversation happening where initially the whole question was, can we roll it out with software decoding, because not a lot of devices have hardware support? And then eventually, more and more devices did HEVC decoding. I think when Apple rolled it out as their standard, none of their phones supported it in hardware, I don’t think. Maybe the newest one they’re releasing does. Am I thinking right? Did this happen with HEVC as well?

Thomas: It kind of did, but the one difference, I think, between the development processes of AV1 and HEVC is that with AV1, it was very important for us to have a very fast software decoder from the start.

So a lot of the parts of AV1 are designed so that they were not only efficient to implement in hardware, ’cause eventually everyone’s gonna have hardware, but also so that until then, we could have a fast software implementation. And there are several features of AV1 that lend themselves better to software implementation. For example, our transforms just have lower complexity than HEVC’s because they’re designed differently. Our entropy coder, from Daala, is generally faster to implement in software. Not to brag about that, but it does help.

And the other thing is that we had Dav1d being developed near the end of standardization, and we spent a lot of effort on making sure Dav1d was a decoder that could be used by everyone. It’s very liberally licensed; pretty much anyone can use it for free and ship it in their product. Having that really good software implementation ready to use is a big help. HEVC, I don’t think, ever quite got a similar software implementation in the same ballpark of speed and optimization that Dav1d is.

Nathan: Fair enough. That’s exciting for the future of AV1, for sure.

Zoe: I have to say, I learned quite a bit from you, Thomas. We really hope that we can have you again, ’cause again, we really just want to promote these technologies.
