Author Topic: Video overlay - essential for fast video playback  (Read 10511 times)

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
Re: Video overlay - essential for fast video playback
« on: June 28, 2012, 09:07:28 PM »
I think part of the problem is that many people are confusing the concept with a particular implementation of the concept. To me, and probably to Hans, a video overlay is a very specific hardware implementation that is distinct from chromakeying and video texturing. To the next guy, maybe not. So, for the sake of clarity, I'll use the term "hardware video surface" to describe the general concept of a hardware assisted mechanism for rendering YCbCr formatted frames without the need for software colour conversion or scaling.

Nobody is arguing against the need for "hardware video surface" support in display drivers. Even when you don't need to scale the output, not having to do the colour conversion can make a difference in the amount of data that has to be transferred over the bus as the YCbCr data is typically more compact than RGB. In a perfect world, various parts of the hardware that are involved in video stream decoding would be openly documented and we could use those too.
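
As a rough illustration of the "more compact than RGB" point, the sketch below works out the bytes per frame for a few common pixel layouts. The bits-per-pixel figures are the standard ones for those layouts; the 720x576 frame size and the function name are assumptions chosen only for the example.

```python
# Bytes needed for one video frame in a few common pixel layouts.
# The 720x576 frame size used below is an illustrative assumption.

BITS_PER_PIXEL = {
    "YCbCr 4:2:0 (planar)": 12,  # full-resolution Y, quarter-resolution Cb and Cr
    "YCbCr 4:2:2 (packed)": 16,  # full-resolution Y, half-resolution Cb and Cr
    "RGB24": 24,
    "RGB32": 32,
}


def frame_bytes(width, height, bits_per_pixel):
    """Size in bytes of one frame at the given average bit depth."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    width, height = 720, 576
    for name, bpp in BITS_PER_PIXEL.items():
        print(f"{name:22s} {frame_bytes(width, height, bpp):>9d} bytes")
```

On these figures a 4:2:0 frame is half the size of the same frame in RGB24, so half as much data has to cross the bus per frame.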

The argument has been over the merits of particular ways of implementing support for "hardware video surfaces". The original hardware mechanism that we are used to from the days of Cirrus Logic and S3 chipsets of yesteryear (and variations thereof) is what is being described as obsolete here. I haven't looked too closely, but from what I gather, this particular method doesn't even exist in a lot of current generation cards. Instead, video colour conversion and scaling are handled by parts of the hardware 3D pipeline through the use of texture mapping, in which each YCbCr frame is used as a texture and rendered onto a quad (a rectangle primitive, optionally realized as a triangle fan or strip on hardware without direct support for quads).

Right now, as far as I know, no specific implementation exists for the RadeonHD drivers. With adequate 3D support, however, implementing a hardware video surface via video texturing becomes feasible. Since both 3D and video playback are desirable features, it makes sense to focus on the overlap.

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #1 on: June 29, 2012, 09:13:33 PM »
Quote from: Akiko;698374
It goes to show how active the mods are here now days, talk about a slippery slope.


We do have lives too, you know.

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #2 on: June 29, 2012, 09:38:59 PM »
Quote from: klx300r;698379
@ Piru

I respect your work on MOS;however, I'll side with Hans and Karlos's opinions on modern video driver issues.


There's not really a side to take. There's no disagreement that being able to leverage hardware assisted video playback is a good thing (TM). This is all a question of implementation detail as to how best to achieve it on more recent RadeonHD cards, as the existing drivers for these cards don't have a working implementation yet.

Ultimately, that decision will be down to Hans as he's the one doing the work, so other than being an interesting discussion on the relative merit of overlay vs chromakey vs video texture, it's all a bit academic.

Offline Karlos

Video overlay - essential for fast video playback
« Reply #3 on: June 30, 2012, 08:42:15 AM »
Quote from: stevieu;698405
Anyone for tea?! ;)


A nice cup of Assam, drop of milk, no sugar, would be great, thanks!

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #4 on: June 30, 2012, 10:18:06 AM »
Quote from: Zac67;698433
This very much depends on the platform in question. While traditional overlay and 3D texturing are pretty much interchangeable from the user's POV, overlay is much more efficient.

It's only more efficient if both options exist and overlay takes less time than texturing. There was a time when that was a given, but these days, thanks to the 3D arms race (of which texture fill rate is one of the things that got massively ramped up), it tends not to be the case. In fact, dedicated hardware has become less and less of a feature in favour of packing as many stream processors as possible onto the die.

As I said elsewhere, video texturing has a number of additional advantages too.

Quote
Obviously the latter is more work, so if power or GPU bandwidth is an issue you're better off with overlay. On a desktop computer where neither is a problem it may not be worth the trouble.

(It may be possible to read the raw video just once, process it in the GPU and shove the data directly into the display pipeline, but I don't think they're using this approach.)

GPU <-> memory bandwidth is literally in the hundreds of GB/s these days. Even my now rather old hat G200 based card can manage ~ 105GB/s for VRAM to VRAM (read-process-write) operations and that's while it's still displaying a fully composited desktop.

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #5 on: June 30, 2012, 11:18:01 AM »
Quote from: Fab;698438
@karlos

Fine, and what about the practical write speed on SAM 460 PCIe bus?


I don't own a SAM so I can't say - there are various benchmarks out there.

Quote
Being able to write yuv 420 data instead of plain rgb24 data is twice as fast... On machines like Macs, Pegasos or AmigaOnes it's a huge speedup.


Not disagreeing at all. Don't forget, I still use a BPPC/BVision combo and the VRAM write speed there is ~15MB/s or so, thanks to the bus.

Perhaps I didn't make it clear (though I believe I did) when I first mentioned video texturing but just in case, video texturing does not imply the transfer of already-decoded RGB data from RAM -> VRAM. The transformation from YCbCr -> RGB, along with scaling, is handled by the texture mapping hardware. Without this hardware colourspace conversion, a chip can't really be said to properly support video texturing.

Whatever bus transfer arguments you can make for traditional overlay, you can equally make for video texturing.

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #6 on: June 30, 2012, 11:34:27 AM »
Quote from: slayer;698441
So what does that mean for a HD7970 in the X1000 is capable of, once it's suitably tapped?

Hans is better placed to tell you than I am. The most advanced Radeon card I presently own is an R200 :lol:

But I expect it'll make a significant performance difference. What would be even better would be some support for programming the shader units directly. You could probably offload the entire video decode process then.

Of course, as a long time nVidia fanboi, I'm obliged to point out that all things Radeon suck.

However, that's not actually true at all (only their linux drivers suck in my experience) and I have to say, the 79xx series has certainly caught my attention. It's kicking some serious green corner ass at OpenCL centric tasks right now. It's almost an exact transposition of the G200 v HD47xx days when it was ATI being defensive over compute performance and pointing at gaming as the primary issue.

Quote
I've gone from having no recent graphics cards to 4 x 4890s of various configurations and 1 x 4870X2 and of course the 7970!

That's quite a step up.

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #7 on: June 30, 2012, 05:38:57 PM »
Quote from: klx300r;698472
don't bother posting real facts here Ian as Piru & friends only want data they can manipulate to serve their agenda. Sad part is he has the audacity to think people are daft enough not to realize:whack:

This discussion is getting tedious quickly and it's not helped by remarks like this.

Piru has stated in his opening post:

Quote
Overlay is absolutely critical for low-end systems displaying video

He clarified his use of the word "overlay" to mean any hardware mechanism that offloads colourspace conversion and scaling, or what I would call a "hardware video surface".

His point is thus entirely valid. If you have a low end CPU, you need all the hardware acceleration you can get because colourspace conversion from YUV to RGB is more expensive than you might think (can be done at copyspeed on many faster processors though) and scaling, particularly enlarging with filtering, is not cheap.

And even if you do have a faster CPU that is capable of both without dropping frames, hardware acceleration still gives you better performance per watt, less heat, fan noise etc.
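
To give a feel for the per-pixel work that a hardware video surface takes off the CPU, here is a minimal sketch of a software YCbCr to RGB conversion. It assumes studio-range BT.601 coefficients (Y in 16..235, Cb and Cr centred on 128); the function names are made up for the example, and a real player would use a fixed-point or vectorised routine rather than anything like this.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 0..255 range."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one studio-range BT.601 YCbCr sample to 8-bit RGB.

    Assumes Y in 16..235 and Cb/Cr centred on 128.
    """
    luma = 1.164 * (y - 16)
    blue_diff = cb - 128
    red_diff = cr - 128
    r = luma + 1.596 * red_diff
    g = luma - 0.392 * blue_diff - 0.813 * red_diff
    b = luma + 2.017 * blue_diff
    return clamp8(r), clamp8(g), clamp8(b)


if __name__ == "__main__":
    print(ycbcr_to_rgb(16, 128, 128))   # studio black
    print(ycbcr_to_rgb(235, 128, 128))  # studio white
```

That is several multiplies, adds and clamps for every output pixel of every frame, before any scaling or filtering is done.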
« Last Edit: June 30, 2012, 05:41:12 PM by Karlos »

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #8 on: June 30, 2012, 07:31:15 PM »
Quote from: Piru;698480
Hans has been saying that it is unlikely that he will support the old Picasso96 PIP APIs. This is unfortunate as it will lock out textured video from all old applications using the API. In that sense overlay indeed has been obsoleted for OS4.

I think that's extrapolating what he says to the worst case outcome. I offer the following interpretation. He's not going to implement video texturing within the constraints of the PiP interface. That doesn't preclude any wrapper / glue logic to expose whatever new method/API he implements to the PiP interface.

It wouldn't even require him to do it or for him to release his driver source. Picasso96 uses a bunch of structures filled with function pointers for driver implementations to override. Anybody familiar enough with P96 driver development could probably have a go once there is some method in existence to create the glue for. (All this talk of overlay has piqued my interest in implementing it for the Permedia2 Picasso96 driver which presently has no PiP either. It too will require a video texture implementation.)
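
The wrapper idea can be sketched in the abstract as below. Every name here is hypothetical: none of these classes or fields are real Picasso96 symbols, and the "backend" stands in for whatever new method ends up existing. The only point shown is that a table of overridable entry points can be filled in by third-party glue without touching the driver itself.

```python
# Hypothetical sketch only: none of these names are real Picasso96 symbols.

class DriverHooks:
    """Stand-in for a driver's table of overridable entry points."""

    def __init__(self):
        self.create_pip = None  # legacy PiP-style entry points, unset by default
        self.update_pip = None


class VideoTextureBackend:
    """Stand-in for whatever newer video texture API ends up existing."""

    def __init__(self):
        self.surfaces = {}
        self.next_id = 1

    def create_surface(self, width, height, pixel_format):
        surface_id = self.next_id
        self.next_id += 1
        self.surfaces[surface_id] = {
            "size": (width, height),
            "format": pixel_format,
            "frames_drawn": 0,
        }
        return surface_id

    def draw_frame(self, surface_id, frame_data):
        # A real backend would upload the frame and render a textured quad.
        self.surfaces[surface_id]["frames_drawn"] += 1


def install_pip_glue(hooks, backend):
    """Point the legacy-style entry points at the newer backend."""
    hooks.create_pip = lambda w, h: backend.create_surface(w, h, "YCbCr")
    hooks.update_pip = lambda sid, frame: backend.draw_frame(sid, frame)


if __name__ == "__main__":
    hooks, backend = DriverHooks(), VideoTextureBackend()
    install_pip_glue(hooks, backend)
    sid = hooks.create_pip(320, 240)
    hooks.update_pip(sid, b"")
    print(backend.surfaces[sid])
```

An old application would only ever see the two legacy-style entry points; what sits behind them is free to change.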

Quote
This also means that in the future if some application wishes to support fast video display on OS4 the application has to have two code paths, one for classic Picasso96 PIP API (for older graphics cards) and second for some yet to be determined new API (for newer graphics cards). Hardly an ideal solution.

Only in the worst case scenario.

Quote

It actually is considerably slower to use textured video. There is some setup involved, and you need to wait for the operation to finish (actual performance of course depends on the implementation details).

I don't expect speed to be a major issue. My G200 uses video texturing and handles 1080p just fine and some of these later HD5xxx cards are considerably more powerful. The bottleneck is going to end up being the CPU decode, especially if DMA retrieval of texture data is thrown into the mix.

Quote
Classic overlay gives the application a frame buffer it can write to, and it will be displayed automagically without any extra calls or waiting needed. It is even possible to completely disable the OS and bang the framebuffer and it will update on screen just fine (this is btw why with Mediator setup you could route amiga display to a video grabbing card, and then display the graphics in an overlay window, and run HW banging games and demos).

This is a distinct advantage for such machines, but I doubt that, other than on actual classic Amiga machines, there'll be much hardware banging going on.

Quote
If there's a possibility to have both classic overlay and textured video, overlay would be the better choice for the low-end systems.

That rather depends. Your assertion assumes that video texturing will add significant latency to the operation due to having to wait for texture mapping and so on, but texture fill rate is already measured in gigatexels/second.

If my experience at messing around with decode on my linux system is anything to go by, it will probably take longer to transfer a single 1080p frame of YUV texels to the card from system memory (even with DMA) than it will for the GPU to fill the framebuffer with the RGB output. The reason is obvious here:
Code:
karlos@Megaburken-II:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release$ ./bandwidthTest
[bandwidthTest] starting...
./bandwidthTest Starting...

Running on...

 Device 0: GeForce GTX 275
 Quick Mode

 Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     2438.4

 Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     1961.3

 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     105118.2

[bandwidthTest] test results...
PASSED

The fastest host->device transfer, using DMA, achieves a modest 2438.4 MB/s over a PCIe x16 connection. A 16-bit YUV 1080p frame occupies ~3.955MB, so you'd expect to be able to transfer it in around 1.62 ms. (None of these figures get particularly lower if I use a slower CPU, simply because everything in this test is directed by the GPU.)

Once on the card, even high quality floating point based conversion from YUV to RGB using, for example, CUDA is entirely memory IO limited, let alone doing so using direct hardware support for YUV texture formats and simply blasting out a textured quad to the framebuffer. You need to read 3.955MB and write 7.91MB entirely on the device now (also note, writing VRAM is generally a bit faster than reading, even for the GPU) and you have 105GB/s copy bandwidth to play with. All things being equal, the bottleneck is not on the card here; it's down to how quickly the rest of the system can get the next YUV frame ready. And this is on a card several years old now.
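
The arithmetic behind those figures can be reproduced with a few lines. This is only a back-of-envelope sketch: it assumes "MB" means 2^20 bytes (which is what makes a 16-bit 1080p frame come out at ~3.955MB), a 32-bit RGB output frame, and that the on-card step is purely limited by the measured device to device bandwidth. The function names are made up for the example.

```python
MIB = 1024 * 1024  # assumption: "MB" in the figures means 2**20 bytes


def frame_mib(width, height, bytes_per_pixel):
    """Frame size in MiB."""
    return width * height * bytes_per_pixel / MIB


def transfer_ms(size_mib, bandwidth_mib_per_s):
    """Time in milliseconds to move size_mib at the given bandwidth."""
    return 1000.0 * size_mib / bandwidth_mib_per_s


if __name__ == "__main__":
    yuv = frame_mib(1920, 1080, 2)  # 16-bit YUV source frame
    rgb = frame_mib(1920, 1080, 4)  # 32-bit RGB output frame

    upload = transfer_ms(yuv, 2438.4)             # host to device
    on_card = transfer_ms(yuv + rgb, 105118.2)    # read YUV + write RGB in VRAM

    print(f"YUV frame:       {yuv:.3f} MiB")
    print(f"RGB frame:       {rgb:.3f} MiB")
    print(f"upload over bus: {upload:.2f} ms")
    print(f"convert on card: {on_card:.3f} ms")
```

Under those assumptions the upload takes roughly 1.6 ms while the on-card read and write takes a little over 0.1 ms, which is why the card side is not where the time goes.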

Moving away from this machine, even in the lousiest scenario, where the PPC has to transfer the frame data to VRAM itself because for some reason DMA transfer over the PCIe bus wasn't possible, the bottleneck is still not going to be on the graphics card side. If it is anywhere, it will be in the PIO transfer of data to the VRAM for a faster CPU, or still in the decode phase for a slower one.

Quote
This also means that it likely is a better idea to use a graphics card with true overlay in such low-end systems instead of a card that doesn't have the classic overlay.

That's certainly true at the moment.

Quote
Textured video has the benefit of being part of the actual display frame buffer though, and then you can perform other effects on it (such as transparency), and for example take screenshot of the video.

Not to mention postprocessing (deinterlace, denoising, deblocking etc), if you have the necessary shader infrastructure for it.
« Last Edit: June 30, 2012, 07:36:26 PM by Karlos »

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #9 on: June 30, 2012, 09:11:49 PM »
Quote from: Piru;698495
True. However, the resulting bitmap needs to be blitted to the screen, too, unless if it somehow bypasses the compositing stage (of course possible as well). While this is not going to take much time, it still is yet another step that needs to be performed compared to classic overlay.

Again, that is possible. However, a lot of graphics hardware allows you to pass a rectangle clip list for primitive rendering, so in theory you can skip the blit too and instead render your textured quad directly to the framebuffer, passing it a clip list derived from the actual layers on the screen. Essentially, the hardware renders the same primitive multiple times, once for each clip region in the list. Even the Permedia2 could do that, but in practice the driver doesn't do it.
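
What the clip list amounts to can be modelled in software as below. This is a sketch of the idea only, not of how any particular chip or driver does it: rectangles are assumed to be (x0, y0, x1, y1) tuples with exclusive right and bottom edges, and the function names are made up.

```python
def intersect(a, b):
    """Intersection of two (x0, y0, x1, y1) rectangles, or None if disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def clipped_draws(quad, clip_list):
    """One draw rectangle per clip region that the quad actually touches."""
    draws = []
    for clip in clip_list:
        region = intersect(quad, clip)
        if region is not None:
            draws.append(region)
    return draws


if __name__ == "__main__":
    video_quad = (100, 100, 420, 340)
    # Visible regions of the window, as they might be derived from the layers.
    visible = [(0, 0, 200, 600), (300, 0, 800, 200), (500, 500, 600, 600)]
    for rect in clipped_draws(video_quad, visible):
        print(rect)
```

The quad is drawn once per visible region, and regions it does not touch cost nothing, so no intermediate bitmap or blit is needed.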

Quote
If implemented properly there will be little difference in performance. It will be slower than the old style overlay.

Only on cards that have it. Comparing "Old style" overlay on, for instance, an old Cirrus logic versus video texturing on a HD7970 seems a bit chalk and cheese to me :) It is only a valid comparison on hardware that supports both methods.

Quote
How much, depends on the implementation (if everything is done correctly there's only minimal performance hit, if it's not; say no PIO texture fetch, extra blit required, then it could make a difference between 720P being usable or not).

I really do believe the video stream decode phase is what will make that determination on hardware like the Sam, even if the video texture implementation ends up being somewhat suboptimal. Even if the 3D hardware has to render to an offscreen bitmap and then ClipBlit it into a window on the desktop, those operations on modern hardware are so much faster than on, say, R200 era hardware that it would make your hair stand on end.

In an optimal case, you'd be using a fullscreen display anyway and a lot of pipelining/parallelism becomes possible. In my imaginary, ideal player:

RAMDAC
Displaying Nth RGB frame from the active Screen's BitMap

GPU 3D Core:
Converting / scaling the (N+1)th YUV frame from primary texture buffer in VRAM, rendering unclipped quad directly to secondary (or tertiary) ScreenBuffer BitMap

GPU Memory controller / Busmaster:
Transferring the (N+2)th YUV frame from system RAM to secondary texture buffer in VRAM

CPU
Decoding the (N+3)th frame in the stream to a secondary system RAM buffer, mixing audio etc.

Karlos
Eating popcorn
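
The staggering in the list above can be sketched as a simple schedule: at any given tick, each stage works on a different frame. The stage names follow the list; the tick model, the function name and the assumption that every stage takes exactly one tick are simplifications made for the example.

```python
# Stages ordered from the start of the pipeline to the end.
STAGES = ["CPU decode", "bus transfer", "GPU convert/scale", "display"]


def stage_frames(tick, n_frames):
    """Map each stage to the frame index it handles at `tick` (None if idle).

    Assumes every stage takes exactly one tick per frame.
    """
    schedule = {}
    for depth, stage in enumerate(STAGES):
        frame = tick - depth
        schedule[stage] = frame if 0 <= frame < n_frames else None
    return schedule


if __name__ == "__main__":
    for tick in range(6):
        print(tick, stage_frames(tick, n_frames=3))
```

Once the pipeline has filled, the display is on frame N while the decoder is already on frame N+3, so the frame rate is set by the slowest single stage rather than by the sum of all of them.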

Quote
Of course. The example was merely to illustrate how the classic overlay is completely automatic and doesn't require any extra blitting, copying or such (that is it comes as "free" resource-consumption-wise).

Don't get me wrong, we all like free stuff :)

Quote
Anyhow, it will be some time before we will see how it performs. Meanwhile anyone wanting to play videos on low-end systems such as SAM460 with Radeon HD are out of luck.

Tell it to my BVision :razz:

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #10 on: June 30, 2012, 11:36:38 PM »
Quote from: Piru;698515
"Tseh Tseh, BVision, jump into that PCI/PCIe slot!"


;)

These RadeonHD owners don't know they're born. In my day we had blinkenlights and we thought we were lucky...

Quote
Anyway, I heard you've been working on Radeon R100/R2x0 as well.

Yeah, but mostly 3D related for now. I narrowly snatched defeat from the jaws of victory in a recent refactor of the R100 code, so right now I'm working through the changesets in my local repository to see what heinous crime I just committed.

Quote
I almost feel sorry for you, such a pain in the behind...

I've been called worse :lol:

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #11 on: June 30, 2012, 11:47:15 PM »
Quote from: Akiko;698519
@Karlos



Silly hypothetical question, do you think this driver could be specially adapted without DMA for a mediator busboard running OS4.1 classic?


Which driver?

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #12 on: July 01, 2012, 12:08:48 AM »
Quote from: Akiko;698521
Sorry any of the currently worked on Radeon HD drivers, as I understand PCI Variants of these cards exist.


It could get pretty messy, particularly with the Mediator1200. The thing is, by the time you get to the classics, the disparity between the processor and the graphics card becomes so great that you have to wonder what the point would be. Classics have a hard time keeping the old radeon cards busy as it is.

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #13 on: July 01, 2012, 12:06:24 PM »
Quote from: magnetic;698546
karlos

I have to say you are giving a very nice technical argument for overlay not important going forward,

Actually, I've not. I've simply made the case that what you all are calling "overlay" is not a single implementation. I chose the terms "hardware video surface" (the abstract concept), "video texturing" (an implementation) and "overlay" (another implementation) carefully and with good reason: to avoid the same arguing in circles that people seem to be insisting on doing.

Quote
however you are missing the main thrust of this whole argument is that CURRENT OS4 SAM users will NOT have proper video playback for a long time due to lack of overlay.

I'm not missing it, I'm confused as to which part of this argument is actually valid. If you own a Sam with a RadeonHD already and you absolutely can't wait for some form of video acceleration, you can opt for an R200 card and shelve your RadeonHD until the work has been done. My R200 (passively cooled, 256MB, 128-bit VGA+DVI) cost me 10 UKP. You might need to reconfigure and get a PCIe SATA card to free up the PCI slot depending on your configuration.

There's no magical legacy overlay feature that can be turned on for RadeonHD users since later RadeonHD cards don't have the feature. And why? Basically, because AMD decided it was obsolete and could find better use for the silicon. It has been replaced by video texturing. In order to use video texturing, you have to have working support for the hardware level 3D. If you can think of a method to implement it that requires no access to the 3D hardware on these cards, I'm sure Hans would love to hear from you as it would surely save him some development man hours.

Incidentally, I do find it strange that the people making the most noise about this aren't generally Sam users.

Quote
So, for Sam, A1, and Peg2 OS4 users Overlay is far from Obselete. Actually its Critical.

All I can say to you is re-read the various threads that have discussed this to death already and understand the difference between overlay as a concept, i.e. "hardware video surface", and "overlay" as the specific hardware implementation of that concept. Not a single person in any of the numerous threads across various forums has stated that "overlay" (hardware video surface) is obsolete, even if you have ample CPU power to do it all in software to a framebuffer.
« Last Edit: July 01, 2012, 12:15:22 PM by Karlos »

Offline Karlos

Re: Video overlay - essential for fast video playback
« Reply #14 on: July 01, 2012, 05:30:38 PM »
Quote from: klx300r;698571
cough cough, strange indeed as the same people show up on every OS4.x related thread as well.

It's Piru's thread and it's not specifically about OS4. People barging in with (false) statements amounting to "SAM HAS NO POSSIBLE OVERLAY! OMGZOR SLOW VID 4EVR SUXXOR!11! TEH PEG2 FTW!!11!" and others turning it into which-developers-are-more-qualified-to-comment spoiled what was actually a perfectly reasonable discussion about the relative merits and implementation details of different methods of providing hardware assist for video playback.