The more time the CPU spends decoding the video, the less time it has left for display and audio. So if a 60-second clip takes 54 seconds just to decode, you have only 6 seconds left to display 60 seconds of video and audio, which means it can't be done in real time.
You don't have to decode the whole video before you start playing it; decoding and playback are interleaved frame by frame. As long as displaying the video frames and playing the audio takes no more than 6 seconds of CPU time, you can play the video in real time with no problem. It only becomes a problem when the total CPU time, decoding plus display and audio, exceeds 60 seconds for a 60-second video.
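To make the arithmetic concrete, here is a minimal sketch in Python. It assumes a single CPU core doing the decode and the display/audio work one after the other, with the cost spread evenly across the clip. The function names `cpu_load` and `can_play_realtime` are made up for illustration and do not come from any player or codec library.

```python
def cpu_load(clip_seconds, decode_seconds, output_seconds):
    """Fraction of one CPU core needed to keep up with playback.

    Assumption: decode and display/audio run sequentially on one core.
    """
    return (decode_seconds + output_seconds) / clip_seconds


def can_play_realtime(clip_seconds, decode_seconds, output_seconds):
    """True if the total CPU time fits inside the clip's duration."""
    return decode_seconds + output_seconds <= clip_seconds


# The example from the thread: 54 s of decoding leaves a 6 s budget
# for display and audio on a 60 s clip.
print(can_play_realtime(60, 54, 6))  # fits exactly
print(can_play_realtime(60, 54, 7))  # one second over budget
print(can_play_realtime(60, 61, 0))  # decoding alone is too slow
```

The first call is the borderline case where the CPU is fully busy but still keeps up; the other two show the two ways playback can fall behind.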