GREAT JOB!!!! I just tried it on my a4k 060, and it's faster than megacz's port of mplayer. The test porky.mov plays back in sync with audio on! I also tried a couple of other, more complex AVIs up to 640x480. Up till now, if they were playable at all, then only with WarpOS-enhanced players. Now, with low_quality_mp3, -lowres and skipframe, it is possible to view them, even if with distorted sound and frame skipping. If it could be optimized further, we could even have a working 68k Amiga media player. I do not dare to hope...
It seems to be quite stable too!! No hits whatsoever, and responsive while running!
@wawrzon:
Do you have a 060@50MHz, or maybe an overclocked one?
I think that I can try to optimize the mpegaudio decoder to run faster on the 68060 CPU, but to test whether it really works faster I would need someone with a real 68060 CPU and OxyPatcher installed.
The problem is that the 68060 CPU doesn't have the 64-bit "muls" instruction (the 32x32 -> 64-bit form of "muls.l"), and this instruction SHOULD be used a lot in the mpegaudio decoder. Right now this instruction is NOT USED, because I compiled ffmpeg for the 68060 CPU, so GCC "emulates" it with longer code that the 68060 supports. I think it may be possible that 68060 + OxyPatcher would be faster than GCC's "emulated" code.
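To illustrate why the emulated path is longer: without a 32x32 -> 64-bit multiply, the product has to be assembled from smaller partial products plus a sign correction. The C below is only a sketch of that general technique, not GCC's actual output for the 68060. The names muls64_emulated, mul64_emulated and mulh_emulated are made up for this example.

```c
#include <stdint.h>

/* Signed 32x32 -> 64 multiply built from 16x16 -> 32 products.
   The result comes back as a hi:lo pair, like the Dh:Dl pair of muls.l.
   (Hypothetical helper, for illustration only.) */
static void muls64_emulated(int32_t a, int32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t al = ua & 0xFFFFu, ah = ua >> 16;
    uint32_t bl = ub & 0xFFFFu, bh = ub >> 16;

    /* Four partial products instead of one multiply. */
    uint32_t ll = al * bl;
    uint32_t lh = al * bh;
    uint32_t hl = ah * bl;
    uint32_t hh = ah * bh;

    /* Middle column: at most 3 * 0xFFFF, so it cannot overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    uint32_t h = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    /* Turn the unsigned product into a signed one. */
    if (a < 0) h -= ub;
    if (b < 0) h -= ua;

    *lo = (ll & 0xFFFFu) | (mid << 16);
    *hi = h;
}

/* Full 64-bit product, as MUL64 would return it. */
static int64_t mul64_emulated(int32_t a, int32_t b)
{
    uint32_t hi, lo;
    muls64_emulated(a, b, &hi, &lo);
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* High 32 bits only, as MULH would return them. */
static int32_t mulh_emulated(int32_t a, int32_t b)
{
    uint32_t hi, lo;
    muls64_emulated(a, b, &hi, &lo);
    return (int32_t)hi;
}
```

Four multiplies, several shifts and adds, and two conditional corrections replace what a 68020-68040 does in a single muls.l, which is at least consistent with the slowdown described here.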
On WinUAE, the mpegaudio decoder compiled for the 68040 (with the "muls" instruction) needs only 29 sec. to decode a 5 min. MP3 file. The 68060 build needs 1:25 min. for the same task, so you can see how slow GCC's code is.
x86, PPC and ARM CPUs have the functions below asm-optimized:
#ifndef MULL
# define MULL(a,b,s) (((int64_t)(a) * (int64_t)(b)) >> (s))
#endif
#ifndef MULH
//gcc 3.4 creates an incredibly bloated mess out of this
//# define MULH(a,b) (((int64_t)(a) * (int64_t)(b))>>32)
static av_always_inline int MULH(int a, int b){
    return ((int64_t)(a) * (int64_t)(b))>>32;
}
#endif
#ifndef MUL64
# define MUL64(a,b) ((int64_t)(a) * (int64_t)(b))
#endif
#ifndef MAC64
# define MAC64(d, a, b) ((d) += MUL64(a, b))
#endif
#ifndef MLS64
# define MLS64(d, a, b) ((d) -= MUL64(a, b))
#endif
/* signed 16x16 -> 32 multiply add accumulate */
#ifndef MAC16
# define MAC16(rt, ra, rb) rt += (ra) * (rb)
#endif
/* signed 16x16 -> 32 multiply */
#ifndef MUL16
# define MUL16(ra, rb) ((ra) * (rb))
#endif
#ifndef MLS16
# define MLS16(rt, ra, rb) ((rt) -= (ra) * (rb))
#endif
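As a rough illustration of how these helpers are typically used in fixed-point audio code, here is a small self-contained sketch. The macro bodies are the portable fallbacks from the listing; the function dot_product64 is a hypothetical example and is not taken from ffmpeg.

```c
#include <stdint.h>

/* Portable fallbacks, same definitions as in the listing. */
#define MUL64(a, b)    ((int64_t)(a) * (int64_t)(b))
#define MAC64(d, a, b) ((d) += MUL64(a, b))
#define MLS64(d, a, b) ((d) -= MUL64(a, b))

/* High 32 bits of a signed 32x32 product. */
static inline int MULH(int a, int b)
{
    return (int)(((int64_t)a * (int64_t)b) >> 32);
}

/* Hypothetical helper: dot product with a 64-bit accumulator.
   Every iteration is one 32x32 -> 64 multiply plus a 64-bit add,
   which is exactly the operation MAC64 wraps. */
static int64_t dot_product64(const int32_t *x, const int32_t *w, int n)
{
    int64_t sum = 0;
    for (int i = 0; i < n; i++)
        MAC64(sum, x[i], w[i]);
    return sum;
}
```

Because the multiply sits inside the innermost loop, the cost of each MAC64 or MULH dominates the decoder's run time, which is why these few macros are the ones worth hand-optimizing.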
Someone optimized two of the functions for me for 68020-68040 CPUs:
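static inline av_const int64_t MAC64(int64_t d, int a, int b)
{
    /* m68k is big-endian: hl[0] is the high word, hl[1] the low word */
    union { int64_t x; int hl[2]; } x = { d };
    int h, l;
    /* h:l = a * b (64-bit result), then add it to d with carry.
       addx.l only works on data registers, so the high word uses "+d". */
    __asm__ ("muls.l %5, %2:%3 \n\t"
             "add.l  %3, %1    \n\t"
             "addx.l %2, %0    \n\t"
             : "+d"(x.hl[0]), "+dm"(x.hl[1]),
               "=d"(h), "=&d"(l)
             : "3"(a), "dmi"(b));
    return x.x;
}
#define MAC64(d, a, b) ((d) = MAC64(d, a, b))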
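static inline av_const int64_t MLS64(int64_t d, int a, int b)
{
    /* m68k is big-endian: hl[0] is the high word, hl[1] the low word */
    union { int64_t x; int hl[2]; } x = { d };
    int h, l;
    /* h:l = a * b (64-bit result), then subtract it from d with borrow.
       subx.l only works on data registers, so the high word uses "+d". */
    __asm__ ("muls.l %5, %2:%3 \n\t"
             "sub.l  %3, %1    \n\t"
             "subx.l %2, %0    \n\t"
             : "+d"(x.hl[0]), "+dm"(x.hl[1]),
               "=d"(h), "=&d"(l)
             : "3"(a), "dmi"(b));
    return x.x;
}
#define MLS64(d, a, b) ((d) = MLS64(d, a, b))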
But I need 68060 versions. The rest of the C functions should also be asm-optimized for the 68060.
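For anyone who wants to try writing such a version, a small check against the plain C definition helps catch carry and sign mistakes before any timing is done. This is only a sketch: mac64_fn, mac64_ref, mac64_low32_only and check_mac64 are names invented here, and the candidate under test would be wrapped as an ordinary function.

```c
#include <stdint.h>
#include <stddef.h>

/* Signature shared by the reference and any candidate (hypothetical). */
typedef int64_t (*mac64_fn)(int64_t d, int a, int b);

/* Reference: same arithmetic as the portable MAC64 macro. */
static int64_t mac64_ref(int64_t d, int a, int b)
{
    return d + (int64_t)a * (int64_t)b;
}

/* Deliberately wrong candidate: keeps only the low 32 bits of the
   product, to show that the check notices a broken implementation. */
static int64_t mac64_low32_only(int64_t d, int a, int b)
{
    return d + (int64_t)(int32_t)((uint32_t)a * (uint32_t)b);
}

/* Returns the number of mismatches between candidate and reference
   over a set of edge-case operands. */
static int check_mac64(mac64_fn candidate)
{
    static const int32_t vals[] = {
        0, 1, -1, 2, -2, 0x7FFF, -0x8000, 0x10000,
        0x12345678, INT32_MAX, INT32_MIN
    };
    /* Accumulators are kept small enough that d + a*b cannot overflow. */
    static const int64_t accs[] = {
        0, 1, -1, (int64_t)1 << 40, -((int64_t)1 << 40)
    };
    const size_t nv = sizeof(vals) / sizeof(vals[0]);
    const size_t na = sizeof(accs) / sizeof(accs[0]);
    int mismatches = 0;

    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nv; j++)
            for (size_t k = 0; k < nv; k++)
                if (candidate(accs[i], vals[j], vals[k]) !=
                    mac64_ref(accs[i], vals[j], vals[k]))
                    mismatches++;

    return mismatches;
}
```

Calling check_mac64 with a 68060 candidate should return 0. The same pattern works for MLS64 by swapping the addition for a subtraction in the reference.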