Yes, without SDL it might be faster because less code would be used. Also, SDL is not asm-optimized, only C code.
most speedcritical is in sdl the YUV to RGB conversion and scaler.MOS or OS4 SDL support working cgx or P96 overlay code, so code can merge easy i think.
but if on classic amigas cards are out that support HW overlay Bug free i dont know.
i remember on my cybervision PPC and the phase5 videoplayer that support overlay (forget name) crash randomly.
so i also dont know if overlay can work stable.
sound is slow, because it use 64 bit commands a 68060 dont have.this is used in the DCT code and antialiasing Filter code.
if somebody know a fast DCT code for 68k cpu, then this func can add.i dont like the gcc assembler mess and i like a amiga dspmath.library i can code in amiblitz, that is systemspecific for 68060 winuae or other CPU.
on winuae it call then native ISSE code, for 68060 asm optimized code.
BTW: You forget in your feature list, that ffplay with no filename open a amiga filerequester.
here is the code of the DCT.i think this are the few speedcritical lines but MULH use a 64 bit mul.the antialiasing filter can also work with FPU, maybe thats faster on a 060.
maybe this code can move in a amiga library imdct36 and more programs can use it ?
same with other speedcritical funcs ?
static void imdct36(int *out, int *buf, int *in, int *win)
{
int i, j, t0, t1, t2, t3, s0, s1, s2, s3;
int tmp[18], *tmp1, *in1;
for(i=17;i>=1;i--)
in
+= in[i-1];
for(i=17;i>=3;i-=2)
in += in[i-2];
for(j=0;j<2;j++) {
tmp1 = tmp + j;
in1 = in + j;
t2 = in1[2*4] + in1[2*8] - in1[2*2];
t3 = in1[2*0] + (in1[2*6]>>1);
t1 = in1[2*0] - in1[2*6];
tmp1[ 6] = t1 - (t2>>1);
tmp1[16] = t1 + t2;
t0 = MULH(2*(in1[2*2] + in1[2*4]), C2);
t1 = MULH( in1[2*4] - in1[2*8] , -2*C8);
t2 = MULH(2*(in1[2*2] + in1[2*8]), -C4);
tmp1[10] = t3 - t0 - t2;
tmp1[ 2] = t3 + t0 + t1;
tmp1[14] = t3 + t2 - t1;
tmp1[ 4] = MULH(2*(in1[2*5] + in1[2*7] - in1[2*1]), -C3);
t2 = MULH(2*(in1[2*1] + in1[2*5]), C1);
t3 = MULH( in1[2*5] - in1[2*7] , -2*C7);
t0 = MULH(2*in1[2*3], C3);
t1 = MULH(2*(in1[2*1] + in1[2*7]), -C5);
tmp1[ 0] = t2 + t3 + t0;
tmp1[12] = t2 + t1 - t0;
tmp1[ 8] = t3 - t1 - t0;
}
i = 0;
for(j=0;j<4;j++) {
t0 = tmp;
t1 = tmp[i + 2];
s0 = t1 + t0;
s2 = t1 - t0;
t2 = tmp[i + 1];
t3 = tmp[i + 3];
s1 = MULH(2*(t3 + t2), icos36h[j]);
s3 = MULL(t3 - t2, icos36[8 - j], FRAC_BITS);
t0 = s0 + s1;
t1 = s0 - s1;
out[(9 + j)*SBLIMIT] = MULH(t1, win[9 + j]) + buf[9 + j];
out[(8 - j)*SBLIMIT] = MULH(t1, win[8 - j]) + buf[8 - j];
buf[9 + j] = MULH(t0, win[18 + 9 + j]);
buf[8 - j] = MULH(t0, win[18 + 8 - j]);
t0 = s2 + s3;
t1 = s2 - s3;
out[(9 + 8 - j)*SBLIMIT] = MULH(t1, win[9 + 8 - j]) + buf[9 + 8 - j];
out[( j)*SBLIMIT] = MULH(t1, win[ j]) + buf[ j];
buf[9 + 8 - j] = MULH(t0, win[18 + 9 + 8 - j]);
buf[ + j] = MULH(t0, win[18 + j]);
i += 4;
}
s0 = tmp[16];
s1 = MULH(2*tmp[17], icos36h[4]);
t0 = s0 + s1;
t1 = s0 - s1;
out[(9 + 4)*SBLIMIT] = MULH(t1, win[9 + 4]) + buf[9 + 4];
out[(8 - 4)*SBLIMIT] = MULH(t1, win[8 - 4]) + buf[8 - 4];
buf[9 + 4] = MULH(t0, win[18 + 9 + 4]);
buf[8 - 4] = MULH(t0, win[18 + 8 - 4]);
}