Amiga.org
Topic started by: balrogsoft on November 26, 2006, 01:12:55 AM
-
Hi, I made my first demoscene effect in assembler. I did a few little things in assembler some years ago, but now I'm learning a lot. I decided to make my first assembler effect this weekend, and I got it working! I made a chunky-to-copper routine and a rotozoomer effect. On my A600 it is very slow, but on my A1200 with a 060 it runs very smoothly. I'm not very good with assembler (I'm a newbie), so if somebody can suggest ways to optimize my code, here it is:
ASM Source code (http://www.balrogsoftware.com/ROTOZOOM.S)
Executable (http://www.balrogsoftware.com/ROTOZOOM.EX)
Thanks in advance.
-
*** Interlude. A specialist from Finland will be along shortly to answer all your questions ***
;-)
-
I'm a bit too drunk atm to make any detailed analysis, but I'll check it out later.
[EDIT]
But even when wasted I can give some ideas how to make it much faster (at least on older systems):
- Inline the innerloop (.NextX) subroutine calls.
- Use registers to store variables instead of memory. Do this at least for variables used in innerloop.
- Move out any 'y' related calculation from the .NextX loop, calculate these values before entering the X loop (this appears to have been done mostly, however).
[/EDIT]
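For readers who don't speak 68k, the third point can be sketched in C. This is only an illustration, not the code from ROTOZOOM.S: `fill_naive`, `fill_hoisted` and the `x ^ y` test pattern are made-up names and data. The first version recomputes the row offset for every pixel; the second computes everything that depends only on y once per row and keeps the running pointer in a local variable (the C counterpart of keeping it in a register).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical straightforward version: the row offset y * width is
   recomputed for every single pixel inside the x loop. */
static void fill_naive(uint16_t *dst, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[y * width + x] = (uint16_t)(x ^ y);
}

/* Same output, but the y-dependent address is computed once per row
   and the inner loop only advances a local pointer. */
static void fill_hoisted(uint16_t *dst, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        uint16_t *row = dst + y * width;   /* depends on y only */
        for (int x = 0; x < width; x++)
            *row++ = (uint16_t)(x ^ y);
    }
}
```

A modern C compiler will usually do this hoisting by itself; in hand-written assembler it has to be done manually.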
-
:lol:
-
Piru wrote:
I'm a bit too drunk atm to make any detailed analysis, but I'll check it out later.
:crazy: :roflmao:
-
:pint: :pint: I think it helps!
:-o
-
Instead of calculating the coordinates of the source pixel in the texture every time you write a pixel to the copper list, I think it is faster to maintain a set of coordinates and modify them with predetermined values.
so your inner loop could look something like:
        move.w  d0,a2           ; a2 = source x at the start of the current row
        move.w  d1,a3           ; a3 = source y at the start of the current row
NextY:
        move.w  a2,d0           ; restart from the row's starting coordinates
        move.w  a3,d1
        moveq   #30,d2          ; width of screen
NextX:
        add.w   d4,d0           ; modify source x coordinate (for every change
        add.w   d5,d1           ; modify source y coordinate  in x direction)
        move.w  #$07C0,d3       ; mask for y coordinate (32x32 texture?)
        and.w   d1,d3           ; d3 = y * 64, byte offset of the texture row (word texels)
        lea     Texture,a1      ; pointer to texture
        add.w   d3,a1
        move.w  #$07C0,d3       ; mask for x coordinate
        and.w   d0,d3
        lsr.w   #5,d3           ; d3 = x * 2, byte offset within the row
        add.w   d3,a1           ; now we have the correct address
        move.w  (a1),(a0)+      ; copy a pixel from texture to screen
        subq.w  #1,d2
        bne     NextX
        add.w   d6,a2           ; modify row-start x coordinate (for every change
        add.w   d7,a3           ; modify row-start y coordinate  in y direction)
                                ; probably something needs to be added to copperlist
                                ; pointer in a0 so it points to next line
        subq.b  #1,LineY
        bne     NextY
hope that makes some sense and hasn't too many bugs...
for calculating the magic numbers in d4,d5,d6,d7
d4 = x_scale * cos(angle)
d5 = x_scale * -sin(angle)
d6 = y_scale * sin(angle)
d7 = y_scale * cos(angle)
(in my example code I use fixed-point math with 6 bits of fraction so multiply by 64)
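The same idea as a rough C sketch, in case it is easier to follow than the assembler. It assumes a 32x32 texture of 16-bit texels and 6 fraction bits, as in the post; the names `RotoDeltas`, `roto_deltas` and `roto_render` are made up for the illustration. The cosine and sine are passed in as plain numbers so the sketch needs no math library, and each row restarts from its own saved starting coordinates before the y-direction deltas are applied.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 6     /* 6 fraction bits, so 1.0 == 64 */
#define TEX_SIZE  32    /* assumed 32x32 texture */

/* The four "magic numbers": d4, d5, d6, d7 in the post. */
typedef struct { int16_t dxx, dyx, dxy, dyy; } RotoDeltas;

static int16_t to_fixed(double v)
{
    /* round to nearest fixed-point value */
    return (int16_t)(v * (1 << FRAC_BITS) + (v >= 0 ? 0.5 : -0.5));
}

static RotoDeltas roto_deltas(double cos_a, double sin_a,
                              double x_scale, double y_scale)
{
    RotoDeltas d;
    d.dxx = to_fixed(x_scale * cos_a);    /* d4 */
    d.dyx = to_fixed(x_scale * -sin_a);   /* d5 */
    d.dxy = to_fixed(y_scale * sin_a);    /* d6 */
    d.dyy = to_fixed(y_scale * cos_a);    /* d7 */
    return d;
}

static void roto_render(uint16_t *out, int width, int height,
                        const uint16_t *tex, uint16_t u0, uint16_t v0,
                        RotoDeltas d)
{
    assert(width > 0 && height > 0);
    uint16_t row_u = u0, row_v = v0;      /* coordinates at start of row */
    for (int y = 0; y < height; y++) {
        uint16_t u = row_u, v = row_v;
        for (int x = 0; x < width; x++) {
            unsigned tx = (u >> FRAC_BITS) & (TEX_SIZE - 1);
            unsigned ty = (v >> FRAC_BITS) & (TEX_SIZE - 1);
            *out++ = tex[ty * TEX_SIZE + tx];
            u = (uint16_t)(u + d.dxx);    /* step in x direction */
            v = (uint16_t)(v + d.dyx);
        }
        row_u = (uint16_t)(row_u + d.dxy); /* step in y direction */
        row_v = (uint16_t)(row_v + d.dyy);
    }
}
```

The 16-bit wrap-around of `u` and `v` does the same job as `add.w` on the 68k: the coordinates simply wrap, and the mask keeps the lookup inside the texture.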
-
I bet you guys don't even need to run the executable. You can probably see the effect just by reading the code :lol:
jk
--
moto
-
You can probably see the effect just by reading the code
:roflmao: :laughing:
-
very nice to see the word assembler these days
-
Try to find a use for d3, d4, d5, d6 and d7. It seems you are only using d0 to d2. There are various frequently used constants and variables in your code that you could put into these registers to improve access times, at least on slower machines. Even on faster ones, a read from the data cache is not as fast as a direct register access.
-
motorollin wrote:
I bet you guys don't even need to run the executable. You can probably see the effect just by reading the code :lol:
jk
--
moto
You don't see the (rotating) girl in a red dress? :lol:
-
Nice roto!!! but it's too slow on my a1200/030 :D
it would be nice to see it with some optimizations... ride on!
:-D :-D :-D
-
Thanks a lot for your help. I decided to recode the routine: now I calculate the rotation of 3 points, and from these 3 points I calculate the increment for every horizontal and vertical step. In the ADA forum, a user gave me a link to a tutorial that builds the effect the way I just described. Now my routine runs at a decent speed on a plain A1200. I uploaded the new source and executable under the same names.
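One plausible reading of the 3-point method, sketched in C. The post does not say which three points are rotated, so this assumes the top-left, top-right and bottom-left corners of the screen area, already rotated into fixed-point texture coordinates; `Point`, `Steps` and `steps_from_corners` are made-up names. The horizontal step is the difference along the top edge divided by the width, and the vertical step is the difference along the left edge divided by the height.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t u, v; } Point;   /* fixed-point texture coordinates */
typedef struct { int32_t du_dx, dv_dx, du_dy, dv_dy; } Steps;

/* tl, tr, bl: rotated top-left, top-right and bottom-left corners.
   Integer division truncates toward zero, which is good enough here. */
static Steps steps_from_corners(Point tl, Point tr, Point bl,
                                int width, int height)
{
    assert(width > 0 && height > 0);
    Steps s;
    s.du_dx = (tr.u - tl.u) / width;    /* per-pixel step along a row */
    s.dv_dx = (tr.v - tl.v) / width;
    s.du_dy = (bl.u - tl.u) / height;   /* per-row step down the screen */
    s.dv_dy = (bl.v - tl.v) / height;
    return s;
}
```

With this approach only three points need a real rotation per frame; every other pixel is reached by additions.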
-
Okay, I took a look at the code and optimized it by a couple of cycles. It should be slightly faster, but not much. I also fixed some small bugs in the startup code, made it render as fast as possible (it's much slower than the vertical blank, so rendering from the VBL doesn't make much sense), removed the chipmem allocation, fixed the random return code, etc.
ROTOZOOM.S (http://www.iki.fi/sintonen/temp/ROTOZOOM.S)
ROTOZOOM.exe (http://www.iki.fi/sintonen/temp/ROTOZOOM.exe)
-
motorollin wrote:
I bet you guys don't even need to run the executable. You can probably see the effect just by reading the code :lol:
jk
--
moto
It's funny, and it's true...
Also, I have found certain errors in my code while I was asleep (which really were errors, and which I fixed the next morning).
-
I made some changes since the last version: I mixed Piru's code with mine. My changes are: improved rotation (smoother), constants to define the screen size, initial raster position and box map size, and the useless copper list instructions were removed.
I think that my next step will be to make an AGA version with 4x1 pixel resolution, and maybe a small 4k intro...
@Piru: i will learn a lot from your changes, thank you.
-
Damn, I almost feel like posting my own copperchunky-rotozoomer here now, but I guess I won't ;)
-
I'm trying to make an AGA chunky-to-copper routine, but I can't get 4x1 pixel resolution. I got some sources from Aminet, and I have some questions.
The examples I tested create a copper list with 7 bitplanes and change the colors of different color banks, but a chunky-to-copper routine works by setting color0, doesn't it? Then why does the AGA chunky-to-copper change other colors?
What needs to be added to a copper list to get an AGA copper list with 4x1 pixel resolution? I made the swaps of the high and low words for the copper list colors, but I got 8x1 pixel resolution. Do I need to add 7 empty bitplanes?
I found old messages on Google Groups explaining chunky-to-copper tips, but I can't get it working. I read that you can get 4x1 pixel resolution with ECS and 2x1 with AGA. Can somebody explain this tip better?
http://groups.google.es/group/comp.sys.amiga.programmer/browse_thread/thread/ec4c0c74850913fe/06f1e2632d149d7d?lnk=st&q=12bit+aga+copper+list&rnum=4&hl=es#06f1e2632d149d7d
Thanks in advance.
-
I've read two things about getting the 4x1 AGA copper chunky mode, I'm not sure what is right. 1) turn off all bitplanes 2) set the low bits in FMODE (register 1FC) and use a 32-bit aligned copper list
Other than that I guess it is the same as your current program.
-
DamageX wrote:
1) turn off all bitplanes 2) set the low bits in FMODE (register 1FC) and use a 32-bit aligned copper list
I tried it but I can't get 4x1 pixel resolution. I'm not sure if I need to turn off all bitplanes. I read that setting FMODE to 3 can speed up the copper, but only if bitplanes exist; I tried that as well and it doesn't work.
-
hi,
I suspect that much of the info in the thread you linked is wrong or confused.
AFAIK, on both OCS and AGA the copper does 1 move every 8 pixels. So, with bitplanes turned off, you can only do 8x1.
To get higher resolution you have to use bitplanes to create a buffer for the color changes done by the copper. The usual AGA chunkycopper uses 7 bitplanes and the BPLCON4 features to switch the palette and buffer the copper moves. You can get a 107x86 screen made up of 3x3 pixels, or a smaller screen with 2x2 pixels.
regards
-
So what you're saying is there isn't really a way to make the copper run faster with AGA?
I guess I understand now. You use the copper to write colors into the palette registers instead of the background color register. Use 7 bitplanes (set up with fixed values counting 0,0,0,1,1,1,2,2,2,etc.) so that the copper can be writing to one half of the palette (128 colors) while the other half is being displayed. But to give the copper enough time to write that many colors to the palette, it must work during two or more scanlines, so the vertical resolution is limited...
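A small C model of that bookkeeping, just to check the idea. It is not hardware code: it assumes 3-pixel-wide chunky pixels and the 107-pixel width mentioned above, and it models the palette switch as simply alternating between the two 128-color halves on every chunky row. All names (`fill_index_row`, `displayed_base`, `written_base`, `palette_entry`) are made up, and the actual BPLCON4 and copper list setup is not covered.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNKY_W  107   /* chunky pixels per row, from the 107x86 figure */
#define PIXEL_W   3     /* each chunky pixel is 3 screen pixels wide */
#define HALF_SIZE 128   /* 7 bitplanes can address 128 colors */

/* Fixed bitplane contents for one line: indices 0,0,0,1,1,1,2,2,2,... */
static void fill_index_row(uint8_t *row, int screen_w)
{
    assert(screen_w > 0 && screen_w <= CHUNKY_W * PIXEL_W);
    for (int x = 0; x < screen_w; x++)
        row[x] = (uint8_t)(x / PIXEL_W);
}

/* Palette half shown while a given chunky row is on screen (assumption:
   the halves simply alternate from row to row). */
static int displayed_base(int chunky_row)
{
    return (chunky_row & 1) * HALF_SIZE;
}

/* The other half, which the copper can fill with the next row's colors. */
static int written_base(int chunky_row)
{
    return ((chunky_row & 1) ^ 1) * HALF_SIZE;
}

/* Palette entry that screen pixel x actually shows on this chunky row. */
static int palette_entry(const uint8_t *row, int x, int chunky_row)
{
    return displayed_base(chunky_row) + row[x];
}
```

The highest index used is 106, so one row of chunky pixels fits into a single 128-color half with room to spare.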