Instead of calculating the coordinates of the source pixel in the texture every time you write a pixel to the copper list, I think it is faster to maintain a set of coordinates and modify them with predetermined values.
So your inner loop could look something like this:
NextY:
move #30,d2 ; width of screen
NextX:
add.w d4,d0 ; modify source x coordinate (for every change
add.w d5,d1 ; modify source y coordinate in x direction)
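move.w #$07C0,d3 ; mask for integer part of y (bits 6-10, assuming a 32x32 texture)
and.w d1,d3 ; d3 = y*64 = byte offset of the row (32 words of 2 bytes each)
lea Texture,a1 ; pointer to texture
add.w d3,a1
move.w #$07C0,d3 ; mask for integer part of x (bits 6-10)
and.w d0,d3
lsr.w #5,d3 ; d3 = x*2 = byte offset within the row
add.w d3,a1 ; now we have the correct address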
move.w (a1),(a0)+ ; copy a pixel from texture to screen
subq.b #1,d2
bne NextX
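add.w d6,d0 ; modify source x coordinate (for every change
add.w d7,d1 ; modify source y coordinate in y direction)
; careful: d0/d1 are at the END of the line here, not the start,
; so either save/restore the line start or subtract 30*d4 and
; 30*d5 from d6 and d7 beforehand
; probably something needs to be added to copperlist
; pointer in a0 so it points to next line
subq.b #1,LineY ; LineY = lines left, must be set before the loop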
bne NextY
Hope that makes some sense and doesn't have too many bugs...
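If you want to check the masking and shifting before trying it on real hardware, here is a rough Python model of the loop. It is only a sketch: the function names are made up, and it assumes a 32x32 texture of 16-bit words and that every line starts again from its own start coordinates (which is what the d6/d7 values are meant to step).

```python
# Rough model of the inner loop's address calculation (not Amiga code).
# Assumptions: 32x32 texture of 16-bit words (64 bytes per row) and
# 16-bit fixed-point coordinates with 6 fraction bits.

def texel_offset(x, y):
    """Byte offset into the texture, as computed with the $07C0 masks."""
    row = y & 0x07C0          # integer y in bits 6-10 is already y_int * 64
    col = (x & 0x07C0) >> 5   # integer x shifted down to x_int * 2
    return row + col

def walk(x0, y0, d4, d5, d6, d7, width=30, height=2):
    """Return the texture offsets read for each line of the screen."""
    lines = []
    for _ in range(height):
        x, y = x0, y0                 # each line starts from its own origin
        line = []
        for _ in range(width):
            x = (x + d4) & 0xFFFF     # add.w wraps at 16 bits
            y = (y + d5) & 0xFFFF
            line.append(texel_offset(x, y))
        lines.append(line)
        x0 = (x0 + d6) & 0xFFFF       # step the line origin in y direction
        y0 = (y0 + d7) & 0xFFFF
    return lines

# No rotation, scale 1: step one texel (64) per pixel and per line.
print(walk(0, 0, 64, 0, 0, 64, width=3, height=2))
```

With those values the first line reads offsets 2, 4, 6 and the second 66, 68, 70, i.e. one word further per pixel and one 64-byte row further per line. Note that, like the assembly, the model adds the step before reading the first pixel.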
To calculate the magic numbers in d4, d5, d6 and d7:
d4 = x_scale * cos(angle)
d5 = x_scale * -sin(angle)
d6 = y_scale * sin(angle)
d7 = y_scale * cos(angle)
(in my example code I use fixed-point math with 6 bits of fraction so multiply by 64)
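As a quick way to generate these values (for a precalculated table, say), something like the following Python sketch should do. The function name and the rounding are my own choices, not anything fixed; the results are wrapped to 16 bits so negative steps come out as the words you would load into the registers. It gives the plain per-step values from the formulas above; if your loop doesn't return d0/d1 to the start of the line, you would still have to subtract 30*d4 and 30*d5 from d6 and d7, which is left out here.

```python
import math

FRACTION_BITS = 6  # 6 bits of fraction, so 1.0 is stored as 64

def step_values(angle, x_scale=1.0, y_scale=1.0):
    """Return (d4, d5, d6, d7) as unsigned 16-bit words."""
    one = 1 << FRACTION_BITS
    d4 = round(x_scale * math.cos(angle) * one)
    d5 = round(x_scale * -math.sin(angle) * one)
    d6 = round(y_scale * math.sin(angle) * one)
    d7 = round(y_scale * math.cos(angle) * one)
    # Wrap to 16 bits, the same way add.w treats negative values.
    return tuple(v & 0xFFFF for v in (d4, d5, d6, d7))

# Angle 0, scale 1: one texel per pixel, no rotation.
print(step_values(0.0))
# Quarter turn: the x and y steps swap roles, one of them negated.
print([hex(v) for v in step_values(math.pi / 2)])
```

At angle 0 this gives 64, 0, 0, 64; at a quarter turn d5 becomes $FFC0, which is -64 as a 16-bit word.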