
Author Topic: PED81C - pseudo-native, no C2P chunky screens for AGA  (Read 6371 times)


Offline saimo (Topic starter)

PED81C - pseudo-native, no C2P chunky screens for AGA
« on: March 05, 2022, 10:29:14 AM »
PED81C is a video system for AGA Amigas that provides pseudo-native chunky screens, i.e. screens where each byte in CHIP RAM corresponds to a dot on the display. In short, it offers chunky screens without chunky-to-planar conversion or any CPU/Blitter/Copper sweat.

Download: https://www.retream.com/PED81C

Some examples:
 * https://www.youtube.com/watch?v=0xunQ6ldVKU
 * https://www.youtube.com/watch?v=4eikEo45v1I
 * https://www.youtube.com/watch?v=ebxwKm9K4Os
 * https://www.youtube.com/watch?v=tLtLhJXInOY

Notes:
 * due to the nature of the system, the videos must be watched in their original size (1920x1080);
 * YouTube's video processing has slightly reduced the visual quality (i.e. the result is better on real machines).

For the details, please check out the documentation included in the archive.

Originally I had planned to use PED81C to make a new game. However, I could not come up with a satisfactory idea; moreover, for personal reasons, I had to stop software development. Given that I could not predict when/if I would be able to produce something with PED81C, and given that the war in Ukraine has put the world in deep uncertainty, I decided that it was better to release PED81C so that it would not go to waste, and also as a gift to the Amiga community.
I must admit I was tempted to provide an implementation of PED81C in the form of a library or a collection of functions, but since setting up PED81C screens is easy and general-purpose routines would perform worse than tailor-made ones, I decided to let programmers implement it in the way that best fits their projects.
« Last Edit: April 02, 2024, 09:10:31 PM by saimo »
RETREAM - retro dreams for Amiga, Commodore 64 and PC
 
The following users thanked this post: klx300r

Offline klx300r

  • Amiga 1000+AmigaOne X1000
  • Hero Member
  • *****
  • Join Date: Sep 2007
  • Posts: 3245
  • Country: ca
  • Thanked: 20 times
  • Gender: Male
    • http://mancave-ramblings.blogspot.ca/
Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #1 on: March 05, 2022, 09:23:13 PM »
I think it will be awesome for a new horizontal shooter ;D
____________________________________________________________________
c64-dual sids, A1000, A1200-060@50, A4000-CSMKIII
Indivision AGA & Catweasel MK4+= Amazing
! My Master Miggies-Amiga 1000 & AmigaOne X1000 !
--- www.mancave-ramblings.blogspot.ca ---
  -AspireOS.com & Amikit- Amiga for your netbook-
***X1000- I BELIEVE *** :angel:
 

Offline saimo (Topic starter)

Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #2 on: June 21, 2023, 04:05:16 PM »
Uploaded an archive with updated documentation.
While at it, given that I was asked for a source code example, I whipped up an AMOS Professional program that shows how to set up a PED81C screen and to perform some basic operations on it - hopefully, this will be easy to understand and also open the door to AMOS programmers. The program source is included in the archive.

Code: [Select]
'-----------------------------------------------------------------------------
'$VER: PED81C example 1.3 (28.11.2023) (c) 2023 RETREAM
'Legal terms: please refer to the accompanying documentation.
'www.retream.com/PED81C
'contact@retream.com
'-----------------------------------------------------------------------------

'-----------------------------------------------------------------------------
'DESCRIPTION
'This shows how to set up a PED81C screen and to perform some basic operations
'on it.
'Screen features:
' * equivalent to a 319x256 LORES screen
' * 160 dots wide raster
' * single buffer
' * blanked border
' * 64-bit bitplanes fetch mode
' * CMYW color model
'
'NOTES
'The code is written to be readable, not to be general-purpose/optimal.
'-----------------------------------------------------------------------------

'-----------------------------------------------------------------------------
'GLOBAL VARIABLES

Global RASTERADDRESS,RASTERWIDTH,RASTERHEIGHT,RASTERSIZE

RASTERWIDTH=160
RASTERHEIGHT=256
RASTERSIZE=RASTERWIDTH*RASTERHEIGHT

'-----------------------------------------------------------------------------
'MAIN

'Initialize everything.

_INITIALIZE_AMOS_ENVIRONMENT
_INITIALIZE_SCREEN

'If the initialization succeeded, load a picture into the raster and, in case
'of success, execute a simple effect on it.

If Param
   _LOAD_PICTURE_INTO_RASTER["picture-160x256.raw"]
   If Param
      _TURN_DISPLAY_DMA_ON[0]
      _RANDOMIZE_RASTER
      _TURN_DISPLAY_DMA_OFF
   End If
End If

'Deinitialize everything.

_DEINITIALIZE_SCREEN
_RESTORE_AMOS_ENVIRONMENT

'-----------------------------------------------------------------------------
'ROUTINES

Procedure _ALLOCATE_BITPLANE[BANKINDEX,SIZE]
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Allocates a CHIP RAM buffer to be used as a bitplane.
   '
   'INPUT
   'BANKINDEX = index of bank to use
   'SIZE      = size [bytes] of bitplane
   '
   'OUTPUT
   '64-bit-aligned bitplane address (0 = error)
   '
   'WARNINGS
   'The buffer must be freed with Erase BANKINDEX or Erase All.
   '--------------------------------------------------------------------------

   Trap Reserve As Chip Data BANKINDEX,SIZE+8
   If Errtrap=0 Then A=(Start(BANKINDEX)+7) and $FFFFFFF8

End Proc[A]
Procedure _DEINITIALIZE_SCREEN
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Deinitializes the screen.
   '
   'WARNINGS
   'Can be called only if the display is off.
   '--------------------------------------------------------------------------

   Erase All
   Doke $DFF1FC,0 : Rem FMODE

End Proc
Procedure _INITIALIZE_AMOS_ENVIRONMENT
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Ensures the program cannot be interrupted or brought to back, and turns
   'off the AMOS video system.
   '--------------------------------------------------------------------------

   Break Off
   Amos Lock
   Comp Test Off
   Auto View Off
   Update Off
   Copper Off
   _TURN_DISPLAY_DMA_OFF

End Proc
Procedure _INITIALIZE_SCREEN
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Initializes the screen.
   '
   'OUTPUT
   '-1/0 = OK/error
   '
   'WARNINGS
   '_DEINITIALIZE_SCREEN[] must be called also in case of failure.
   '
   'NOTES
   'Sets RASTERADDRESS.
   '--------------------------------------------------------------------------

   'Allocate the raster.

   _ALLOCATE_BITPLANE[10,RASTERSIZE] : If Param=0 Then Pop Proc[0]
   RASTERADDRESS=Param

   'Allocate and fill the selector bitplanes.

   _ALLOCATE_BITPLANE[11,RASTERSIZE] : If Param=0 Then Pop Proc[0]
   B3A=Param
   Fill B3A To B3A+RASTERSIZE,$55555555

   _ALLOCATE_BITPLANE[12,RASTERSIZE] : If Param=0 Then Pop Proc[0]
   B4A=Param
   Fill B4A To B4A+RASTERSIZE,$33333333

   'Set the chipset.

   DIWSTRTX=$81+(160-RASTERWIDTH)
   DIWSTRTY=$2C+(128-RASTERHEIGHT/2)
   DIWSTRT=((DIWSTRTY and $FF)*256) or((DIWSTRTX+1) and $FF)
   DIWSTOPX=DIWSTRTX+RASTERWIDTH*2
   DIWSTOPY=DIWSTRTY+RASTERHEIGHT
   DIWSTOP=((DIWSTOPY and $FF)*256) or(DIWSTOPX and $FF)
   DIWHIGH=((DIWSTOPX and $100)*32) or(DIWSTOPY and $700) or((DIWSTRTX and $100)/8) or(DIWSTRTY/256)
   DDFSTRT=(DIWSTRTX-17)/2
   DDFSTOP=DDFSTRT+RASTERWIDTH-8

   Doke $DFF092,DDFSTRT
   Doke $DFF094,DDFSTOP
   Doke $DFF08E,DIWSTRT
   Doke $DFF090,DIWSTOP
   Doke $DFF1E4,DIWHIGH

   Doke $DFF100,$4241 : Rem BPLCON0
   Doke $DFF102,$10 : Rem BPLCON1
   Doke $DFF104,$224 : Rem BPLCON2
   Doke $DFF108,0 : Rem BPLMOD1
   Doke $DFF10A,0 : Rem BPLMOD2
   Doke $DFF1FC,$3 : Rem FMODE

   'Set COLORxx.

   Doke $DFF106,$20 : Rem BPLCON3
   Doke $DFF180,0
   Doke $DFF182,$88
   Doke $DFF184,$88
   Doke $DFF186,$FF
   Doke $DFF188,0
   Doke $DFF18A,$808
   Doke $DFF18C,$808
   Doke $DFF18E,$F0F
   Doke $DFF190,0
   Doke $DFF192,$880
   Doke $DFF194,$880
   Doke $DFF196,$FF0
   Doke $DFF198,0
   Doke $DFF19A,$888
   Doke $DFF19C,$888
   Doke $DFF19E,$FFF
   Doke $DFF106,$220 : Rem BPLCON3
   Doke $DFF180,0
   Doke $DFF182,0
   Doke $DFF184,0
   Doke $DFF188,0
   Doke $DFF18A,0
   Doke $DFF18C,0
   Doke $DFF190,0
   Doke $DFF192,0
   Doke $DFF194,0
   Doke $DFF198,0
   Doke $DFF19A,0
   Doke $DFF19C,0
   Doke $DFF106,$20 : Rem BPLCON3

   'Build a Copperlist that sets the bitplanes pointers.

   Cop Movel $E0,RASTERADDRESS
   Cop Movel $E4,RASTERADDRESS
   Cop Movel $E8,B3A
   Cop Movel $EC,B4A
   Cop Swap

End Proc[-1]
Procedure _LOAD_PICTURE_INTO_RASTER[FILEPATH$]
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Loads a raw 8-bit chunky picture into the raster, ensuring that its size
   'is correct.
   '
   'INPUT
   'FILEPATH$ = path of picture file
   '
   'OUTPUT
   '-1/0 = OK/error
   '--------------------------------------------------------------------------

   Trap Open In 1,FILEPATH$ : If Errtrap Then Pop Proc[0]
   L=Lof(1)
   Close(1)
   If L<>RASTERSIZE Then Pop Proc[0]
   Trap Bload FILEPATH$,RASTERADDRESS

End Proc[Errtrap=0]
Procedure _RANDOMIZE_RASTER
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Randomizes the raster by swapping 16 pairs of dots per frame, until a mouse
   'button is pressed.
   '--------------------------------------------------------------------------

   XM=RASTERWIDTH-1
   YM=RASTERHEIGHT-1
   Repeat
      C=16
      While C
         X0=Rnd(XM)
         Y0=Rnd(YM)
         X1=Rnd(XM)
         Y1=Rnd(YM)
         A0=Y0*RASTERWIDTH+X0+RASTERADDRESS
         A1=Y1*RASTERWIDTH+X1+RASTERADDRESS
         C0=Peek(A0)
         Poke A0,Peek(A1)
         Poke A1,C0
         Dec C
      Wend
      _WAIT_SCREEN_BOTTOM
   Until Mouse Click

End Proc
Procedure _RESTORE_AMOS_ENVIRONMENT
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Restores the AMOS environment.
   '--------------------------------------------------------------------------

   Copper On
   Update On
   Auto View On
   Amos Unlock
   Break On
   _TURN_DISPLAY_DMA_ON[$20]

End Proc
Procedure _TURN_DISPLAY_DMA_OFF
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Disables the bitplanes, Copper and sprites DMA.
   '--------------------------------------------------------------------------

   _WAIT_SCREEN_BOTTOM
   Doke $DFF096,$3A0 : Rem DMACON

End Proc
Procedure _TURN_DISPLAY_DMA_ON[SSPRITESFLAG]
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Enables the bitplanes and Copper DMA.
   '
   'INPUT
   'SSPRITESFLAG = $20/0 = turn / do not turn sprites on
   '
   'WARNINGS
   'The chipset must have been set up properly.
   '--------------------------------------------------------------------------

   _WAIT_SCREEN_BOTTOM
   Doke $DFF096,$8380 or SSPRITESFLAG : Rem DMACON

End Proc
Procedure _WAIT_SCREEN_BOTTOM
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Waits for the bottom of the screen.
   '--------------------------------------------------------------------------

   While Deek($DFF004) and $3 : Wend
   Repeat : Until(Leek($DFF004) and $3FF00)>$12C00

End Proc
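
For readers who do not know AMOS, the alignment step in _ALLOCATE_BITPLANE can be checked in isolation. What follows is only a minimal Python sketch, not part of the PED81C archive: the names align_up_8 and usable_bytes and the bank start addresses are made up for illustration. It shows why reserving SIZE+8 bytes and rounding the start address up to a multiple of 8 always leaves at least SIZE bytes after the aligned address.

```python
def align_up_8(address: int) -> int:
    """Round an address up to the next multiple of 8 (64-bit boundary).

    Mirrors the AMOS expression (Start(BANKINDEX)+7) and $FFFFFFF8.
    """
    return (address + 7) & 0xFFFFFFF8


def usable_bytes(start: int, reserved: int) -> int:
    """Bytes left in a reserved block once its start has been aligned."""
    return start + reserved - align_up_8(start)


size = 160 * 256  # RASTERSIZE in the listing above

# Hypothetical bank start addresses covering every possible misalignment.
for start in range(0x10000, 0x10008):
    aligned = align_up_8(start)
    assert aligned % 8 == 0
    # SIZE+8 bytes were reserved, so SIZE bytes always fit after aligning.
    assert usable_bytes(start, size + 8) >= size
```

At most 7 bytes are lost to the rounding, which is why 8 spare bytes are enough.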
« Last Edit: November 29, 2023, 12:06:23 PM by saimo »
 

Offline Coolrasta

  • Newbie
  • *
  • Join Date: Jul 2023
  • Posts: 1
Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #3 on: July 18, 2023, 08:41:40 AM »
It's wonderful to see that you've included a source code example for PED81C! This will undoubtedly aid programmers in understanding how to set up a PED81C screen and perform some basic operations on it.

Your detailed documentation and the comments within the code are also very helpful in understanding how each section of the program functions. This demonstrates the attention you've put into making it as accessible and comprehensible as possible for other developers.

Thank you again for sharing these resources.

My website : www.la-crypte-aux-monnaies.fr
« Last Edit: September 11, 2023, 03:36:37 PM by Coolrasta »
 

Offline saimo (Topic starter)

Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #4 on: July 18, 2023, 08:44:30 PM »
@Coolrasta

Thank you for the nice comments. I'm glad you appreciate the documentation.
 

Offline saimo (Topic starter)

Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #5 on: November 28, 2023, 10:59:48 PM »
I have just released a little update, accompanied by the PED81C Voxel Engine (PVE), i.e. a new demo. If you can't be bothered trying it yourself, you can see it in this video - but beware: YouTube's video compression degraded the visual quality (especially the color saturation and brightness).

https://www.youtube.com/watch?v=0xunQ6ldVKU

Details about PVE straight from the manual:
Code: [Select]
--------------------------------------------------------------------------------
OVERVIEW

PVE is an experiment to test the graphical quality and computational performance
of the PED81C system. It lets you move freely around a typical voxel landscape.


--------------------------------------------------------------------------------
GETTING STARTED

PVE requires:
 * Amiga computer
 * AGA chipset
 * 200 kB of CHIP RAM
 * 4 MB of FAST RAM
 * PAL SHRES support
 * digital joystick/joypad and mouse
 * 2.1 MB of storage space

If the monitor / graphics card / scan doubler do(es) not support SHRES, the
colors will look off or even not show at all.
For example:
 * MNT's VA2000 graphics card displays only the even columns of pixels, so only
   reds and blues show;
 * Irix Labs' ScanPlus AGA displays only the odd columns of pixels (contrary to
   how it was originally marketed), so only greens and grays show.

To install PVE, unpack the LhA archive to any directory of your choice.

To start PVE, open the program directory and double-click the program icon from
Workbench or execute the program from shell.


--------------------------------------------------------------------------------
MISCELLANEOUS

* The map wraps around at its edges.
* The number shown in the top-left corner of the action screen indicates the
  number of frames rendered in the last second.
* Upon returning to AmigaOS, PVE prints out:
   * the total number of frames rendered;
   * the total number of frames shown;
   * the average number of frames rendered per second;
   * the average time (expressed in frames) taken by the rendering of a frame.


--------------------------------------------------------------------------------
TECHNICAL NOTES

* The graphics are first rendered in a raster in FAST RAM and then copied to a
  triple-buffered PED81C raster in CHIP RAM.
* The screen resolution is 1020x200 SHRES pixels, which correspond to 255x200
  LORES-sized dots and to 128x200 logical dots.
* Rendering is done by columns, from bottom to top and then left to right.
* The code applies a depth of 256 steps per column, so it evaluates 256*128 =
  32768 dots per frame (and then renders only those which are actually visible).
* The code is 100% assembly.
* The code is optimized for 68030.
* The program supports only maps of 1024x1024 pixels, but it can be made to
  support maps of other sizes by simply redefining the width and height
  constants and reassembling the code.
* The height of the camera adapts automatically to that of the point it is at,
  but it can be made user-controllable and its maximum value can be increased
  almost to the point that the landscape disappears at the bottom of the screen.
* On an Amiga 1200 equipped with a Blizzard 1230 IV mounting a 50 MHz 68030 and
  60 ns RAM:
   * the program runs at about 20.2 fps;
   * the rendering of graphics alone runs at about 22.2 fps;
   * the impact of PED81C is of about 22.2-20.2 = 2 fps - in other words,
     writing the graphics to the PED81C raster requires about 50/20.2-50/22.2 =
     0.223 frames (when only the bitplanes DMA is active);
   * rendering the graphics directly to the PED81C raster degrades the
     performance by about 2 to 3 fps (tested only with an older and less
     optimized version).
* On an Amiga 1200 equipped with a PiStorm32, the program runs at 50 fps
  (unsurprisingly).
* The map size is 1024x1024 pixels.
* The map requires 2 MB of FAST RAM.
* The program takes over the system entirely and returns to AmigaOS cleanly.


--------------------------------------------------------------------------------
BACKSTORY

After a hiatus from programming of several months (due to a computer-unrelated
project), I decided to finally create something for PED81C because I had made
nothing with it other than a few little examples, I wanted to test its
graphical quality and computational performance, and... I felt like having some
good fun.
After some inconclusive mental wandering, the idea of making a voxel engine came
to mind for unknown reasons (I had never dabbled in voxels before).
When the engine was mature enough I decided to distribute PVE publicly (which
initially was not planned).
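
The cost of the copy to CHIP RAM quoted in the technical notes above can be re-derived from the two fps figures. This is just a quick Python check of that arithmetic, assuming a 50 Hz PAL display; the names are mine and do not come from PVE.

```python
PAL_HZ = 50.0  # PAL display refresh rate, in frames per second


def frames_per_render(fps: float) -> float:
    """Display frames that elapse while one frame is being rendered."""
    return PAL_HZ / fps


total = frames_per_render(20.2)        # rendering + copy to the PED81C raster
render_only = frames_per_render(22.2)  # rendering alone
copy_cost = total - render_only        # time spent on the copy, in frames

print(f"{copy_cost:.3f}")  # prints 0.223
```

The slower run takes more display frames per rendered frame, so the copy cost is the slower figure minus the faster one.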

About the update, I fixed some palette values in a table in the documentation, added the formulas for calculating DIWSTRT, DIWSTOP, DIWHIGH, DDFSTRT and DDFSTOP to the documentation and implemented them in the AMOS Professional source code example. This is the snippet relative to the register settings:
Code: [Select]
In general, given a raster which is RASTERWIDTH dots wide and RASTERHEIGHT dots
tall, the values to write to the chipset registers in order to create a centered
screen can be calculated as follows:
 * SCREENWIDTH  = RASTERWIDTH * 8
 * SCREENHEIGHT = RASTERHEIGHT
 * DIWSTRTX     = $81 + (160 - SCREENWIDTH / 8)
 * DIWSTRTY     = $2c + (128 - SCREENHEIGHT / 2)
 * DIWSTRT      = ((DIWSTRTY & $ff) << 8) | ((DIWSTRTX + 1) & $ff)
 * DIWSTOPX     = DIWSTRTX + SCREENWIDTH / 4
 * DIWSTOPY     = DIWSTRTY + SCREENHEIGHT
 * DIWSTOP      = ((DIWSTOPY & $ff) << 8) | (DIWSTOPX & $ff)
 * DIWHIGH      = ((DIWSTOPX & $100) << 5) | (DIWSTOPY & $700) |
                  ((DIWSTRTX & $100) >> 3) | (DIWSTRTY >> 8)
 * DDFSTRT      = (DIWSTRTX - 17) / 2
 * DDFSTOP      = DDFSTRT+SCREENWIDTH / 8 - 8
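
As a worked example, here are the formulas above applied to the 160x256 raster of the AMOS example. This is a minimal Python sketch: the function name ped81c_registers is made up, and I am assuming that every division in the formulas is an integer division.

```python
def ped81c_registers(raster_width: int, raster_height: int) -> dict:
    """Evaluate the centered-screen formulas for a given raster size."""
    screen_width = raster_width * 8  # in SHRES pixels
    screen_height = raster_height
    diwstrtx = 0x81 + (160 - screen_width // 8)
    diwstrty = 0x2C + (128 - screen_height // 2)
    diwstrt = ((diwstrty & 0xFF) << 8) | ((diwstrtx + 1) & 0xFF)
    diwstopx = diwstrtx + screen_width // 4
    diwstopy = diwstrty + screen_height
    diwstop = ((diwstopy & 0xFF) << 8) | (diwstopx & 0xFF)
    diwhigh = (((diwstopx & 0x100) << 5) | (diwstopy & 0x700)
               | ((diwstrtx & 0x100) >> 3) | (diwstrty >> 8))
    ddfstrt = (diwstrtx - 17) // 2
    ddfstop = ddfstrt + screen_width // 8 - 8
    return {"DIWSTRT": diwstrt, "DIWSTOP": diwstop, "DIWHIGH": diwhigh,
            "DDFSTRT": ddfstrt, "DDFSTOP": ddfstop}


regs = ped81c_registers(160, 256)
print(" ".join(f"{name}=${value:X}" for name, value in regs.items()))
# prints DIWSTRT=$2C82 DIWSTOP=$2CC1 DIWHIGH=$2100 DDFSTRT=$38 DDFSTOP=$D0
```

Note that DDFSTRT=$38 and DDFSTOP=$D0 are the same values as a standard 320 pixels wide LORES screen.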
 

Offline saimo (Topic starter)

Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #6 on: December 22, 2023, 10:10:26 AM »
Just released a new version of PVE. Full changelog below. In short: it's faster and it's got a few little additions.

https://retream.itch.io/ped81c

v1.1 (22.12.2023)
* Reworked screen buffering, so that the raster data is more efficiently written to CHIP RAM when bitplanes DMA is inactive.
* Improved 68030 caches handling.
* Added 68040 and 68060 caches handling.
* Added MMU handling to prevent the MMU from affecting the speed negatively.
* Optimized rendering core by making it write the dots sequentially.
* Made a little 68060-specific code optimization.
* Ensured 68060 superscalar dispatch is enabled.
* Added live-togglable staggered lines video filter, which helps see better colors on devices that do not support SHRES and reduces the jailbars effect on devices that support SHRES (to enable/disable: [F1]).
* Made fps indicator live-togglable (to enable/disable: [F2]).
* Made quitting from the voxel screen return to the splash screen.
* Replaced mouse controls with keyboard controls.
* Added benchmark function.
* Added command line switches to control the CPU caches.
* Fixed bug that caused a longword to be written to a random location when the fps indicator was on.
* Fixed an innocuous initialization bug.
* Made cleanup code more robust.
* Updated, extended and fixed documentation.
 

Offline klx300r

Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #7 on: December 23, 2023, 07:04:23 PM »
@ saimo
congrats on its official release ;D
 

Offline saimo (Topic starter)

Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #8 on: March 27, 2024, 10:42:02 PM »
For ages I had intended to dig up some 20+ year old code and use it to play with PED81C a little more. I finally got around to it and came up with a new test program called Zoomaniac.
Details in the video and in the manual excerpt below. Download available at https://retream.itch.io/ped81c.

https://www.youtube.com/watch?v=eehqapb20fE

Code: [Select]
--------------------------------------------------------------------------------
OVERVIEW

Zoomaniac has been written to evaluate the performance on a stock Amiga 1200 of
a general-purpose texture scaling routine that writes directly to a PED81C
raster.


--------------------------------------------------------------------------------
PERFORMANCE

The following results are relative to the full screen effect that zooms the
cosmonaut in and out.

On a stock Amiga 1200, the execution speed is between 25 and 26 fps. If the
staggered lines are turned on, the performance drops by about 1 fps (which was
unexpected, since all that such option adds is a Copper WAIT and a Copper MOVE
for each rasterline).
Given that the DMA load caused by PED81C is "double" (see its documentation for
the details), a version that uses only half the number (2) of bitplanes has been
made to check the performance as if the Amiga had a native chunky video mode.
Surprisingly, the performance did not improve at all: relative to the CHIP bus
access, the scaling code must interleave so nicely with the bitplane data
fetches that having more bus cycles available does not make any/much difference.

An Amiga 1200 equipped with a 68030 clocked at 50 MHz and 60 ns FAST RAM easily
performs at steady 50 fps. To find out the maximum performance, new tests were
made with special versions of the program that had the video synchronization
code disabled.
The speed when running the program normally was between 77 and 78 fps. The
staggered lines option lowered the fps by about 2. The 2 bitplanes versions
performed better, reaching 80-81 fps or, with the staggered lines on, 79-80 fps.
Like on the stock Amiga 1200, the extended Copperlist that implements the
staggered lines causes a small and similar performance drop. Instead, the
halving of the bitplanes DMA load did produce a speed increase.

The following table sums up the results.

S = stock Amiga 1200
E = Amiga 1200 68030 @50 MHz / 60 ns FAST RAM (Blizzard 1230 IV)
2 = 2 bitplanes on
4 = 4 bitplanes on
L = staggered lines on

  |     4 |     L4 |     2 |    L2
--+-------+--------+-------+-------
S | 25-26 |  24-25 | 25-26 | 24-25
E | 77-78 |  75-76 | 80-81 | 79-80

Notes:
 * when FAST RAM is detected, an alternative and more suitable scaling routine
   is used (although writes still happen to CHIP RAM);
 * on (some?) machines equipped with FAST RAM an even faster strategy would be
   rendering to FAST RAM and then simply copying at the maximum speed the
   rendered frame to the CHIP RAM raster.


--------------------------------------------------------------------------------
TECHNICAL NOTES

* The scaling routine fits any rectangle from a texture into a rectangle of any
  size and ratio of another texture with nearest-neighbor matching.
* Logic and rendering are totally asynchronous: the logic runs always at 50 Hz
  and the rendering never stops (unless it reaches the limit of 50 fps, imposed
  by the display refresh rate), thus exploiting the machine's full potential.
* The screen buffering employs three buffers in CHIP RAM.
* The screen resolution is 1020x256 SHRES pixels, which correspond to 255x256
  LORES-sized physical dots and to 128x256 logical dots.
* The code is 100% assembly.
* The program takes over the system entirely and returns to AmigaOS cleanly.
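
To make the first technical note concrete, this is how nearest-neighbor rectangle scaling between two chunky rasters (one byte per dot) can look. It is only a Python sketch of the general technique: the actual Zoomaniac routine is 68k assembly and is not reproduced here, and the function name and parameter layout are my own for illustration.

```python
def scale_nearest(src, src_w, sx, sy, sw, sh,
                  dst, dst_w, dx, dy, dw, dh):
    """Fit the sw x sh rectangle at (sx, sy) of src into the dw x dh
    rectangle at (dx, dy) of dst, picking the nearest source dot.

    src and dst are flat byte sequences; src_w and dst_w are their widths.
    """
    for row in range(dh):
        # Map the destination row back to a source row (integer math only).
        src_row = sy + (row * sh) // dh
        for col in range(dw):
            src_col = sx + (col * sw) // dw
            dst[(dy + row) * dst_w + dx + col] = src[src_row * src_w + src_col]


# Example: enlarge a 4x4 texture to 8x8, then shrink it to 2x2.
texture = bytes(range(16))
big = bytearray(8 * 8)
scale_nearest(texture, 4, 0, 0, 4, 4, big, 8, 0, 0, 8, 8)
small = bytearray(2 * 2)
scale_nearest(texture, 4, 0, 0, 4, 4, small, 2, 0, 0, 2, 2)
```

A real routine would precompute the column mapping once per call instead of dividing per dot; the sketch favors readability.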

CHANGELOG

March 27, 2024
* Added the Zoomaniac demo.
* [PED81C Voxel Engine] Made a couple of minor changes.
* [PED81C Voxel Engine] Updated documentation.

January 1, 2024
* Rebuilt demos against latest custom framework.
* [PED81C Voxel Engine] Optimized slightly background rendering.
* [PED81C Voxel Engine] Corrected benchmark fps calculation (312 rasterlines were considered instead of 313).
* [PED81C Voxel Engine] Built against latest custom framework.
* [PED81C Voxel Engine] Updated, extended and fixed documentation.
 

Offline saimo (Topic starter)

Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #9 on: March 29, 2024, 01:49:25 PM »
In response to the feedback received, I have uploaded a new version of Zoomaniac that allows the fps limit to be enabled/disabled by means of [F3].

Code: [Select]
* The number shown in the top-left corner of the effects screen is the fps
  indicator, which reports the number of frames rendered in the last second.
  It is limited to 999.
* When the fps limit is on, the maximum number of frames rendered per second
  is 50 also on the most powerful machines, as the display refresh rate is 50
  Hz. When the fps limit is off, frames are rendered without pausing when the
  previously rendered frame/frames has/have not (completely) displayed yet. On
  machines which cannot run the program at 50 fps or more, turning off the
  limit has no effect whatsoever; on the other machines, the only visible effect
  is that the fps indicator goes beyond 50, thus giving a measure of the maximum
  speed that the machine can reach.

Also, this new version runs 1-2 fps faster on 68030 thanks to the data cache burst:

Code: [Select]
* on 68030 tests proved that: it is advantageous to turn the data cache burst
   on when scaling a 128 dots wide rectangle to a rectangle wider than 8 dots
   (i.e. with an X scaling factor greater than 1/16); with a scaling factor of
   1/16 or less the difference proved to be minimal when both the source and
   destination rectangles were 256 dots tall; considering that turning the data
   cache burst off would therefore be advantageous only with very narrow and
   tall rectangles (which are uncommon and intrinsically rather inexpensive),
   it is not worth it to implement a data cache burst management inside the
   scaling routine;

CHANGELOG

v1.1 (28.3.2024)
* Turned the 68030 data cache burst on for slightly faster performance.
* Made a couple of minor optimizations.
* Added frames rendering limit toggle ([F3]).
* Worked on fps indicator: added hundreds digit; made digits smaller; made digits auto-clearing, so that they read correctly also when they are not cleared before drawing.
* Made staggered lines toggle as soon as [F1] is pressed (instead of when it is released).
* Updated splash screen.
* Redesigned the 'M' in the logo.
* Updated and extended manual.
 

Offline saimo (Topic starter)

Re: PED81C - pseudo-native, no C2P chunky screens for AGA
« Reply #10 on: April 02, 2024, 09:01:44 PM »
To have a complete set of scaling routines (which hopefully I'll use for something someday), I added support for color-keying, zero-keying (color-keying with color 0), and horizontal and vertical flipping.
Moreover, given that initially the focus was on the stock A1200, the performance on expanded machines was not optimal (as the rendering was done directly in CHIP RAM), so I added also an alternative buffering method that, when 2 rasters can be allocated in FAST RAM, allows rendering in FAST RAM and then copies the rendered raster to the raster in CHIP RAM as quickly as possible, starting when the beam reaches the bottom of the screen. This, relative to the first effect in the test program (which is the only one whose performance was measured until now), produced a gain of 8-9 fps on my 68030-equipped Amiga 1200.
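
For those unfamiliar with the terms, this is roughly what color-keying and flipping mean on a chunky raster. It is a simplified Python sketch without scaling, not the actual assembly routines, and the name blit_keyed and its parameters are hypothetical. Dots equal to the key color are skipped, so the destination shows through; zero-keying is the same thing with key=0.

```python
def blit_keyed(src, w, h, dst, dst_w, dx, dy,
               key=None, flip_x=False, flip_y=False):
    """Copy a w x h chunky texture into dst at (dx, dy).

    Dots equal to key are treated as transparent (key=None copies all).
    flip_x / flip_y mirror the texture horizontally / vertically.
    """
    for row in range(h):
        src_row = h - 1 - row if flip_y else row
        for col in range(w):
            src_col = w - 1 - col if flip_x else col
            dot = src[src_row * w + src_col]
            if key is not None and dot == key:
                continue  # transparent dot: leave the destination untouched
            dst[(dy + row) * dst_w + dx + col] = dot


# Example: zero-keyed, horizontally flipped copy of a 2x2 texture.
texture = bytes([0, 1,
                 2, 3])
screen = bytearray([9] * 4)
blit_keyed(texture, 2, 2, screen, 2, 0, 0, key=0, flip_x=True)
print(list(screen))  # prints [1, 9, 3, 2]
```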

The updated test program (available at https://retream.itch.io/ped81c), to demonstrate the new features, stretches and shrinks a color/zero-keyed texture covering almost the entire screen over a full-screen zooming background, with all the possible flipping combinations. That is of course a bit taxing for a stock A1200, whose performance drops to between 12 and 16 fps in the busiest cases.

https://www.youtube.com/watch?v=ebxwKm9K4Os

(Side note: the video was recorded before finalizing the test program, so it shows an outdated splash screen and zooming jumps relatively to the background when passing from/to the color/zero-keying effects.)

This snippet from the updated manual provides further details.

Code: [Select]
--------------------------------------------------------------------------------
OVERVIEW

Zoomaniac has been written to evaluate the performance on stock and modestly-
accelerated Amiga 1200s of some general-purpose texture scaling routines in
conjunction with PED81C.


--------------------------------------------------------------------------------
GETTING STARTED

Zoomaniac requires:
 * Amiga computer
 * AGA chipset
 * 170 kB of CHIP RAM
 * 1.2 MB of any RAM
 * PAL SHRES support
 * keyboard
 * 1 MB of storage space

To install Zoomaniac, unpack the LhA archive to any directory of your choice.

To start Zoomaniac, open the program directory and double-click the program icon
from Workbench or execute the program from shell.

If your monitor / graphics card / scan doubler do(es) not support SHRES, the
colors will look off or even not show at all. In such case, to hopefully fix the
colors a bit, try the staggered lines option.


--------------------------------------------------------------------------------
CONTROLS

 KEY      | SPLASH SCREEN               | EFFECTS SCREEN
----------+-----------------------------+----------------------------
 [SPACE]  | go to effects screen        |
 [F1]     | turn staggered lines on/off | turn staggered lines on/off
 [F2]     | turn fps indicator on/off   | turn fps indicator on/off
 [F3]     | turn fps limit on/off       | turn fps limit on/off
 [ESCAPE] | quit to AmigaOS             | go to splash screen


--------------------------------------------------------------------------------
MISCELLANEOUS

* The staggered lines shift the odd lines by 1 SHRES pixel to the right. On
  systems which handle SHRES correctly, that will reduce the jailbars effect
  (but give the screen a kind of wavy look). On systems which handle SHRES as
  HIRES (for example, MNT's VA2000 graphics card and Irix Labs' ScanPlus AGA -
  contrary to how it was originally marketed - display only the even or odd
  columns of pixels, so only reds and blues or greens and grays show), that
  helps improve the colors a bit (giving the screen a kind of scanline
  effect). On other systems, the results are unpredictable, but the option is
  still worth a try.
* The number shown in the top-left corner of the effects screen is the fps
  indicator, which reports the number of frames rendered in the last second.
  It is limited to 999.
* When the fps limit is on, at most 50 frames are rendered per second, even on
  the most powerful machines, as the display refresh rate is 50 Hz. When the fps
  limit is off, rendering does not pause to wait until the previously rendered
  frame(s) has/have been (completely) displayed. On machines which cannot run
  the program at 50 fps or more, turning off the limit has no effect
  whatsoever; on the other machines, the only visible effect is that the fps
  indicator goes beyond 50, thus giving a measure of the maximum speed that the
  machines can reach.


--------------------------------------------------------------------------------
PERFORMANCE

The following results are relative to the full screen effect that zooms the
cosmonaut in and out without flipping. The source textures are 256x512 dots and
the screen internally consists of 128x256 dots. Since a dot is represented by a
byte, 128x256 = 32768 bytes are fetched and written to render a frame.

On a stock Amiga 1200, the execution speed is between 25 and 26 fps. If the
staggered lines are turned on, the performance drops by about 1 fps (even
though all that the option adds is a Copper WAIT and a Copper MOVE for each
rasterline).
Given that the DMA load caused by PED81C is "double" (see its documentation for
the details), a version that uses only half the number of bitplanes (2) was made
to check the performance as if the Amiga had a native chunky video mode.
Surprisingly, the performance did not improve at all: with respect to CHIP bus
access, the scaling code must interleave so nicely with the bitplane data
fetches that having more bus cycles available makes little or no difference.

An Amiga 1200 equipped with a 68030 clocked at 50 MHz and 60 ns FAST RAM easily
performs at a steady 50 fps. To find out the maximum performance, tests were
made with the fps limit off.
The speed when running the program normally was between 84 and 86 fps. The
staggered lines option lowered the fps by about 1. The 2-bitplane version ran
at the same speed: in this case, that is because most of the CHIP RAM accesses
happen when no bitplane DMA is going on (see the TECHNICAL DETAILS section).

The following table sums up the results.

    staggered lines |   off |     on
--------------------+-------+--------
   stock Amiga 1200 | 25-26 |  24-25
expanded Amiga 1200 | 84-86 |  84-85

expanded Amiga 1200: Blizzard 1230 IV, 68030 @50 MHz, 60 ns FAST RAM

Notes:
 * given that a stock Amiga 1200 reaches about 25.5 fps, it manages to render
   128*256*25.5 = 835584 dots per second; considering that the 68020 is clocked
   at 14.187580 MHz, rendering 1 dot requires about 14187580/835584 = 17 CPU
   cycles;
 * tests on the 68030 showed that it is advantageous to turn the data cache
   burst on when scaling a 128 dots wide rectangle to a rectangle wider than 8
   dots (i.e. with an X scaling factor greater than 1/16); with a scaling factor
   of 1/16 or less, the difference proved to be minimal when both the source and
   destination rectangles were 256 dots tall; since turning the data cache burst
   off would therefore be advantageous only with very narrow and tall rectangles
   (which are uncommon and intrinsically rather inexpensive), it is not worth
   managing the data cache burst inside the scaling routines.
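
The arithmetic in the first note can be re-checked with a few lines of Python. This is only a sketch that reuses the figures quoted above; the variable names are made up for the illustration, and 25.5 fps is the midpoint of the measured 25-26 fps range, not an additional measurement.

```python
# Figures quoted in the PERFORMANCE section.
SCREEN_WIDTH = 128          # logical dots per line
SCREEN_HEIGHT = 256         # lines
FPS = 25.5                  # midpoint of the measured 25-26 fps
CPU_CLOCK_HZ = 14_187_580   # 68020 clock of a stock Amiga 1200

dots_per_frame = SCREEN_WIDTH * SCREEN_HEIGHT    # 1 byte per dot
dots_per_second = dots_per_frame * FPS
cycles_per_dot = CPU_CLOCK_HZ / dots_per_second

print(dots_per_frame)             # 32768
print(int(dots_per_second))       # 835584
print(round(cycles_per_dot, 2))   # 16.98, i.e. about 17 CPU cycles per dot
```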


--------------------------------------------------------------------------------
SCALING ROUTINES

The scaling routines fit any rectangle from a texture into a rectangle of any
size and ratio of another texture with nearest-neighbor matching. Optionally,
they can flip the rectangles horizontally and/or vertically, and treat as
transparent the dots of a specific color (color-keying) or of color 0 (zero-
keying).
Color/zero-keying allows rendering graphics of arbitrary shapes without masks
(which saves RAM and CPU cycles). Since PED81C graphics always use at most 81
colors, there are 256-81 = 175 colors that can be used for color-keying without
causing any visual loss.
For performance reasons, there are 3 separate routines.

 routine               | color-keying | zero-keying | speed rating
-----------------------+--------------+-------------+--------------
 v_ScaleRectangle()    |              |             |     ***
 v_ScaleRectangle_CK() |      *       |             |       *
 v_ScaleRectangle_ZK() |              |      *      |      **
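
For readers unfamiliar with the technique, here is a minimal Python sketch of nearest-neighbor rectangle scaling with optional flipping and keying. It is not the actual routines (which are written in assembly and whose parameters are not described here): the function name, its arguments and the list-of-rows texture layout are all assumptions made for the illustration, and the sampling uses a simple integer (floor) mapping.

```python
def scale_rectangle(src, src_rect, dst, dst_rect,
                    flip_x=False, flip_y=False, key=None):
    """Hypothetical helper: scale src_rect of src into dst_rect of dst.

    src and dst are lists of rows, each row a list of byte values (1 per dot).
    Rectangles are (x, y, width, height) tuples.
    key=None disables keying; key=0 behaves like zero-keying; any other value
    behaves like color-keying (dots of that color are left untouched in dst).
    """
    sx, sy, sw, sh = src_rect
    dx, dy, dw, dh = dst_rect
    for row in range(dh):
        # Map the destination row to a source row (integer floor mapping).
        v = row * sh // dh
        if flip_y:
            v = sh - 1 - v
        for col in range(dw):
            # Map the destination column to a source column.
            u = col * sw // dw
            if flip_x:
                u = sw - 1 - u
            dot = src[sy + v][sx + u]
            if key is None or dot != key:
                dst[dy + row][dx + col] = dot


# Usage: enlarge a 2x2 texture to 4x4 over a background of color 9,
# treating color 0 as transparent.
texture = [[1, 2],
           [3, 0]]
screen = [[9] * 4 for _ in range(4)]
scale_rectangle(texture, (0, 0, 2, 2), screen, (0, 0, 4, 4), key=0)
for line in screen:
    print(line)
```

In the bottom-right quarter the background color 9 shows through, because the corresponding source dot has the key color. Testing the key for every dot is also a plausible reason why the keyed routines are rated slower than the plain one.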


--------------------------------------------------------------------------------
OTHER TECHNICAL NOTES

* Logic and rendering are totally asynchronous: the logic always runs at 50 Hz
  and the rendering never stops (unless it reaches 50 fps and the fps limit is
  on), thus exploiting the machine's full potential.
* The screen is triple-buffered.
* When 2 rasters can be allocated in FAST RAM:
   1. the graphics are always rendered to the available raster in FAST RAM;
   2. after the rendering has completed and as soon as the bottom rasterline
      has been displayed, the rendered raster is copied as quickly as possible
      to the raster in CHIP RAM (which is the one that gets displayed).
  The copy successfully races the beam (on the expanded Amiga 1200 mentioned in
  the PERFORMANCE section, it requires about 57 rasterlines during the vertical
  blanking and 35 rasterlines during the fetching of the top rasterlines), so no
  tearing occurs.
  This method yields better performance than rendering directly to a raster in
  CHIP RAM (especially when there is overdraw and/or data also gets read from
  the raster).
* The screen resolution is 1020x256 SHRES pixels, which correspond to 255x256
  LORES-sized physical dots and to 128x256 logical dots.
* The code is 100% assembly.
* The program takes over the system entirely and returns to AmigaOS cleanly.