Author Topic: Idea...  (Read 2769 times)

Offline patrik

Re: Idea...
« on: November 28, 2003, 10:24:28 PM »
@Karlos:

I have two questions:
1. Does your test program measure the speed of writing or reading or both?

2. Does the PPC have the possibility to access all addresses in the system?

(edit) The bus speed results with the Grex should be very sensitive to whether or not the CPU does its line transfers to/from PCI cards on the Grex in burst mode.


/Patrik
 

Offline patrik

Re: Idea...
« Reply #1 on: November 29, 2003, 02:10:34 AM »
@Karlos:

Yes, you are right about the caching, my bad. Accesses to the video memory of a graphics card should indeed be marked as non-cachable.

This makes me curious about one thing then. The PCI-buscontroller of the PPC-card/Grex should implement translation from 040/060 burst transfers to PCI burst transfers, and the same the other way around, if the designers were aiming at any performance. If this is the case, shouldn't using the MOVE16 instruction (which transfers memory blocks of 16 bytes using the same burst as line transfers) to move data to/from the video RAM result in higher performance than using regular MOVE instructions?


/Patrik
 

Offline patrik

Re: Idea...
« Reply #2 on: November 29, 2003, 03:13:49 AM »
@Karlos:

Non-burst transfers to/from the PCI-buscontroller will inevitably result in non-burst transfers on the PCI-bus. The PCI-bus multiplexes the address and data lines, so it takes several clock cycles on the PCI-bus to do a basic read or write. After a quick check in the PCI 2.1 specification (so I reserve the right to be wrong ;)) I found that a basic read takes 4 clock cycles minimum and a basic write takes 3 clock cycles minimum. With a little calculation this gives:

Basic read maximum bandwidth:
(33*10^6 * (32 / 8)) / 4 = 33000000B/Sec which is 31.47MB/Sec

Basic write maximum bandwidth:
(33*10^6 * (32 / 8)) / 3 = 44000000B/Sec which is 41.96MB/Sec

Taking these figures and your report of about 40MB/Sec with the Grex into account (assuming that most of the data shuffling your test program does is writes, correct me if I am wrong), it seems like the Grex is able to deliver close to the maximum theoretical bandwidth using non-burst transfers on the PCI-bus.

Btw, the theoretical figures should be a little bit higher, as the bus is clocked at 33 and a third MHz, not 33 as I used in my calculations... let's count that into the +-0.5 error margin of 33 ;).
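The arithmetic above can be sketched in a few lines of Python. This is only a restatement of the calculation, not a measurement: the cycle counts (4 per read, 3 per write) are the figures quoted from the specification above, "MB" is taken as 2^20 bytes because that is what reproduces the 31.47 and 41.96 figures, and the function name is made up for the illustration.

```python
# Peak bandwidth of single (non-burst) 32-bit PCI transfers.
# Cycle counts are the minimums quoted in the post, not measured values.

BUS_WIDTH_BYTES = 32 // 8
MIB = 2 ** 20  # the "MB" in the post corresponds to 2**20 bytes


def single_transfer_bandwidth(clock_hz, cycles_per_transfer):
    """Bytes per second when every transfer moves one 32-bit longword."""
    return clock_hz * BUS_WIDTH_BYTES / cycles_per_transfer


# First the rounded 33 MHz clock, then the real 33 1/3 MHz clock.
for clock_hz in (33e6, 100e6 / 3):
    read_bw = single_transfer_bandwidth(clock_hz, 4)
    write_bw = single_transfer_bandwidth(clock_hz, 3)
    print(f"{clock_hz / 1e6:.2f} MHz: "
          f"read {read_bw / MIB:.2f} MB/Sec, write {write_bw / MIB:.2f} MB/Sec")
```

With the 33 and a third MHz clock the two figures only move up to roughly 31.8 and 42.4 MB/Sec, so the rounding does not change the conclusion.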


/Patrik
 

Offline patrik

Re: Idea...
« Reply #3 on: November 29, 2003, 03:55:33 AM »
@Karlos:

It would indeed be interesting to know if the Grex supports burst transfers. The 68040 and 68060 bursts are 4 longwords long, so they would translate to a 4 longword burst on the PCI-bus. A read burst of four longwords would take 7 clock cycles minimum on the PCI-bus and a write burst of four longwords would take 6 clock cycles minimum. That would give these maximum theoretical bandwidth figures:

Read-burst (4 longwords):
(33*10^6 * (32 / 8) * 4) / 7 = 75428571.43B/Sec which is 71.93MB/Sec

Write-burst (4 longwords):
(33*10^6 * (32 / 8) * 4) / 6 = 88000000B/Sec which is 83.92MB/Sec

These figures are a little more than double the transfer speed of non-burst transfers. Maybe that is not of much practical use for graphics cards, but it would be very interesting to see if figures like this could be achieved using burst transfers on the Grex.

Of course the graphics card has to be able to handle this too, so when testing this a low resolution, low bit depth and low refresh rate would be very good - ideally the card shouldn't be refreshing the screen at all during the test.
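For anyone who wants to replay the burst numbers, here is a small Python sketch. It assumes the cycle counts given above (7 cycles for a 4-longword read burst, 6 for a 4-longword write burst, which is the value that yields the 88000000B/Sec result) and again takes "MB" as 2^20 bytes; the function name is only illustrative.

```python
# Peak bandwidth of 4-longword PCI bursts versus single transfers.
# All cycle counts are the theoretical minimums quoted in the post.

BUS_WIDTH_BYTES = 32 // 8
MIB = 2 ** 20
CLOCK_HZ = 33e6  # rounded clock, as used in the post


def pci_bandwidth(clock_hz, longwords, cycles):
    """Bytes per second when `longwords` 32-bit words move in `cycles` clocks."""
    return clock_hz * BUS_WIDTH_BYTES * longwords / cycles


cases = {
    "read": (pci_bandwidth(CLOCK_HZ, 1, 4), pci_bandwidth(CLOCK_HZ, 4, 7)),
    "write": (pci_bandwidth(CLOCK_HZ, 1, 3), pci_bandwidth(CLOCK_HZ, 4, 6)),
}

for name, (single, burst) in cases.items():
    print(f"{name}: single {single / MIB:.2f} MB/Sec, "
          f"burst {burst / MIB:.2f} MB/Sec, speedup {burst / single:.2f}x")
```

The speedup comes out as 16/7 (about 2.29x) for reads and exactly 2x for writes, which is where "a little more than double" comes from.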


/Patrik
 

Offline patrik

Re: Idea...
« Reply #4 on: November 29, 2003, 05:09:05 AM »
Just realised one important thing about using MOVE16. MOVE16 loads 16 bytes (a line) from the source-address into a temporary register and then saves it at the destination-address, right?

I'll continue babbling on, assuming that :).

If so, testing with MOVE16 would probably give worse results than those you achieved with your test program. The fastmemory-bandwidth of the accelerators to which a Grex can be connected ranges from about 30MB/Sec with an 040@25MHz to maybe 70MB/Sec with an 060@50MHz (just rough estimates though). At that level of fastmem-bandwidth utilization the local 68040/68060-bus is fully utilized (the time not spent transferring data to/from fastmem is spent in waitstates), leaving no room for transferring data to/from the PCI-buscontroller without sacrificing fastmem-bandwidth. Transferring data to/from the PCI-buscontroller at the same time as transferring data to/from fastmemory (as the case would be when testing with MOVE16) should effectively and approximately ;) cut the fastmemory-bandwidth in half. This would make the accelerator-cards with the fastest fastmem-interfaces able to transfer about 35MB/Sec to/from the Grex using MOVE16.

To try to max out the Grex, the 68040/68060 local bus has to be free of bandwidth-hogging memory-transfers, so the register-to-VRAM method you are using in your test program must be used! The only problem will be using that method with bursts... what do you think about marking a part of the VRAM writethrough-cachable using the MMU and then doing a write-test à la your recipe?


(edit):

Quote

"Transferring data to/from the PCI-buscontroller at the same time as transferring data to/from fastmemory (as the case would be when testing with MOVE16) should effectively and approximately ;) cut the fastmemory-bandwidth in half. "


This is half-true ;).. the transfers are of course not happening at the same time: first you read from fastmemory, then you write to the PCI-buscontroller, for example. But as you can't read from fastmemory while writing to the PCI-buscontroller, the fastmemory bandwidth is approximately and effectively cut in half (the same of course applies when transferring data the other way) - so the important part about the bandwidth was at least true :).
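The "cut in half" estimate can be put into a rough model. The Python sketch below assumes the read from fastmemory and the write to the PCI-buscontroller strictly take turns on the local bus, so the time per byte is the sum of the two. The 70MB/Sec input is the rough fastmem estimate from above, and treating the PCI side as equally fast is an assumption made only to reproduce the halving; the function name is hypothetical.

```python
# Rough model of a MOVE16-style copy over one shared local bus:
# each line is first read from the source, then written to the destination,
# and the two phases never overlap.


def serialised_copy_bandwidth(source_bw, dest_bw):
    """Effective copy rate when reads and writes take turns on the same bus."""
    return 1.0 / (1.0 / source_bw + 1.0 / dest_bw)


FASTMEM_BW = 70.0  # MB/Sec, the rough 060@50MHz estimate from the post

# If the PCI side is about as fast as fastmem, the copy rate is halved.
print(serialised_copy_bandwidth(FASTMEM_BW, FASTMEM_BW))

# Even with the theoretical 83.92 MB/Sec write burst on the PCI side,
# the copy stays well below the fastmem bandwidth.
print(serialised_copy_bandwidth(FASTMEM_BW, 83.92))
```

Under this model the copy rate can never exceed the slower of the two sides, which is why a register-to-VRAM test is the better way to see what the Grex itself can do.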


/Patrik
 

Offline patrik

Re: Idea...
« Reply #5 on: November 29, 2003, 03:58:51 PM »
As you don't have a Grex yet ;) I think you should check out this opportunity :).

If you want a quick overview of how the caching areas and modes are controlled by the MMU, you should look at these sections in the User Manual for the respective processor:

For the 68040 (2.5 Pages of reading):
4.3, 4.3.1, 4.3.1.1, 4.3.1.2 and 3.1.3

For the 68060 (3.5 Pages of reading):
5.4, 5.4.1, 5.4.1.1, 5.4.1.2, and 4.1.3

I do recommend reading the 060 User Manual sections first as they are better written... at least I think so :).

I understand your reasons, I just got kind of gaga about finding out what levels of performance can be squeezed out of the Grex :). As the Grex seems to be able to fully utilize the PCI-bus using non-burst transfers in some configurations, I am incredibly curious about how it performs using burst transfers. There should be some use for burst transfers btw... when transferring big chunks of data from fastmemory to VRAM (textures for example, maybe), burst transfers with MOVE16 should be very sensible.

Though as I said in my earlier post, MOVE16 would be crap for measuring the raw bus speed. Some MMU trickery, setting some pages of the VRAM as writethrough-cachable and using your method of reading/writing from/to these pages of VRAM, should theoretically make it possible to test the raw bus speed of the Grex.
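To make the MMU part a bit more concrete, here is a small Python sketch of the bit fiddling involved. It assumes, going by the User Manual sections listed above, that the cache mode (CM) is a two-bit field in bits 6-5 of the transparent translation registers and page descriptors, with 00 meaning cachable writethrough; please verify that against the manuals before relying on it. The descriptor value and all names are invented for the illustration, and the real change would of course have to be made in the MMU tables on the machine itself.

```python
# Sketch: changing the cache-mode (CM) field of a 32-bit 68040/68060
# page descriptor or transparent translation register value.
# ASSUMPTION: CM occupies bits 6-5, encoded as listed below; check the manuals.

CM_SHIFT = 5
CM_MASK = 0b11 << CM_SHIFT

CM_WRITETHROUGH = 0b00      # cachable, writethrough
CM_COPYBACK = 0b01          # cachable, copyback
CM_INHIBIT_STRICT = 0b10    # 040: noncachable, serialized / 060: cache-inhibited, precise
CM_INHIBIT = 0b11           # 040: noncachable / 060: cache-inhibited, imprecise


def with_cache_mode(descriptor, mode):
    """Return the 32-bit descriptor with its CM field replaced by `mode`."""
    cleared = descriptor & ~CM_MASK & 0xFFFFFFFF
    return cleared | ((mode & 0b11) << CM_SHIFT)


# Hypothetical descriptor for a VRAM page, currently marked cache-inhibited.
vram_page = 0x40000063
print(hex(with_cache_mode(vram_page, CM_WRITETHROUGH)))
```

Only the two CM bits change, so the rest of the descriptor (address, write-protect, descriptor type) is left alone.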

If you mean bustest, I have used it. Uhm.. was there any point you were getting at regarding this? ;)

The bottom line anyhow is this: I need your expertise for this man!

Have a great day!


/Patrik
 

Offline patrik

Re: Idea...
« Reply #6 on: November 29, 2003, 05:21:05 PM »
No writethrough cache fudge with bustest *sniff* ;).

Uhm, anyhow... When you try the fudge out, you know someone who will be very interested in the results...:)) and I must admit that I get very curious when you say you already have a use for it but you don't say what! ;)

No, unfortunately I am not a Grex-owner. If I had a BlizzardPPC for my A1200 though, I can tell you that I wouldn't have hesitated one second to buy one of those for 54EUR from www.ggsdata.se! Today it is unsupported by the manufacturer, but drivers are still being developed for it as it is supported by the OpenPCI project. With the speed and expansion possibilities it offers at that price, that support would be more than enough for me!

And yes - I am interested to an unhealthy degree in this.. hardware with an edge just does that to me :).


/Patrik