Author Topic: 68k AGA AROS + UAE => winner!  (Read 26336 times)

Offline Hammer

  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 1996
  • Country: 00
Re: 68k AGA AROS + UAE => winner!
« on: April 08, 2004, 07:35:38 AM »
Quote
how many registers?

AMD64 (x86-64) exposes sixteen 64-bit GPRs, while the Pentium 4 (with Hyper-Threading) has an 8 + 8 GPR configuration, i.e. eight 32-bit GPRs for each of its two logical processors.

Quote
how does it do floating point maths?


About X86-64/AMD64
"Instruction set is designed to offer all advantages of CISC and RISC.
- Code Density of CISC
- Register usage and ABI models of RISC"

Reference document
Amiga 1200 PiStorm32-Emu68-RPI 4B 4GB.
Ryzen 9 7900X, DDR5-6000 64 GB, RTX 4080 16 GB PC.
 

Offline Hammer

Re: 68k AGA AROS + UAE => winner!
« Reply #1 on: April 09, 2004, 07:02:17 AM »
Quote
what about 486 and earlier CPUs?

Just eight 32-bit GPRs (386 and above).

Quote
(the speed advantage of x86 makes me believe that something about their architecture must be very good, I am curious to know what that is),

It would be more on the micro-architecture side than the ISA side. "Bang per buck" and legacy support (a.k.a. "software investment protection") are the other strengths of x86.

Note that if the PPC32 ISA were a "real RISC", decoding/'cracking' into smaller RISC instructions would not be required in the PPC 970 (a ~52 million transistor beast; IBM's own "Athlon", but applied to the PPC32 ISA).
 

Offline Hammer

Re: 68k AGA AROS + UAE => winner!
« Reply #2 on: April 18, 2004, 08:53:53 AM »
Quote
2) Your assumption that the "external CISC" style approach of x86 could be better based on code density is very difficult to judge. You have to consider that the code decomposition into the internal micro op language is far from simple to achieve. The design of these cores is fantastically complicated, gobbling up silicon like nobody's business.

One problem: the PowerPC 970 has ~52 million transistors, i.e. a transistor count similar to that of the Athlon XP (Barton core).

To estimate the transistor count of AMD's K8 core, one can do the following:
1. Athlon FX's ~90 million transistors minus Athlon 64's ~77 million transistors. This gives an estimated single-channel memory controller count, i.e. ~13 million.
2. Athlon 64's ~77 million minus ~13 million gives ~64 million transistors. So an Athlon 64 with its AGP tunneling/HyperTransport interface and 1 MB L2 cache, but without the memory controller, consumes about 64 million transistors.

As one can see, the K8 core and the K7 Barton core are in the range of the PowerPC 970's transistor count.
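The two subtraction steps above can be sketched in a few lines of Python. The figures are only the approximate counts quoted in this post, and the variable names are made up for illustration, so treat it as back-of-envelope arithmetic rather than vendor data.

```python
# Approximate transistor counts as quoted in the post (in millions).
ATHLON_FX = 90
ATHLON_64 = 77
POWERPC_970 = 52

# Step 1: the FX / Athlon 64 difference is taken as one memory channel's worth.
memory_channel = ATHLON_FX - ATHLON_64

# Step 2: subtract that estimate from the Athlon 64 total.
athlon64_without_memctrl = ATHLON_64 - memory_channel

print(memory_channel)             # 13
print(athlon64_without_memctrl)   # 64
print(athlon64_without_memctrl - POWERPC_970)  # 12 (gap to the 970, L2 cache still included)
```

Note that the ~64 million figure still includes the 1 MB L2 cache, so the bare core would be smaller again.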
 
Quote
All this takes clock cycles - which is partially why modern x86 CPU's have such very long pipelines and "time of flight" for instructions.

Note that the PowerPC 970 has a pipeline depth comparable to that of the Athlon XP (Barton core). Try comparing it with an AMD solution instead of Intel's solution.

Quote
We had several Windows NT workstations in our spectroscopy labs at Uni, one was using the latter at 266 MHz and we had a newer P-II 300 MHz, with its new spanking "internal RISC style core" and the alpha still stuffed it

Note that the Alpha sports an EV6 bus advantage. Why not try it against EV6-bus-enabled K7 Athlons?
 

Offline Hammer

Re: 68k AGA AROS + UAE => winner!
« Reply #3 on: April 19, 2004, 04:47:37 AM »
Quote
How much of the 970 transistor count is needed for legacy support? Virtually nothing.

Note that the PowerPC 970 decodes/cracks PowerPC32 ISA instructions before they are executed on the FRISC** core. Depending on the instructions, 9 stages are allocated for decoding.

**Further Reduced Instruction Set Computing (FRISC).
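As a rough illustration of what "cracking" means, here is a toy Python model in which one architected instruction expands into one or more simpler internal operations. The table below is hypothetical and chosen only for illustration (a load-with-update split into a load plus an add); it is not taken from IBM's actual decode logic.

```python
# Toy model of instruction "cracking": each architected mnemonic maps to a
# list of simpler internal operations. The table is hypothetical.
CRACK_TABLE = {
    "add": ["add"],            # simple instruction passes through unchanged
    "lwzu": ["load", "add"],   # load-with-update: load, then update the base register
}

def crack(instruction):
    """Return the internal ops for a mnemonic; unknown ones pass through as-is."""
    return CRACK_TABLE.get(instruction, [instruction])

program = ["add", "lwzu", "add"]
internal_ops = [op for ins in program for op in crack(ins)]
print(internal_ops)  # ['add', 'load', 'add', 'add']
```

The point is only that the internal operation stream can be longer than the architected instruction stream, which is the work the front-end stages have to do.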

Quote
I'm not sure what sort of bus the Alpha was using then

A few rhetorical questions;

1. What was the clock speed of that particular DEC Alpha in 1997 against the Pentium Pro?
2. What was the price differential between the two?
3. What was the difference in transistor count between DEC's Alpha and Intel's Pentium Pro?

The Alpha 21264 (EV6) was released in the late 1990s and is an example of a 3rd generation Alpha.

Note that the aggressive OOO (out-of-order) scheme of the Alpha runs counter to the pure RISC ideal. Also note that the Alpha has a six-issue-wide pipeline. There's no way Intel's Pentium Pro can match that (former) beast.

Quote
Incidentally, I'm not the one who started the PPC v Intel debate here

Well, you (and another person) raised some issues in relation to PPC and x86.
 

Offline Hammer

Re: 68k AGA AROS + UAE => winner!
« Reply #4 on: April 19, 2004, 05:54:07 AM »
Quote
The problem is, it's not a simple linear process where x86 instruction "X" is always decomposed into micro-op codes "a b c". The early stage of decode may work this way but once it has to start processing "a b c", it almost works like a mini compiler, looking to see what rename registers are/will become free, which instructions have dependencies etc. All this takes clock cycles - which is partially why modern x86 CPU's have such very long pipelines and "time of flight" for instructions.

Note that the PowerPC 970 has 16 stages for integer instructions, 17 stages for load/store instructions, 21 stages for floating-point instructions, and up to 25 stages for SIMD instructions.

"The depths of the 970’s pipelines vary greatly, depending on the instruction type. All the pipelines begin with nine fetch and decode stages. This large number of initial stages is extraordinary for a classic RISC architecture. It’s more reminiscent of today’s x86 processors".
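Taking the stage counts quoted above at face value, a short Python sketch shows how many stages remain in each pipeline once the nine shared fetch/decode stages are subtracted. The dictionary keys and the helper name are made up for illustration.

```python
# Total pipeline depths for the PowerPC 970 as quoted in this post.
TOTAL_STAGES = {
    "integer": 16,
    "load/store": 17,
    "floating-point": 21,
    "SIMD (max)": 25,
}
FRONT_END_STAGES = 9  # fetch and decode stages common to every pipeline

def stages_after_decode(total, front_end=FRONT_END_STAGES):
    """Stages left once the shared fetch/decode front end is subtracted."""
    return total - front_end

remaining = {name: stages_after_decode(n) for name, n in TOTAL_STAGES.items()}
for name, n in remaining.items():
    print(f"{name}: {n} stages after fetch/decode")
```

On these numbers the front end accounts for more than half of the integer pipeline (9 of 16 stages).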

Why not compare the PowerPC 7447A @ ~1.5 GHz with the PowerPC 970 @ ~1.6 GHz?

With AMD's K8 processors, completed instructions can exit the pipeline without going through the rest of it. It's estimated that the K8 processors have only 20-stage pipelines (depending on how AMD counts its pipeline stages, i.e. the figure may be comparable with IBM's way of counting).

Reference
1. IBM PowerPC 970
 

Offline Hammer

Re: 68k AGA AROS + UAE => winner!
« Reply #5 on: April 21, 2004, 12:44:48 AM »
Quote
Embarrassingly for the senior IT technician who ordered the changes, the newer machines ran the existing x86 version of the software not only slower than the Alpha, they ran it at a speed that was only marginally greater than running the x86 version under FX32 on the alpha.


"Figure 1 shows the relative performance on the ByteBenchmark of a 200 MHz Pentium Pro and a 500 MHz Alpha running DIGITAL FX!32. For this benchmark, the Alpha running DIGITAL FX!32 provides about the same performance as a 200 MHz Pentium Pro"(1)
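To put the quoted result in per-clock terms, the small Python sketch below divides the two clock speeds. It assumes "about the same performance" means equal benchmark scores, and clock-for-clock comparisons across different architectures are crude at best, so read the number as a rough indication only.

```python
# Clock speeds from the quoted FX!32 paper excerpt (MHz).
PENTIUM_PRO_MHZ = 200
ALPHA_MHZ = 500

# If both machines score the same, the Alpha running translated x86 code
# delivers this fraction of the Pentium Pro's work per clock cycle.
per_clock_ratio = PENTIUM_PRO_MHZ / ALPHA_MHZ

print(per_clock_ratio)  # 0.4
```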

References
1. http://www.usenix.org/publications/library/proceedings/usenix-nt97/full_papers/chernoff/chernoff.pdf
2. http://www.hotchips.org/archive/hc9/hc9pres_pdf/hc97_4b_rubin_1up.pdf
 

Offline Hammer

Re: 68k AGA AROS + UAE => winner!
« Reply #6 on: April 21, 2004, 01:27:40 AM »
@Karlos
Note that there are two versions of the Alpha at ~266 MHz:

1. Model 21066 (2-issue superpipeline, EV4 core)
2. Model 21164 (4-issue superpipeline**, on-chip ~96 KB L2 cache, EV5 core). **2 integer pipelines, 2 floating-point pipelines.

The Model 21064 was released around March 1992 and sports a 128-bit bus interface.

The first Pentium II variant to receive an on-chip L2 cache was the Celeron 300A (Mendocino).

Other related references (x86 compiler improvements)
http://www.aceshardware.com/Spades/read.php?article_id=40000191
 

Offline Hammer

Re: 68k AGA AROS + UAE => winner!
« Reply #7 on: April 21, 2004, 02:39:57 AM »
@Karlos

"FMUL and FDIV that are still 'not pipelined' in the Pentium III. "(3).

Reference.
3. http://www.heise.de/ct/english/99/16/092/

Quote
Interesting. The Alpha machine was there in 1995 and wasn't new then. I'm reasonably sure it was a 21164. When was it released?


Around 1994 for the Alpha Model 21164 (the industry's first 300 MHz processor)(4). The ~266 MHz variant could be the slightly cheaper version.

Reference
4. http://vt100.net/timeline/1994-3.html