Author Topic: A1 now booting OS4 straight from hard drive  (Read 8250 times)

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
Re: A1 now booting OS4 straight from hard drive
« on: November 15, 2003, 02:59:30 AM »
Quote

sicky wrote:

Coffee/tea and doughnuts will be available (free of charge) for the thirsty and hungry!
See you there


Whoa! Free doughnuts?

(sound of rapid footsteps, a door slamming, car door slamming...)
int p; // A
 

Offline Karlos

Re: A1 now booting OS4 straight from hard drive
« Reply #1 on: November 15, 2003, 08:15:30 PM »
On the subject of ppc emulation...

I can recall that back when I was coding some stuff for the PowerUP kernel, there actually was an interpretive PPC emulator for testing PPC code on 680x0-only systems.

It was dog slow (it was never aimed at speed), but it worked.
I can't for the life of me recall what it was called, but it demonstrates that PPC emulation is possible (if not feasible at a realistic speed).

Fast PPC emulation is a tall order. For comparison, consider 680x0 emulation. It typically works by having a function lookup table (LUT) with an entry for each opcode. Since the opcode is 16 bits wide, that's 65536 of them!
An interpreter calls functions that emulate the opcode; a JIT instead calls functions that generate the native code for the emulated instruction.
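
To make the table idea concrete, here is a minimal sketch in C of the interpreter flavour. The Cpu struct, the handler names and the choice to fill in only the register-to-register add.l case are my own simplifications for illustration, not code from any real emulator, and condition codes are ignored completely.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical CPU state; a real emulator tracks far more than this. */
typedef struct {
    uint32_t d[8];     /* data registers d0-d7 */
    int      trapped;  /* set when an illegal opcode is executed */
} Cpu;

typedef void (*OpHandler)(Cpu *cpu, uint16_t opcode);

/* One entry for every possible 16-bit opcode value. */
static OpHandler op_table[65536];

/* Every nonsense value ends up here. */
static void op_illegal(Cpu *cpu, uint16_t opcode)
{
    (void)opcode;
    cpu->trapped = 1;
}

/* add.l dM,dN (condition code updates are left out of this sketch). */
static void op_add_l_dreg(Cpu *cpu, uint16_t opcode)
{
    unsigned dst = (opcode >> 9) & 7;  /* bits 9-11: destination register */
    unsigned src = opcode & 7;         /* bits 0-2: source register */
    cpu->d[dst] += cpu->d[src];
}

void build_table(void)
{
    for (size_t i = 0; i < 65536; i++)
        op_table[i] = op_illegal;

    /* add.l dM,dN is encoded as 1101 NNN 010 000 MMM, i.e. 0xD080 plus fields. */
    for (unsigned n = 0; n < 8; n++)
        for (unsigned m = 0; m < 8; m++)
            op_table[0xD080u | (n << 9) | m] = op_add_l_dreg;
}

/* Dispatch is a single indexed call, whatever the opcode. */
void step(Cpu *cpu, uint16_t opcode)
{
    op_table[opcode](cpu, opcode);
}
```

After build_table(), calling step(&cpu, 0xD081) performs add.l d1,d0. A JIT would keep the same table shape but store code generators in it instead of emulation functions.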

Moving this approach to PPC is very hard. PPC opcodes are 32 bits wide, so a naive LUT with an entry for every opcode value isn't feasible - it would have 2^32 entries and be 16GB in size (assuming each function pointer in the table is 32 bits wide) :-o
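
The arithmetic behind those sizes, as a tiny C sketch (lut_bytes is just a name made up for this post, and it assumes opcode_bits is less than 64):

```c
#include <stdint.h>

/* Bytes needed for a table with one fixed-size entry per possible opcode value. */
uint64_t lut_bytes(unsigned opcode_bits, unsigned entry_bytes)
{
    return ((uint64_t)1 << opcode_bits) * entry_bytes;
}
```

With 4-byte pointers, lut_bytes(16, 4) is 262144 bytes (256K) for the 680x0 case, while lut_bytes(32, 4) is 17179869184 bytes (16G) for a naive PPC table.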

Masking the bits required for the opcode and basing a LUT on that is naturally the way to go, but then you have to add checks for all the special instructions that don't conform to the "opcode ra, rb, rc" format. And there are lots of those, too...

In short: fast PPC emulation - it's a tricky bugger ;-)


Offline Karlos

Re: A1 now booting OS4 straight from hard drive
« Reply #2 on: November 15, 2003, 09:00:15 PM »
Quote

bloodline wrote:

So much for "RISC" :-D



:lol: Yeah - ironic really. You think 'fewer instructions' equates to 'simpler to emulate' but in fact it's not really the case at all here.

Of course, in the hardware, individual groupings of bits in the instruction map out to different sections of the hardware as the opcode is decoded. Such mappings are usually hardwired into the decode circuitry and are effectively 'instantaneous' (maybe one stage in the pipeline).

Emulation of this level of operation decode behaviour would obviously be ludicrously complex and slow.

Quote

But, yes, emulating the PPC would be a nightmare :-)


If it were no more complex than say 680x0, it would have been done quite some time ago. There's probably more commercial demand for ppc emulation than 680x0 emulation...

-edit-

...and all this is before you get into the really hard part (specifically MMU emulation) :-o

Offline Karlos

Re: A1 now booting OS4 straight from hard drive
« Reply #3 on: November 16, 2003, 08:44:59 PM »
-edit-

Sorry, bit of an essay :lol:

-/edit-

Quote

Waccoon wrote:
Isn't true RISC code about 1.5 times bigger than CISC code in executable format?


Quite often RISC code is bigger than CISC (there are exceptions, however), but it's not really much of an issue. RISC architectures tend to have very regular, fixed-length instruction encodings (on PPC every instruction is 32 bits).
On the other hand, CISC instruction sizes can vary like anything. For example, a simple 680x0 'move' instruction can be anywhere from 1 to 11 16-bit words long on the 68020 and later, depending on the addressing modes used.

Data usually makes up a large part of any executable. To be honest, code I have compiled for both 680x0 and PPC has resulted in pretty similar sized programs, certainly not 50% larger.
Go figure...

Quote

I'm no electrical engineer, obviously, but are all 32 bits used for instructions, or instructions AND data?  How many instructions are there on a PPC?

I don't doubt emulating PPC is tough and slow, but does *any* CPU has 65K+ instructions.   :-?


Not 65K instructions, no. However it isn't about the actual number of instructions, rather it is about the total number of potential opcodes, which in turn depends on the opcode size.

Consider the 680x0 again. It has only a few dozen basic instructions at the assembler level (think of add, addx, etc).

However, at the binary level, the instruction opcode is 16 bits wide. The pattern of bits in the opcode has particular meanings. You can see this in any 680x0 Programmer's Manual.

For instance, consider the basic 680x0 integer add instruction. In assembler we can write it thus:

add.<size> <ea>, d<N>
add.<size> d<N>, <ea>

Where
<size> is the operand size (.b, .w, .l)
<ea> is the effective addressing mode
<N> is the number of the data register (0-7)

Just from the above description we can see that actually there is a lot of implicit information here. This information is encoded into the 16-bit instruction word as follows

Bits 0-2 are the register used in the <ea>
Bits 3-5 are the <ea> mode
Bits 6-8 are the operation mode (size/sense)
Bits 9-11 are the data register number N
Bits 12-15 are the opcode identifier (here 1101)
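
As a rough illustration of that layout, this C sketch pulls the five fields out of an instruction word. The struct and function names are made up for the example.

```c
#include <stdint.h>

/* Fields of a 680x0 'add' instruction word, named after the list above. */
typedef struct {
    unsigned ea_reg;   /* bits 0-2:   register used in the <ea> */
    unsigned ea_mode;  /* bits 3-5:   <ea> mode */
    unsigned opmode;   /* bits 6-8:   operation mode (size/sense) */
    unsigned dreg;     /* bits 9-11:  data register number N */
    unsigned group;    /* bits 12-15: opcode identifier, 1101 for add */
} AddFields;

AddFields decode_add(uint16_t word)
{
    AddFields f;
    f.ea_reg  = word & 7;
    f.ea_mode = (word >> 3) & 7;
    f.opmode  = (word >> 6) & 7;
    f.dreg    = (word >> 9) & 7;
    f.group   = (word >> 12) & 15;
    return f;
}
```

For the word 0xD081 (add.l d1,d0) this gives group 13 (binary 1101), data register 0, operation mode 2 (long, result in the data register), <ea> mode 0 (data register direct) and <ea> register 1.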

So in reality there are literally hundreds of possible 16-bit values that correspond to different variations of the above 'add' instruction. Depending upon the effective address mode used, several extension 16-bit words may follow the instruction word.

As I said, emulation of the 680x0 often simply has a table of all the possible 16-bit values that point to functions to handle each specific opcode case.
All of the nonsense values will point to the same function that emulates an illegal instruction trap.

This approach is used because it is the simplest, and hence the quickest, way to look up a function and call it (the lookup also takes constant time).

Any other solution that more intelligently breaks the opcode word down into parts involves more stages, increases the complexity and hence the time taken, which is very bad for any emulation.

Back to PPC...

The same breakdown of opcode words into meaningful fields found on the 680x0 also applies to PPC opcodes.

The PPC, being a RISC system, has only a small number of instructions, but like the 680x0 it encodes a lot of other data into its 32-bit instruction word.

For example, a typical 'inst rA, rB, rC' type instruction has to encode 3 registers, each of which needs 5 bits (there are 32 integer registers and 2^5 = 32).
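
A small C sketch of how those 5-bit fields come out of the word. I am assuming the usual three-register layout here, which the PowerPC manuals write as 'inst rD, rA, rB': a 6-bit primary opcode in the top bits followed by the three register numbers. The names are invented for the example.

```c
#include <stdint.h>

/* Primary opcode and register fields of a three-register PPC instruction. */
typedef struct {
    unsigned primary;  /* top 6 bits: primary opcode */
    unsigned rd;       /* destination register, 0-31 */
    unsigned ra;       /* first source register, 0-31 */
    unsigned rb;       /* second source register, 0-31 */
} PpcRegFields;

PpcRegFields decode_regs(uint32_t insn)
{
    PpcRegFields f;
    f.primary = insn >> 26;
    f.rd      = (insn >> 21) & 31;
    f.ra      = (insn >> 16) & 31;
    f.rb      = (insn >> 11) & 31;
    return f;
}
```

For 0x7C642A14, which is 'add r3, r4, r5', this yields primary opcode 31 and registers 3, 4 and 5. The remaining low bits select which operation within group 31 is meant.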

Additionally, RISC architectures such as the PPC tend to define subtle variations of instructions. For example, there are integer arithmetic instruction variants that do not update the chip's status (condition) register - if you don't need to check it, why update it? This means that when writing (or generating) optimal code, cycles can be saved by using instruction variations that eliminate redundant work.
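
One concrete case of such a variation: for the register-to-register integer arithmetic instructions, the lowest bit of the word (the 'record' bit, written as a trailing dot in assembler) says whether the condition register gets updated. A minimal sketch, with made-up function names, that is only meaningful for instruction forms which actually carry this bit:

```c
#include <stdint.h>
#include <stdbool.h>

/* True if the record bit is set, e.g. 'add.' rather than 'add'.
 * Only valid for instruction forms that define this bit. */
bool updates_condition_register(uint32_t insn)
{
    return (insn & 1u) != 0;
}

/* The same operation with the record bit cleared. */
uint32_t without_record_bit(uint32_t insn)
{
    return insn & ~(uint32_t)1;
}
```

So 0x7C642A14 ('add r3, r4, r5') and 0x7C642A15 ('add. r3, r4, r5') are two distinct opcode values for what the programmer thinks of as one instruction.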

So, although the PPC may only define a handful of instructions (similar to 680x0), the number of possible opcode values is very large indeed.

You simply cannot use the same 'one table entry for every possible opcode' strategy as for 680x0 emulation, because 2^32 is roughly 4 billion entries in the table :-o !

Now, as I hinted earlier, the trick is to isolate the part of the 32-bit opcode that defines the instruction, mask out the register fields etc. and make tables of those (now much smaller) values. There would then be several tables, each far smaller than one covering every possible opcode.

However, this is exactly the sort of thing I said we needed to avoid when talking about 680x0 emulation, because it increases complexity. Masking out the bits takes time, and there are also many instructions that don't follow the same bit patterns, which you have to check for.

You can imagine the overhead in first checking that the instruction isn't one of many special cases, then masking out the instruction identifier bits, shifting them down to get an index number, indexing the table, finally calling the function etc. etc.
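
Those stages might look something like this in C. It is only a sketch under my own assumptions: a 64-entry table indexed by the 6-bit primary opcode, with the irregular group under primary opcode 31 handled by a second table indexed by a 10-bit extended opcode. All names are invented, only addi and add are filled in, and the record and overflow side effects are left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical CPU state for the sketch. */
typedef struct {
    uint32_t gpr[32];  /* integer registers */
    int      trapped;  /* set on an unhandled opcode */
} PpcCpu;

typedef void (*PpcHandler)(PpcCpu *cpu, uint32_t insn);

static PpcHandler primary_table[64];   /* indexed by the top 6 bits */
static PpcHandler ext31_table[1024];   /* second level for primary opcode 31 */

static void ppc_illegal(PpcCpu *cpu, uint32_t insn)
{
    (void)insn;
    cpu->trapped = 1;
}

/* addi rD, rA, SIMM: rA = 0 means the literal value 0, not register 0. */
static void ppc_addi(PpcCpu *cpu, uint32_t insn)
{
    unsigned rd = (insn >> 21) & 31, ra = (insn >> 16) & 31;
    /* Sign-extend the 16-bit immediate. */
    int32_t simm = (int32_t)((insn & 0xFFFFu) ^ 0x8000u) - 0x8000;
    cpu->gpr[rd] = (ra ? cpu->gpr[ra] : 0u) + (uint32_t)simm;
}

/* add rD, rA, rB (record and overflow side effects omitted). */
static void ppc_add(PpcCpu *cpu, uint32_t insn)
{
    unsigned rd = (insn >> 21) & 31, ra = (insn >> 16) & 31, rb = (insn >> 11) & 31;
    cpu->gpr[rd] = cpu->gpr[ra] + cpu->gpr[rb];
}

/* Second stage: mask, shift and index again for the group under opcode 31. */
static void ppc_group31(PpcCpu *cpu, uint32_t insn)
{
    ext31_table[(insn >> 1) & 0x3FF](cpu, insn);
}

void ppc_build_tables(void)
{
    for (size_t i = 0; i < 64; i++)   primary_table[i] = ppc_illegal;
    for (size_t i = 0; i < 1024; i++) ext31_table[i]   = ppc_illegal;
    primary_table[14] = ppc_addi;
    primary_table[31] = ppc_group31;
    ext31_table[266]       = ppc_add;  /* add */
    ext31_table[266 | 512] = ppc_add;  /* overflow-enabled variant, same sketch handler */
}

void ppc_step(PpcCpu *cpu, uint32_t insn)
{
    primary_table[insn >> 26](cpu, insn);
}
```

Even in this toy version, an 'add' costs two shifts, a mask, two table lookups and two indirect calls before any real work happens, compared with the single lookup and call of the 680x0-style table.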

Unfortunately, the choice is either this or a 4G-entry table (basically impossible).

In short, it's a performance nightmare.