Author Topic: 68k AGA AROS + UAE => winner! (Read 26176 times)

Piru · « **on:** April 17, 2004, 09:47:53 PM »

Quote

Code: [Select]
#include <pragmas/exec_pragmas.h>
#ifdef LATTICE
#include <clib/exec_protos.h>
#endif
#include <clib/alib_protos.h>
#include <clib/exec_protos.h>
#include <clib/dos_protos.h>

Use:

Code: [Select]

#include 
#include 
#include

instead. More portable.

Quote

Code: [Select]
int ret = 20;

Use:
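int ret = RETURN_FAIL;
and later ret = RETURN_OK; or ret = RETURN_FAIL; as appropriate. RETURN_FAIL is defined as 20 in dos/dos.h, so the behaviour is unchanged; only the intent is clearer.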

Quote

Code: [Select]
TrackMP = CreatePort( 0 , 0 ) ;
if( TrackMP==0 ){ printf("CreatePort error\n"); goto cleanup ; }
TrackIO = (struct IOExtTD *)CreateExtIO( TrackMP , sizeof( struct IOExtTD ) ) ;

Use CreateMsgPort/DeleteMsgPort and CreateIORequest/DeleteIORequest. Why use amiga.lib when these routines are in exec.library?

Quote

Code: [Select]
WriteBuffer = AllocVec( SECTOR_SIZE , MEMF_CLEAR | MEMF_PUBLIC ) ;
geom = AllocVec( sizeof( struct DriveGeometry ) , MEMF_PUBLIC | MEMF_CLEAR ) ;

Allocation can fail. You MUST check for NULL result from AllocVec and act accordingly.
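
A minimal sketch of the check, written so that it compiles on any host: calloc()/free() stand in for AllocVec(..., MEMF_CLEAR)/FreeVec(), the SECTOR_SIZE value is assumed, and struct geometry_stub, alloc_buffers() and free_buffers() are hypothetical names that are not in the original program.

```c
#include <stdlib.h>

#define SECTOR_SIZE 512  /* assumed value, for illustration only */

/* Hypothetical stand-in for struct DriveGeometry; the real layout is not
 * reproduced here. */
struct geometry_stub {
    unsigned long sector_size;
    unsigned long total_sectors;
};

struct buffers {
    unsigned char *write_buffer;
    struct geometry_stub *geom;
};

/* Releases whatever was allocated. free(NULL) is a no-op, so a partially
 * filled struct needs no special handling. */
void free_buffers(struct buffers *b)
{
    free(b->geom);
    free(b->write_buffer);
    b->geom = NULL;
    b->write_buffer = NULL;
}

/* Returns 0 on success, -1 if either allocation failed. On failure nothing
 * stays allocated and both pointers are NULL. */
int alloc_buffers(struct buffers *b)
{
    b->write_buffer = calloc(1, SECTOR_SIZE);
    b->geom = calloc(1, sizeof *b->geom);
    if (b->write_buffer == NULL || b->geom == NULL) {
        free_buffers(b);
        return -1;
    }
    return 0;
}
```

The caller tests the result of alloc_buffers() once and bails out to its cleanup path on -1, instead of handing a NULL pointer to the device.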

Quote

mixing printf and PutStr / Flush

Don't mix stdio and dos.library calls. This won't work properly (output problems due to buffering). Either use only stdio functions or only dos.library functions.

In fact, there is no need to use stdio in the program, just use dos.library and get rid of all stdio stuff. The code will never be portable anyway, so why use stdio?

Quote

Code: [Select]
DoIO( (struct IORequest *)TrackIO ) ;

IO can fail. You should implement a retry (maybe 5 times), and if the retries fail, report the error.
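
A rough sketch of such a retry loop, in portable C. The callback type io_attempt_fn, do_io_with_retry() and the flaky_attempt() test double are hypothetical names; the only thing taken from exec is the convention that DoIO() returns 0 on success and a non-zero error otherwise.

```c
#include <stddef.h>

#define MAX_IO_ATTEMPTS 5  /* the "maybe 5 times" suggested above */

/* Hypothetical callback type: performs one I/O attempt and returns 0 on
 * success or a non-zero error code, the same convention DoIO() uses. */
typedef int (*io_attempt_fn)(void *request);

/* Tries the operation up to MAX_IO_ATTEMPTS times. Returns 0 on success,
 * otherwise the error code of the last failed attempt. If attempts_out is
 * non-NULL it receives the number of attempts that were made. */
int do_io_with_retry(io_attempt_fn attempt, void *request, int *attempts_out)
{
    int error = 0;
    int tries = 0;

    while (tries < MAX_IO_ATTEMPTS) {
        tries++;
        error = attempt(request);
        if (error == 0)
            break;
    }
    if (attempts_out != NULL)
        *attempts_out = tries;
    return error;
}

/* Test double for the sketch: fails until *remaining_failures reaches 0. */
int flaky_attempt(void *request)
{
    int *remaining_failures = request;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return 21;  /* arbitrary non-zero error code, for illustration only */
    }
    return 0;
}
```

In the real program the callback would wrap DoIO() on the IORequest, and a non-zero final result would be reported to the user rather than silently dropped.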

Piru · « **Reply #1 on:** April 19, 2004, 07:59:27 AM »

Quote

I think anyway that the OS produces a Ram disk full requester before it returns failure,

No. It will just crash due to your program writing to zeropage.

Quote

More portable than what?

More portable than:

Quote

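#include <pragmas/exec_pragmas.h>
#ifdef LATTICE
#include <clib/exec_protos.h>
#endif
#include <clib/alib_protos.h>
#include <clib/exec_protos.h>
#include <clib/dos_protos.h>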

Not to mention easier to remember and shorter to write.

proto/#?.h will work with all amiga and amigalike compilers, instead of just lattice and gcc. Also proto/#?.h will always use optimal features of compiler (inlines instead of stubs on gcc for example).

Quote

my code is as portable as theirs,

Don't add non-portability when it can be easily avoided.

Quote

Quote
Use CreateMsgPort/DeleteMsgPort and CreateIORequest/DeleteIORequest. Why use amiga.lib when these routines are in exec.library?

because the RKMs say so,

time is a resource, I dont have the 10 minutes
to search through the docs for some synonym which
may have subtly different usage,

CreateMsgPort isnt in the OS1 RKMs so its not
backwardly compatible,

You already crash on OS1.x anyway. V36+ routines you use, without testing for V36+:
- exec/AllocVec
- exec/FreeVec
- dos/PutStr

Quote

its a small one off allocation,

dont use low level disk progs in a low memory situation,

These are bad excuses. Crashing under low mem situations is unforgivable. Just test for NULL result.

Quote

but it runs ok in my compile,

Another VERY bad excuse.

Quote

anyway POSIX says that i/o API calls should be indivisible or something,

Sure. AmigaOS is not POSIX.

Quote

printf anyway must be implemented via PutStr,

No. And it's not. Thus the possible problem when mixing stdio and dos.library IO.

Quote

the program runs correctly, live with the fact,

It doesn't.

Quote

havent you got anything better to do with your time?

Yes. I also try to help beginner programmers, and try to help programmers to avoid basic mistakes. Nothing personal here.

Quote

lots of major programs are written much worse than this,

Which is a shame and source of lots of trouble.

Quote

again I have done it exactly as in the RKM's

RKRMs are known to take shortcuts all over. You MUST check for I/O errors. Either check result from DoIO() or WaitIO(), or use io_Error field in the iorequest.

Quote

, I have never known DoIO to fail,

...Which doesn't mean it never will. I/O can fail, usually due to a bad medium (a bad floppy in this case). It is important to tell the user if his/her data was not properly read/written.

Quote

Load an autodoc into your editor and do a search
for "bug", there are many,

so it actually pays to ignore the returns of some of the unlikely failures, I even have a feeling that DoIO()'s return may be unreliable, but as I am connected to my PC I cannot check the autodocs,

AllocVec() and DoIO() return values are 100% reliable.

Quote

its silly to disconnect h/w while its busy, dont remove the disk if you've just initiated a binary write, its asking for trouble,

Which reminds me, your code fails to Inhibit() the filesystem. Anyone accessing the disk simultaneously can get corrupt results, or even corrupt the written data.

Quote

the OS (or compiler) actually does this to some extent via a Ram full message,

No it doesn't. OS will give you NULL ptr.

Piru · « **Reply #2 on:** April 19, 2004, 09:32:44 PM »

Quote

Now c's printf can check for control-C and then it quits, however this would leave various AmigaOS resources unclosed, I could fix this "properly" but its some effort,

so instead I have put in a simple control-E to exit via occasional poll mechanism, I didnt use control-C as that may be intercepted by a C compiler, and I wasnt sure of a portable way of disconnecting control-C,

This indeed is a problem if you use stdio (output) routines. The only way to avoid this is to use compiler specific ways to disable CTRL-C check, or to drop stdio completely and only use AmigaOS libraries.

Quote

do rapid control-E polling by checking the tasks SigRecvd flags, if present I then do a Wait() which immediately clears the flag and returns

Code: [Select]

if (SetSignal(0, SIGBREAKF_CTRL_E) & SIGBREAKF_CTRL_E)
{
}

Piru · « **Reply #3 on:** April 20, 2004, 07:44:31 AM »

Quote

one other thing I once thought I would speed up a program which did huge numbers of small memory allocations by replacing c's memory allocation (calloc and free) by exec's (AllocMem with MEMF_CLEAR and FreeMem), I found to my amazement that the program was now slower,

Yes, a typical malloc() implementation is faster than direct AllocMem(). A typical malloc() implementation is buffered: it grabs a larger chunk of memory at a time and gives it back to the caller in smaller pieces, much like the memory pools of V39+.
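
To illustrate the idea only (this is not the actual SAS/C or libnix malloc), here is a small sketch of a chunked allocator. struct pool, pool_alloc(), pool_destroy() and CHUNK_SIZE are hypothetical, and the host's malloc() plays the role of the expensive system allocator so that the number of system calls can be counted.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096  /* assumed chunk size, for illustration only */

/* Every piece handed out is rounded up to a multiple of this union's size,
 * which keeps it suitably aligned for any basic type. */
union align_u { long double ld; long long ll; void *p; };
#define ALIGN sizeof(union align_u)

struct chunk {
    struct chunk *next;  /* previously filled chunk, or NULL */
    size_t used;         /* bytes consumed in this chunk, header included */
};

struct pool {
    struct chunk *head;          /* chunk currently being carved up */
    unsigned long system_allocs; /* how often the system allocator ran */
};

static size_t round_up(size_t n)
{
    return (n + ALIGN - 1) / ALIGN * ALIGN;
}

/* Hands out `size` bytes from the current chunk and only asks the system
 * for memory when that chunk is exhausted. Pieces are never freed one by
 * one in this sketch; a real malloc() also keeps free lists. */
void *pool_alloc(struct pool *p, size_t size)
{
    size_t header = round_up(sizeof(struct chunk));
    size_t need;
    void *result;

    if (size == 0 || size > CHUNK_SIZE - header)
        return NULL;  /* oversized requests are out of scope here */
    need = round_up(size);
    if (need > CHUNK_SIZE - header)
        return NULL;

    if (p->head == NULL || p->head->used + need > CHUNK_SIZE) {
        struct chunk *c = malloc(CHUNK_SIZE);
        if (c == NULL)
            return NULL;
        c->next = p->head;
        c->used = header;
        p->head = c;
        p->system_allocs++;
    }
    result = (unsigned char *)p->head + p->head->used;
    p->head->used += need;
    return result;
}

/* Returns every chunk to the system in one go. */
void pool_destroy(struct pool *p)
{
    while (p->head != NULL) {
        struct chunk *next = p->head->next;
        free(p->head);
        p->head = next;
    }
}
```

With these assumed sizes, a hundred 16-byte requests cost a single trip to the system allocator, which is where the speed difference comes from.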

Quote

so c's standard library can be very efficient, eg if you want to copy an arbitrary size of memory you need a very good reason to not use memcpy(), and it deals correctly with overlapping memory,

memcpy() does not deal with overlapping copies. memmove() and bcopy() do.

Depending on the libc implementation, CopyMem() can be several times faster, however.
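
A small example of an overlapping copy done the safe way. shift_right() is a hypothetical helper; the point is only that memmove() is defined for overlapping regions while memcpy() is not.

```c
#include <string.h>

/* Shifts the first `count` bytes of buf right by `shift` positions.
 * Source and destination overlap whenever shift < count, so memmove() is
 * required; memcpy() has undefined behaviour for overlapping regions.
 * The caller must guarantee that buf holds at least count + shift bytes. */
void shift_right(char *buf, size_t count, size_t shift)
{
    memmove(buf + shift, buf, count);
}
```

For instance, calling shift_right(buf, 6, 2) on a 16-byte buffer holding "abcdef" leaves "ababcdef" in it.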

Quote

Quote
Code: [Select]
if (SetSignal(0, SIGBREAKF_CTRL_E) & SIGBREAKF_CTRL_E)
{
}

ok, there's an obscure way to do it,

Yes, peeking task tc_SigRecvd is indeed obscure.

exec/SetSignal() or dos/CheckSignal() is the official way to do it. With CheckSignal() it would be:

Code: [Select]

if (CheckSignal(SIGBREAKF_CTRL_E) & SIGBREAKF_CTRL_E)
{
}

Piru · « **Reply #4 on:** April 20, 2004, 11:00:19 PM »

@whoosh777

Quote

Quote
No. It will just crash due to your program writing to zeropage.

this problem is easily fixed by:

1. moving the vbr away from the start of memory, as in fact appears to happen if you run "cpu fastrom", Sysinfo shows where the VBR is,

2. the OS should then write protect the first page of memory after the pointer to ExecBase has been set up in position 4,

...except that device read access can use DMA to write to memory (KS 1.x trackdisk.device always uses blitter to decode, KS 2.x+ trackdisk.device doesn't since !(TypeOfMem(0) & MEMF_CHIP), dunno about mfm.device). If DMA access is used, it is not captured by MMU. Reading to address 0 could happily overwrite whatever data is at address 0 and forward, regardless of MMU. However, if the zeropage is mapped to fastmem, the fastmem copy will not be trashed (so ExecBase ptr would remain valid). Anything in low chipmem would be trashed however (readlen > chipmem start address), including the chipmem MemHeader, leading to swift crash (next memory allocation/deallocation).

Quote

Now when my prog tries to read from disk or file to position 0 the MMU will intercept this and a "first page write violation" requester should come up, "click to remove task",

Might not happen if the read access uses DMA (see above).

Even if the app crashes, there is no way to "remove task" safely, since there is no resource tracking or separated address spaces. When task/process crashes under AmigaOS the safest thing you can do is make it Wait(0) (suspend).

Now, it might be just my twisted personality, but IMO it would make much more sense to check for allocation failure rather than worry about all this.

Quote

printf is dos.library IO:
SAS C 650 and 68k gcc *both* implement printf() via dos.library Write(),

True. dos.library VPrintf() calls VFPrintf() for Output() filehandle. VFPrintf() calls RawDoFmt() with FPutC() putchproc that will eventually use Write() to write the chars. How much is written at a time depends on the filehandle buffering mode (BUF_LINE == Write() is used when newline is reached, BUF_FULL == Write() is used when buffer fills up or BUF_NONE == Write() is used for every char). See dos/SetVBuf().

Quote

this is how I would implement PutStr() :

int PutStr( UBYTE *str )
{
int len ;

len = strlen( str ) ;
if( len==Write( Output() , str , len ) )return( 0 ) ;
else return( -1 ) ;
}

ooh that was difficult

But it also lacks all local buffering. It doesn't handle BUF_NONE and BUF_LINE properly. With such methods all buffering would be left at the lower level (filesystem and device driver), thru several APIs. The "closer" the buffering is to the caller, the faster it is. Also, when the buffering is local it has much better knowledge of the actual buffer usage, so further optimizations are possible that would not be possible at a lower level.

Quote

my prog tells me they havent used any of Write(), FPuts(), VFPrintf(),

Your snoop program probably only tells you if something calls dos.library thru vectors. It doesn't detect dos.library calling itself directly via bsr or jsr. Also it misses direct DosPacket I/O to filehandler (for example ixemul uses it).

All dos.library, SAS/C libc & GCC ixemul and GCC libnix libc use similar buffering methods, fgetc for reading data and fputc for writing.

Piru · « **Reply #5 on:** April 22, 2004, 11:06:45 AM »

@whoosh777

Quote

I havent used the blitter for a long time but what happens if you give the blitter an odd address, do you get an exception or just a horrible crash??

No exception. The lowest bit of the address is ignored, so you would get odd_address - 1.
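
As a one-line model of that rule (effective_word_address() is a hypothetical name, and this only restates the masking described above rather than anything about the hardware internals):

```c
#include <stdint.h>

/* Models the rule stated above: bit 0 of the pointer is ignored, so an odd
 * address behaves like the even address just below it. */
uint32_t effective_word_address(uint32_t address)
{
    return address & ~(uint32_t)1;
}
```

So 0x1001 is treated as 0x1000, while an even address such as 0x1000 is left alone.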

Quote

as I said I can turn all the criticisms of my code around into criticisms of the system design, both h/w and OS,

I am sure of that. However, I would still recommend just testing against NULL return as documented.

Quote

Quote

Even if the app crashes, there is no way to "remove task" safely, since there is no resource tracking or separated address spaces.

if you had just memory tracking then you could remove the tasks allocated memory + code + stack,

I'm afraid that won't work. The memory allocated by the task can still be in use by other tasks / processes. Also the program seglist could be used by interrupts, hooks or other processes. This is why you can't free the task memory or unload the seglist.

Quote

Quote

With such methods all buffering would be left at the lower level (filesystem and device driver), thru several APIs.

... Lots of stuff comparing RAM L1/L2 cache and medium cache ...

correction: closer buffering is to the h/w the faster it is,

see CPU caches for example

I'm afraid this comparison is totally unfair, as typically the medium is tens to hundreds of times slower than memory, not to mention L1/L2. It certainly makes no sense at all to locally cache memory!

Quote

Quote

Also when the buffering is local it has much better knowledge of the actual buffer usage, so further optimizations are possible that would not be at lower level.

I totally disagree, buffers should be dynamically allocated at the lowest level, preferably not by the programmer, in fact maybe even not by the filesystem but by a lower level still, though integrating the filesystem with the lowest level maybe is the best approach

I totally disagree with you, see below for an example.

Quote

Also choice of caching algorithm can make a huge difference this is part of why I dont want it done by the programmer,

if its not done by the programmer then it can be retargetted ie reimplemented

But the programmer does not need to do it, libc does it for him/her.

Quote

caches arent everything, filesystem design is equally important at determining speed, a well designed system would be really fast even if the programmer doesnt do any buffering, the subroutine call overhead of fputc( fgetc(infp) , outfp) should be quite tiny because this is such a tiny loop it should entirely be in the memory cache, (with fgetc() copying a byte from a low level buffer and fputc() to a low level buffer) note that not only will the instructions of this be entirely in the instruction caches but the file buffer arrays will also be entirely in data caches, so all round very fast,

I think AmigaOS filesystem is much more efficient than that of Windows XP, its a good filesystem, but it could be a lot better,

Well, apparently you don't know how complex the two APIs are, and how much overhead they cause. Maybe a simple example will clear it up for you.
Let's imagine a simple fputc without any buffering, except at the exec device driver level (which you say will be the most efficient):

- fputc calls Write(fh, &ch, 1); to write the char
- Write calls DoPkt with ACTION_WRITE
- DoPkt sets up a DosPacket with ACTION_WRITE and the parameters for the write
- DoPkt sends the DosPacket to the filesystem MsgPort with PutMsg and Waits for the reply
- filesystem wakes up from Wait() and fetches the DosPacket with GetMsg()
- filesystem determines the DosPacket is ACTION_WRITE and processes it
- filesystem ACTION_WRITE updates the current block of the file
- filesystem ACTION_WRITE uses DoIO CMD_WRITE (or CMD_WRITE64, or HD_SCSICMD etc) to send an IORequest to the exec device driver
- DoIO uses the device driver DEV_BEGINIO vector to send the IORequest
- The device driver DEV_BEGINIO links the IORequest to the device task for processing, and returns
- DoIO's WaitIO waits for the IO to finish
- The device task processes the IORequest, sees that it's CMD_WRITE (or whatever), and updates buffers (and perhaps does the actual IO)
- When the IO is finished the IORequest is returned (ReplyMsg)
- DoIO's WaitIO call wakes up, and DoIO returns
- filesystem checks for IO error and io_Actual to see that the write was successful
- filesystem ACTION_WRITE sets dp_Ret1 to 1 (written 1 byte) and dp_Ret2 to 0 and PutMsg()s the DosPacket back to the caller (DoPkt)
- DoPkt's Wait wakes up, and DoPkt GetMsg()s the reply DosPacket, moves dp_Ret1 to d0 and dp_Ret2 to pr_Result2 (IoErr), and returns
- Write returns with 1 byte written
- fputc returns with 1 byte written

The above sequence is for writing single byte without local buffering. It involves several task switches (scheduling) and waiting. In all it is very very time consuming and will kill the performance, regardless of caches.

Now, if you put the caching in the filesystem the sequence gets a lot shorter:

- fputc calls Write(fh, &ch, 1); to write the char
- Write calls DoPkt with ACTION_WRITE
- DoPkt sets up a DosPacket with ACTION_WRITE and the parameters for the write
- DoPkt sends the DosPacket to the filesystem MsgPort with PutMsg and Waits for the reply
- filesystem wakes up from Wait() and fetches the DosPacket with GetMsg()
- filesystem determines the DosPacket is ACTION_WRITE and processes it
- filesystem ACTION_WRITE updates the current block of the file in cache
- filesystem ACTION_WRITE sets dp_Ret1 to 1 (written 1 byte) and dp_Ret2 to 0 and PutMsg()s the DosPacket back to the caller
- DoPkt's Wait wakes up, and DoPkt GetMsg()s the reply DosPacket, moves dp_Ret1 to d0 and dp_Ret2 to pr_Result2 (IoErr), and returns
- Write returns with 1 byte written
- fputc returns with 1 byte written

It still involves several task switches (scheduling) and waiting.

Now, let's put the cache in fputc:

- fputc puts the char into a local buffer
- fputc returns with 1 byte written

No task switching is involved. Depending on the buffering mode, only filling up the buffer or a linefeed will actually cause a flush of the cache (Write).
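
The three sequences above can be modelled on any host by counting how often the expensive path would be taken. This is only a sketch: struct model_stream, the MODE_* names (which mirror BUF_NONE, BUF_LINE and BUF_FULL but are not the real dos.library definitions), BUFFER_SIZE and count_writes() are all made up for the illustration, and model_flush() merely counts where a real Write() would happen.

```c
#include <stddef.h>

#define BUFFER_SIZE 256  /* assumed buffer size, for illustration only */

/* Host-side model of the buffering modes; not the dos.library constants. */
enum buffer_mode { MODE_NONE, MODE_LINE, MODE_FULL };

struct model_stream {
    enum buffer_mode mode;
    char buffer[BUFFER_SIZE];
    size_t used;                /* bytes waiting in the local buffer */
    unsigned long write_calls;  /* simulated Write() round trips */
};

void model_init(struct model_stream *s, enum buffer_mode mode)
{
    s->mode = mode;
    s->used = 0;
    s->write_calls = 0;
}

/* Stands in for the whole Write()/DoPkt()/filesystem round trip. */
void model_flush(struct model_stream *s)
{
    if (s->used == 0)
        return;
    s->write_calls++;
    s->used = 0;
}

/* Buffered put: the expensive path is taken only when the mode demands it
 * or the buffer is full. */
void model_putc(struct model_stream *s, char c)
{
    s->buffer[s->used++] = c;
    if (s->mode == MODE_NONE
        || (s->mode == MODE_LINE && c == '\n')
        || s->used == BUFFER_SIZE)
        model_flush(s);
}

/* Writes `total` characters, ending a line after every `line_length`
 * characters (line_length must be non-zero), and returns how many
 * simulated Write() calls were needed, including the final flush. */
unsigned long count_writes(enum buffer_mode mode, size_t total,
                           size_t line_length)
{
    struct model_stream s;
    size_t i;

    model_init(&s, mode);
    for (i = 0; i < total; i++)
        model_putc(&s, (i + 1) % line_length == 0 ? '\n' : 'x');
    model_flush(&s);  /* what closing the stream would do */
    return s.write_calls;
}
```

For 1000 characters in lines of 100, this model needs 1000 simulated writes unbuffered, 10 with line buffering and 4 with full buffering. Every one of those writes stands for the complete packet round trip listed earlier.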

If you still fail to see my point, I can't really help it.

Quote

so the OS is sensibly not writing to the hardware till the cylinder or track is full

Only because you write full tracks. If you did small writes, it would rewrite the same track several times.

Piru · « **Reply #6 on:** April 27, 2004, 11:11:24 AM »

@whoosh777

SAS/C (6.x) memcpy doesn't handle overlapping copies, at least. Looks like they decided to follow the standard and not support overlapping copies with memcpy after all.

Anyway, the standard states that memcpy is not guaranteed to handle overlapping copies, so even if some implementation did, you cannot rely on this feature or you're writing non-portable code.
