@Piru,
I think you know too much about AmigaOS!
too much knowledge can be a dangerous thing,
I am thankful I don't have such in-depth knowledge,
it would cloud my judgement,
De Bono said that forgetting is an essential part of
intelligence as it acts as a filter,
Piru wrote:
@whoosh777
I haven't used the blitter for a long time, but what happens if you give the blitter an odd address? do you get an exception or just a horrible crash?
No exception. The lowest bit of the address is ignored, so you would get odd_address - 1.
ok so the blitter is basically quite a maniac!
what happens if you send it off into nonexistent RAM,
or into Fast RAM (which presumably is nonexistent from
its POV)?
the last time I looked at the blitter was on my A500,
last century,
Even if the app crashes, there is no way to "remove task" safely, since there is no resource tracking or separated address spaces.
if you had just memory tracking then you could remove the task's allocated memory + code + stack,
I'm afraid that won't work. The memory allocated by the task can still be in use by other tasks / processes. Also the program seglist could be used by interrupts, hooks or other processes. This is why you can't free the task memory or unload the seglist.
ok I see what you mean, again this is bad system design:
namely not anticipating this type of problem, and it's
probably going to be difficult to fix,
My problem is I trust the OS too much,
some OS design criticisms here:
1. shared memory for messaging was a bad decision,
if you do this it should be via some new API call
AllocMessage( struct Task *task, int size ) ;
for sending messages to "task", that way the
message would belong to the recipient, managed by the OS,
so you could remove it on removal of the recipient,
(prototypes sketched after point 2)
the design error was about ownership, AmigaOS made
the sender the owner, when in fact it should have been
the recipient: if I send you a letter, you own that letter,
not me, if I owned that letter the world would
get really complicated and probably grind to a halt
and give up,
this approach would be better because it would cope
with a task on another machine, ie when you send the
message the OS could copy it if necessary,
2. Code for other tasks to execute, which I think is
what you are referring to:
this could have been done by eg
SetupPublicCode( "filename" ) ;
where the OS would load the public code from a file,
and exactly the same thing for my snoop program:
SetVectorOffset( "filename" ) ;
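to pin down what I mean, the two calls might look like this,
(these are my hypothetical prototypes, none of this exists in AmigaOS):

    /* 1. the OS allocates the message on behalf of the *recipient*,
       so the recipient owns it and the OS can free it when the
       recipient is removed, copying it first if the recipient is
       on another machine */
    struct Message *AllocMessage( struct Task *recipient , int size ) ;

    /* 2. public code is never donated from a user prog's own memory,
       the OS itself loads it from a file */
    void SetupPublicCode( const char *filename ) ;
    void SetVectorOffset( const char *filename ) ;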
The Law of Whoosh:
User progs should be forbidden via API from choosing
public bytes,
(ie user progs cannot delineate public memory areas
to the OS)
This includes: user progs shall *not* set up
struct Message's, OS data structures shall never
be allocated by user code,
and user progs shall not set up
code to be executed by other tasks in any shape
or form, neither interrupts nor jump vectors,
really, setting up public interrupts is an OS-level
thing so should be disallowed for user programs,
setting up private interrupts is fine,
there need to be 2 types of programs: user programs
and system programs, system programs may set up
public interrupts,
perhaps via a file attribute flag: (like rwed),
or via a "priority" system: each program would have
a priority, only the user via a password may change
program "priority" via Workbench information,
I don't want to enter the minefield of computer security
here, but I know from my understanding of computers that
a completely correct system can be done which will
stand the test of time, ie it will be logically foolproof,
what the OS can do then is, when it runs a program,
it notes whether it's a system program,
if a non-system program makes an API call to set
public bytes the task should be suspended and removed,
public data structures and public code must be entirely
set up by the OS,
You tend to look at things in great depth from
the AmigaOS POV,
however I tend to look at things from the POV of an
abstract generalised OS, the way I was taught to by
well-qualified theorists,
It's a very safe way to look at things,
Also when the buffering is local it has much better knowledge of the actual buffer usage, so further optimizations are possible that would not be possible at a lower level.
I totally disagree, buffers should be dynamically allocated at the lowest level, preferably not by the programmer, in fact maybe not even by the filesystem but by a lower level still, though integrating the filesystem with the lowest level is maybe the best approach,
I totally disagree with you, see below for an example.
all your example proves is how terribly designed
dos.library is,
I didn't realise it was that useless,
so yes your point is correct *relative* to AmigaOS,
I was talking relative to a "sane" system,
Also, the choice of caching algorithm can make a huge difference, this is part of why I don't want it done by the programmer,
if it's not done by the programmer then it can be retargetted, ie reimplemented,
But the programmer does not need to do it, libc does it for him/her.
given that dos.library is so useless then yes
libc needs to shield out dos.library,
and it may become more efficient but never really
efficient,
I am learning a lot of things from you,
but things I would prefer not to know,
caches aren't everything, filesystem design is equally important in determining speed, a well designed system would be really fast even if the programmer doesn't do any buffering,
read my lips!
(see later),
the subroutine call overhead of fputc( fgetc(infp) , outfp ) should be quite tiny, because this is such a tiny loop it should be entirely in the memory cache, (with fgetc() copying a byte from a low-level buffer and fputc() copying a byte to a low-level buffer) note that not only will the instructions of this be entirely in the instruction caches, but the file buffer arrays will also be entirely in data caches, so all round very fast,
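for concreteness, the loop I mean is just this, standard stdio
with the EOF check written out:

    #include <stdio.h>

    int main( int argc , char **argv )
    {
        FILE *infp , *outfp ;
        int c ;
        if( argc < 3 ) return 1 ;
        infp = fopen( argv[1] , "rb" ) ;
        outfp = fopen( argv[2] , "wb" ) ;
        if( infp == NULL || outfp == NULL ) return 1 ;
        /* the whole loop sits in the instruction cache,
           fgetc and fputc mostly just move a byte to or from
           a buffer that sits in the data cache */
        while( ( c = fgetc( infp ) ) != EOF )
            fputc( c , outfp ) ;
        fclose( outfp ) ;
        fclose( infp ) ;
        return 0 ;
    }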
I think the AmigaOS filesystem is much more efficient than that of Windows XP, it's a good filesystem, but it could be a lot better,
Well, apparently you don't know how complex the two APIs are, and how much overhead is caused by it.
correct: and I'm quite shocked by your example,
correction: you need to replace the words "how complex" by
"what an incomprehensible mess",
did someone sabotage dos.library?
I'm beginning to wonder,
Maybe a simple example will clear it for you.
it did, unfortunately,
Let's imagine a simple fputc without any buffering, except at the exec device driver level (which you say will be the most efficient):
- fputc calls Write(fh, &ch, 1); to write the char
- Write calls DoPkt with ACTION_WRITE
- DoPkt sets up a DosPacket with ACTION_WRITE and parameters for the write
- DoPkt PutMsg()s the DosPacket to the filesystem MsgPort and Wait()s for the reply
- the filesystem wakes up from Wait() and GetMsg()s the DosPacket
- the filesystem determines the DosPacket is ACTION_WRITE and processes it
- the filesystem ACTION_WRITE updates the current block of the file
- the filesystem ACTION_WRITE uses DoIO CMD_WRITE (or CMD_WRITE64, or HD_SCSICMD etc.) to send an IORequest to the exec device driver
- DoIO uses the device driver DEV_BEGINIO vector to send the IORequest
- the device driver DEV_BEGINIO links the IORequest to the device task for processing, and returns
- DoIO's WaitIO waits for the IO to finish
- the device task processes the IORequest, sees that it's CMD_WRITE (or whatever), and updates buffers (and perhaps does the actual IO)
- when the IO is finished the IORequest is returned (ReplyMsg)
- DoIO's WaitIO call wakes up, and DoIO returns
- the filesystem checks for an IO error and io_Actual to see the write was successful
- the filesystem ACTION_WRITE sets dp_Ret1 to 1 (1 byte written) and dp_Ret2 to 0 and PutMsg()s the DosPacket back to the caller (DoPkt)
- DoPkt's Wait wakes up, DoPkt GetMsg()s the reply DosPacket, moves dp_Ret1 to d0 and dp_Ret2 to pr_Result2 (IoErr), and returns
- Write returns with 1 byte written
- fputc returns with 1 byte written
I'm totally horrified,
The above sequence is for writing a single byte without local buffering. It involves several task switches (scheduling) and waiting. In all it is very, very time consuming and will kill the performance, regardless of caches.
one of the nice things about a lot of AmigaOS is that it
doesn't bother with OS tasks, OS API calls are
executed by the calling task,
not so with dos.library,
IMO the OS being executed by the task is one of the
redeeming features of AmigaOS, your example proves
how useless the concept of separate OS tasks is,
I now understand why AmigaOS is often so responsive,
and why the filesystem can be so unresponsive,
Access to a particular partition should be controlled
via eg semaphores, so only 1 exclusive task should
write to a partition, but many may read a partition,
before a task may write to a partition, it should be
queued up till all current readers have finished and gone,
all this partition access happening via OS library code,
ie not via user code,
Indisputable Fact: only 1 task may write to a partition at a time, many may read simultaneously, (**)
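as a data structure, (**) might look like this,
(hypothetical, just to pin the rule down):

    /* hypothetical per-partition lock implementing (**) */
    struct PartitionLock
    {
        long readers ;  /* how many tasks are currently reading */
        long writer ;   /* 1 while the single exclusive writer is in */
    } ;

    /* a reader may enter when writer == 0, then readers++,
       a writer must queue until readers == 0 and writer == 0,
       then set writer = 1,
       the tests and updates done under a short semaphore
       like the one further down */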
dos.library appears (maybe you'll correct me as you
have memorised AmigaOS) to achieve (**) by
having filesystem tasks,
Now maybe because disks are heavy-duty things
you should also make read access exclusive,
otherwise the disk head will keep flitting between
2 positions and take forever, I have to think about
this further,
to keep the design simple it may be best to make
all access to a given physical drive exclusive,
regardless of filesystem, (1)
note how Semaphores achieve exactly the same
goal as filesystem-tasks,
I have to tell you that filesystem-tasks sound
very Unixish,
IMO (1) is the best approach via Semaphores,
probably I wouldn't use exec's Semaphores but
implement my own, specifically for the
filesystem(s): general-purpose semaphores
are going to be much slower than special-purpose
ones,
via the above explanation I can use
exclusion-only Semaphores, which can be done
very efficiently via eg #?.library code:
retry:
 TAS (SEMAPHORE_OFFSET,A2) ; fp in non-scratch a2
 BEQ.S continue ; byte was 0: semaphore was free, we now own it
; queue up our task and surrender the cpu,
; time overhead here, atypical situation: rare to queue up,
 BRA.S retry ; on wakeup, test the semaphore again
continue: ; we own the semaphore,
; do the i/o eg fputc,
so the semaphore amounts to 2 asm instructions
when there's no queue, using exec would be too
much overhead for fputc,
I think BSET can also be used instead of TAS
(BSET also tests the bit before setting it,
though TAS is the 68000's only instruction with an
indivisible read-modify-write bus cycle),
not had time to see which is better,
semaphores require indivisible read-modify-write
asm instructions,
BTW I think you raise some very interesting points,
so maybe your highly specific knowledge has some value!
Now, if you put the caching into the filesystem the sequence gets a lot shorter:
- fputc calls Write(fh, &ch, 1); to write the char
- Write calls DoPkt with ACTION_WRITE
- DoPkt sets up a DosPacket with ACTION_WRITE and parameters for the write
- DoPkt PutMsg()s the DosPacket to the filesystem MsgPort and Wait()s for the reply
- the filesystem wakes up from Wait() and GetMsg()s the DosPacket
- the filesystem determines the DosPacket is ACTION_WRITE and processes it
- the filesystem ACTION_WRITE updates the current block of the file in cache
- the filesystem ACTION_WRITE sets dp_Ret1 to 1 (1 byte written) and dp_Ret2 to 0 and PutMsg()s the DosPacket back to the caller
- DoPkt's Wait wakes up, DoPkt GetMsg()s the reply DosPacket, moves dp_Ret1 to d0 and dp_Ret2 to pr_Result2 (IoErr), and returns
- Write returns with 1 byte written
- fputc returns with 1 byte written
It still involves several task switches (scheduling) and waiting.
basically the design flaw is filesystem tasks,
the OS is just data structures + code to manage
those data structures, I think it's fine if said code is
executed by user tasks, the code itself would be
library code, why have the silly extra task switch +
messaging overhead of having a filesystem task execute
exactly the same code?
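roughly like this, (a sketch, every name here is hypothetical):

    typedef long LONG ;
    struct MyDrive ;    /* per physical drive, owns the semaphore */
    struct MyFileHandle { struct MyDrive *drive ; } ;
    extern void ObtainDriveLock( struct MyDrive * ) ;  /* the 2-instruction TAS path above */
    extern void ReleaseDriveLock( struct MyDrive * ) ;
    extern LONG FsActionWrite( struct MyFileHandle * , const void * , LONG ) ;

    /* the filesystem as library code run on the caller's context,
       serialized by a per-drive semaphore */
    LONG LibWrite( struct MyFileHandle *fh , const void *buf , LONG len )
    {
        LONG written ;
        ObtainDriveLock( fh->drive ) ;              /* no PutMsg/Wait/GetMsg */
        written = FsActionWrite( fh , buf , len ) ; /* the same code a filesystem task would run */
        ReleaseDriveLock( fh->drive ) ;             /* no task switch anywhere */
        return written ;
    }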
Now, let's put the cache into fputc:
- fputc puts the char into the local buffer
- fputc returns with 1 byte written
No task switching is involved. Depending on the buffering mode, only filling up the buffer or a linefeed will actually cause a flush of the cache (Write).
If you still fail to see my point, I can't really help it.
Not only do I see your point, I also see the cause
of your point and beyond that cause to the fix of the
cause of your point,
your last code fragment underlines my comment that
filesystems should be executed in the calling
task's context,
note also my comments that there should be
exclusive read + write access at the physical drive level,
it's fine to have 5 tasks in parallel accessing 5
different physical drives,
the idea may need some refinements eg you may need
to have exclusive access to all drives of
a given SCSI interface socket,
it's all about whether (parallel action) is faster than
(serial action),
only verifiable by experiment, = empiricism,
I know that parallel access to a given drive
is really inefficient,
there are 2 totally different issues going on here:
1. data structure complexity: 5 tasks in parallel
modifying the same data structures could be too complex,
so you use Semaphores to reduce this to 1 task,
don't use Forbid() Permit() as it's a local exclusion
problem, like a local anaesthetic: just
anaesthetize the physical drive, not the entire computer,
2. data transfer speed: if 2 tasks in parallel
access the same drive, with 3cm between the disk head
positions, an unacceptable amount of time will be
spent by the head moving between the 2 positions,
Note that the speed issue is probably less relevant
for RAM:, though caching issues will affect speed,
so it may be best not to flit around RAM: too much,
IMO you only need 2 levels of caches:
a cache for indivisible physical disk reads,
call these cylinders (or maybe tracks),
and a cache for files, the file write cache should probably be
the size of the largest available free run of
indivisible disk reads,
the file read cache will be quite different from
the write cache, there may even be a collective file
read cache,
fgetc and fputc would deal almost directly with
this file cache, which is kind of what I was
thinking about in my original comment,
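as data structures the 2 levels might be,
(names hypothetical, just to pin the idea down):

    /* level 1: cache of indivisible physical reads */
    struct TrackCache
    {
        long cylinder ;         /* which cylinder (or track) this holds */
        int  modified ;         /* needs writing back? */
        unsigned char *data ;   /* one whole indivisible read */
    } ;

    /* level 2: per open file */
    struct FileCache
    {
        long filepos ;          /* file position the buffer starts at */
        long size ;             /* sized to the largest free run of disk blocks */
        unsigned char *data ;
    } ;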
[removed a comment which I realised was false]
Semaphoring adds an overhead of 2 asm instructions
in the typical case (just one user prog, no queues),
once the file write cache is full you read the foreseen
cylinder, write out the cache to the buffer,
write the modified buffer to the drive,
the new largest available size will now be smaller
(unless eg file deletions have occurred) so the
same buffer could be retained for this file,
If the program flushes the buffer early eg
by closing the file then a different cylinder
may be chosen for the write eg the nearest
cylinder with enough space to reduce disk head
movement,
also for file reads the file cache may only
need to be partially filled, just fill in
the blocks in the current disk read,
no need to fill the whole thing,
possibly achieved via collective read cache,
so if the prog looks at position 100000 of the
file, then say this is in block 200 of the file
and block 200 is in cylinder 10 of the disk,
then cylinder 10 = one indivisible disk read (maybe a track),
which is read in to fill an indivisible-disk-read cache,
call it (A),
(read caches are quite different from write caches),
I won't copy other blocks of this file until and unless
requested by the program,
If the program now moves to a totally different
file position in a totally different cylinder,
the cache (A) would still be kept, eg I may keep
allocating such caches according to some policy,
now if the prog references a different subblock of
(A), I copy that over into the file cache,
no disk read required,
I don't just have caches, I have caches of caches,
when the collective file read cache is full I will overwrite particular subblocks with new read blocks,
according to a well-thought-through policy,
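in code, the read path I have just described would be roughly,
(a sketch, every helper here is hypothetical, using the structs above):

    #define BLOCKSIZE 512           /* hypothetical filesystem block size */
    struct TrackCache ;             /* from the sketch above */
    typedef struct MYFILE MYFILE ;  /* hypothetical: holds a struct FileCache */
    extern int  InFileCache( MYFILE * , long ) ;
    extern long BlockToCylinder( MYFILE * , long ) ;
    extern struct TrackCache *FindTrackCache( long ) ;
    extern struct TrackCache *ReadCylinder( long ) ;
    extern void CopySubBlock( MYFILE * , struct TrackCache * , long ) ;
    extern int  FileCacheByte( MYFILE * , long ) ;

    int my_read_byte( MYFILE *fp , long pos )
    {
        long block = pos / BLOCKSIZE ;  /* eg pos 100000 -> block 200 */
        if( !InFileCache( fp , block ) )
        {
            long cyl = BlockToCylinder( fp , block ) ; /* eg block 200 -> cylinder 10 */
            struct TrackCache *tc = FindTrackCache( cyl ) ;
            if( tc == NULL )
                tc = ReadCylinder( cyl ) ;    /* the one real disk read, cache (A) */
            CopySubBlock( fp , tc , block ) ; /* just the referenced subblock,
                                                 no disk read if (A) is already held */
        }
        return FileCacheByte( fp , pos ) ;    /* pure cache hit from here */
    }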
anyway I am illustrating here that caching is quite
complex and must not be done by the user program,
user program caches just add an extra array copy
overhead, you are just replacing a function call overhead
with an array copy overhead + index management overhead:
    fputc( c , fp ) ;

vs

    mybuffer[ i++ ] = c ;   /* index management */
    /* unnecessary memory access and index update
       + address calculation */
    if( i == full )         /* index caution */
    {
        fput_array( mybuffer , full , fp ) ;
        i = 0 ;             /* index reset */
    }
fput_array() has to do memory-to-memory copying,
via indexing => duplication of indexing effort,
the if( i == full ) effort is also duplicated,
my method: c only gets written to the file cache once,
your method: c is written to memory twice and
read from memory once, 3x as many memory accesses,
your method: indexing happens twice,
my method: indexing happens once,
your method: checking for index full happens twice,
my method: it happens once,
in assembler my method could be merely:

    move.l  c,d0
    jsr     fputc

if fp is in a non-scratch register argument,
(an argument in favour of non-scratch argument registers)
plus an extra 2 asm instructions of overhead for
the semaphoring in the OS library function fputc,
1 semaphore per physical drive, the pointer to the
semaphore being a field in fp,
note that in your method the user prog's buffering
is quite a number of instructions which are
themselves memory accesses,
those instruction fetches are probably cached, but
everything here is probably cached:
instructions, buffers, stack, all in CPU caches,
so the OS is sensibly not writing to the hardware till the cylinder or track is full
Only because you write full tracks. If you did small writes, it would rewrite the same track several times.
for writes my approach would use several track caches,
ie caching of the caches:
eg on a current Intel machine I may buffer the entire
floppy disk,
if it's a 120 Gig disk with 200 MB of free RAM I may
dynamically go up to 100 MB of caches,
for the floppy disk write example on Intel, referenced sectors would be read in,
writing would happen at CloseDevice() time
and only for modified tracks,
so the total minimum of track writes would happen,
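as a sketch, (hypothetical; an Amiga DD floppy is 80 cylinders
x 2 heads = 160 tracks of 11 sectors x 512 bytes):

    #define NUMTRACKS 160
    #define TRACKSIZE (11 * 512)

    static unsigned char diskcache[ NUMTRACKS ][ TRACKSIZE ] ; /* whole-disk cache, 880K */
    static int modified[ NUMTRACKS ] ;  /* which tracks need writing back */

    extern void WriteTrack( int track , unsigned char *data ) ; /* hypothetical device call */

    /* writes only touch the cache and mark the track */
    void CacheWrite( int track , int offset , unsigned char c )
    {
        diskcache[ track ][ offset ] = c ;
        modified[ track ] = 1 ;
    }

    /* at CloseDevice() time write out only the modified tracks,
       so each dirty track is written exactly once */
    void FlushCache( void )
    {
        int t ;
        for( t = 0 ; t < NUMTRACKS ; t++ )
            if( modified[ t ] )
            {
                WriteTrack( t , diskcache[ t ] ) ;
                modified[ t ] = 0 ;
            }
    }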
There may be some glitches in what I've said
(it's late at night)
but the underlying idea is rock solid,