Amiga.org
Amiga computer related discussion => Amiga Software Issues and Discussion => Topic started by: SpeedGeek on December 28, 2014, 04:54:04 PM
-
Here is link to this thread:
http://eab.abime.net/showthread.php?p=993920
-
Here is link to this thread:
http://eab.abime.net/showthread.php?p=993920
Two notes on this: First, the smallest patch on CopyMemQuick is zero bytes, no patch. Second, MOVE16 is *not* a safe instruction. It initiates a burst. Unfortunately, Zorro knows nothing about bursts. Thus, depending on the CPU card you have, multiple things may happen. If you MOVE16 from on-board RAM to on-board RAM, you're fine. If you copy into RAM on a Zorro board, the result could either fail, or could be slow. For my machine, it is only slow - the glue logic on the CPU board detects the attempt of the CPU to burst, aborts it, and runs regular cycles instead. Net result: MOVE16 is slower than four moves. General idea: Don't try to optimize unless you're sure what you're doing. MOVE16 is, in general, not a good idea. There are cases where it is, if you can be perfectly sure that there's no Zorro involved and you are moving from CPU-RAM to CPU-RAM, but the entire detection logic for this may take longer than performing the actual move.
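To put the "four moves" remark in code, here is a minimal, hypothetical sketch (labels invented for illustration, a0 = source, a1 = destination, both assumed 16-byte aligned) of the two ways to move one 16-byte line - the second form is the only kind of cycle plain Zorro hardware understands:
CopyLine16:
move16 (a0)+,(a1)+ ; one 16-byte line as a burst read plus a burst write
rts
CopyLine4:
move.l (a0)+,(a1)+ ; the same 16 bytes as four ordinary longword cycles,
move.l (a0)+,(a1)+ ; which is all a Zorro slot can actually service
move.l (a0)+,(a1)+
move.l (a0)+,(a1)+
rts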
-
Zero bytes? You know how to make a CMQ patch this small? Goodie, I can't wait to try it out.
The case when 4 Long moves is faster than Move16 is when the Copyback cache is enabled and the 4 Long moves obtain best case performance, but in the worst case Move16 is much faster. The average case performance probably occurs at 50% of the size of the 040's data cache... and that's why I have a copy block size limit of >= 2048 bytes before any Move16 is enabled!
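To illustrate the kind of size gate meant here, a rough sketch only (labels invented, a0 = source, a1 = destination, d0 = byte count, the threshold taken from the text above):
copy_dispatch:
cmp.l #2048,d0 ; below the limit Move16 is not worth the setup
bcs.s movel_copy ; small/medium copy: stay with a plain MOVE.L loop
bra.s move16_copy ; large copy: switch to the 16-byte aligned Move16 loop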
-
Breaking this down into layman's terms, would you say this version is faster than, not as fast, or equal to this version:
http://aminet.net/package/util/boot/CopyMem
Since it seems like they both rely on Move16?
(bonus points for the author mentioning @Thomas_Richter in the description of this program, haha) ;)
-
Breaking this down into layman's terms, would you say this version is faster than, not as fast, or equal to this version:
You cannot make promises, in general, whether MOVE16 is slower, or faster, or even works at all - this is really the major problem. As said, MOVE16 initiates a burst, even if the memory region is marked as "non-cachable", which may or may not work, depending on the hardware, the bus in between and other factors. A burst access over Zorro-II is simply not described in the Zorro documents, so whether that works or not is really up to your hardware.
One way or another, it is a corner case, and if it works for you, good for you. It certainly does not work for me, and you should be careful installing such patches in either case. They *may* or *may not* provide a speed benefit, or may even break the system.
-
Zero bytes? You know how to make a CMQ patch this small? Goodie, I can't wait to try it out.
It's called "Use the Os provided function".
The case when 4 Long moves is faster than Move16 is when the Copyback cache is enabled and the 4 Long moves obtain best case performance, ...
No, it's not. Again, whether bursting works over Zorro or not is a matter of luck. For my A2000, MOVE16 is *slow* when I move into the graphics card RAM of the GVP Spectrum. This is non-cacheable (!), imprecise, non-serial. Thus, the CPU may reorder accesses, does not need to expect bus errors, and may not cache them, but yet, surprisingly, MOVE16 is slower than four moves. I already said why that is: Bursts over Zorro are no-no's, and the hardware may have to run in circles to get the data over the bus. We tested all this back then for P96, as it was suggested that MOVE16 may improve some blitter emulation cases. It does not. Worse, it may break things. Simply don't try that, it's a bad idea.
Other than that, have you made measurements of the speed benefit this program has? I mean, in a realistic use case? If so, I would be interested to learn about your results. Which programs did you run, what did the program do, and how did you measure?
-
One way or another, it is a corner case, and if it works for you, good for you. It certainly does not work for me, and you should be careful installing such patches in either case. They *may* or *may not* provide a speed benefit, or may even break the system.
Understood, no warranties. ;) Reason I asked is because I've been running the other version (the one in the Aminet link) on my '040 A2000 for over a year. No issues. I always like having "the latest and greatest", so I was wondering if SpeedGeek thinks his new version is an improvement on this version that's already available, since it seems like they do much the same thing (and the source code for the Aminet version is included in the archive, so it should be pretty easy to compare the two).
But, maybe I should just test it and see. :roflmao:
-
If you MOVE16 from on-board RAM to on-board RAM, you're fine.
It depends. The 040 errata mentions problems with MOVE16 and I don't think anyone ever implemented any of the workarounds.
It's like reducing the weight of your car to make it go quicker, by removing the brakes and airbags.
I'm not convinced that you ever see a real world improvement with these patches. I don't remember my Amiga copying memory constantly, the entire OS design is based around never copying. A lot of software has its own memcpy() and doesn't use exec anyway because the overhead of calling into exec when you're copying small amounts of data is not worth it.
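For what it's worth, a rough sketch of what "calling into exec" amounts to for a tiny copy (src/dst are placeholder labels; the register convention is the documented CopyMem(source/A0, dest/A1, size/D0)):
move.l 4.w,a6 ; fetch ExecBase
lea src,a0 ; source buffer
lea dst,a1 ; destination buffer
moveq #16,d0 ; only 16 bytes to copy
jsr _LVOCopyMem(a6) ; call through the library vector (offset from the exec includes)
For a handful of bytes, that setup plus the JSR/RTS through the vector already costs more than an inline move loop, which is why so much software carries its own copy routine for the small cases.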
-
It's called "Use the Os provided function".
So you really don't have a patch that small. If the "OS provided function" was fast enough then why would anyone bother to make a patch in the first place?
No, it's not. Again, whether bursting works over Zorro or not is a matter of luck. For my A2000, MOVE16 is *slow* when I move into the graphics card RAM of the GVP Spectrum. This is non-cacheable (!), imprecise, non-serial. Thus, the CPU may reorder accesses, does not need to expect bus errors, and may not cache them, but yet, surprisingly, MOVE16 is slower than four moves. I already said why that is: Bursts over Zorro are no-no's, and the hardware may have to run in circles to get the data over the bus. We tested all this back then for P96, as it was suggested that MOVE16 may improve some blitter emulation cases. It does not. Worse, it may break things. Simply don't try that, it's a bad idea.
Other than that, have you made measurements of the speed benefit this program has? I mean, in a realistic use case? If so, I would be interested to learn about your results. Which programs did you run, what did the program do, and how did you measure?
The Zorro2 bus does NOT support Burst, and so again as with Chip RAM, Burst is a non-issue. Move16 does NOT need Burst to obtain a performance benefit. While it's certainly true that Burst capable memory can improve Move16 performance, it's true to the same extent that Burst would improve the performance of MoveL and any other instruction.
Move16 gets its main performance benefit because it interacts differently with the data cache than MoveL. This means Move16 is not affected by the worst case performance problem when the Copyback cache is enabled. This also means it can't benefit from the best case performance as MoveL can.
I have already posted a Testit result indicating a 44% speed increase with Move16 on EAB. As I said previously it's the SIZE of the copy which determines whether or not Move16 offers any performance benefit.
Move16 should not cause any problems with the MMU reordering a write to any Zorro2 or Chip RAM since the write cycle will be completed as 4 separate longword writes. But if you want to play it safe you can always fix the MMU config.
What's really surprising here is how people can continue to read the 040 and 060 documentation and ignore the very obvious:
5.4.6 Transfer Burst Inhibit (TBI)
This input signal indicates to the processor that the accessed device cannot support burst mode accesses and that the requested line transfer should be divided into individual longword transfers. Asserting TBI with TA terminates the first data transfer of a line access, which causes the processor to terminate the burst and access the remaining data for the line as three successive long-word transfers. During alternate bus master accesses, the M68040 samples the TBI to detect completion of each bus transfer.
-
So you really don't have a patch that small. If the "OS provided function" was fast enough then why would anyone bother to make a patch in the first place?
Some people will spend time doubling the speed of a routine that takes 100ms and is only ever run once.
Do you have any benchmarks of real software before and after installing the patch?
MOVE16 doesn't appear to be safe on an MMU-less 040 as you can't use the workaround in the errata, although it's arguable that an MMU-less 040 is safe in an Amiga at all (yet they seem to exist).
The TBI line isn't a solution, it completes the burst and then throws away the extra results. If you write and the data isn't in the cache it will try to burst read the cache line and throw that away too.
http://amigadev.elowar.com/read/ADCD_2.1/AmigaMail_Vol2_guide/node0161.html
-
Some people will spend time doubling the speed of a routine that takes 100ms and is only ever run once.
IMHO I love that people like SpeedGeek and Cosmos are taking on these "micro optimizations" of old Amiga code. I know other people's mileage may vary, but I'm running a ton of their patches on my A2000, and sitting right next to a 2000MHz PC running the latest version of Lubuntu, my 33MHz Amiga still feels like it flies. :)
-
I know other people's mileage may vary, but I'm running a ton of their patches on my A2000, and sitting right next to a 2000MHz PC running the latest version of Lubuntu, my 33MHz Amiga still feels like it flies. :)
I think that might be a perception bias. I have a 2.5GHz Windows 8.1 laptop and if Commodore had anything that felt this quick they wouldn't have gone bankrupt. The boot-up speed is probably the only thing the Amiga wins on, but my C128 boots up even faster.
-
I'm not convinced that you ever see a real world improvement with these patches. I don't remember my Amiga copying memory constantly, the entire OS design is based around never copying. A lot of software has its own memcpy() and doesn't use exec anyway because the overhead of calling into exec when you're copying small amounts of data is not worth it.
Many RTG-based games use CopyMem() because they manage directly with ARGB/LUT buffers and copy data around. Those could be good candidates for benchmarking CopyMem() patches in real life.
But of course... if CPU is too slow it is too slow and no patch can help it.
-
I think that might be a perception bias. I have a 2.5GHz Windows 8.1 laptop and if Commodore had anything that felt this quick they wouldn't have gone bankrupt. The boot-up speed is probably the only thing the Amiga wins on, but my C128 boots up even faster.
Okay, there "might be" some perception bias involved, but let's see... 2000 / 33 = 60. It certainly doesn't feel 60x faster than my Amiga! :laughing:
...and it's for darn sure not nearly as fun, either. :lol:
-
Okay, there "might be" some perception bias involved, but let's see... 2000 / 33 = 60. It certainly doesn't feel 60x faster than my Amiga! :laughing:
If you do similar things on both then how much faster does it feel?
RAM speed hasn't kept up with CPU speed, so you can't expect it to be 60 times quicker anyway.
-
The case when 4 Long moves is faster than Move16 is when the Copyback cache is enabled and the 4 Long moves obtain best case performance, but in the worst case Move16 is much faster. The average case performance probably occurs at 50% of the size of the 040's data cache... and that's why I have a copy block size limit of >= 2048 bytes before any Move16 is enabled!
The best copy block size limit to use before switching to a MOVE16 unrolled copy loop is different for the 68040 and 68060. The 68060 doesn't need an unrolled MOVE.L loop to give maximum copy performance, and the smaller loop preserves ICache (faster). Of course the AmigaOS uses a MOVEM.L loop instead of an unrolled MOVE.L loop, which is poor for every 68k CPU except for large copies on the 68000/68010 and 68020/68030.
Breaking this down into layman's terms, would you say this version is faster than, not as fast, or equal to this version:
http://aminet.net/package/util/boot/CopyMem
Since it seems like they both rely on Move16?
There is a version of CopyMem which does not use MOVE16. ZorroII and custom chip address space can be checked and avoided with little overhead also. I have not heard from anyone experiencing instability on any Amiga from using MOVE16 though.
Many RTG-based games use CopyMem() because they manage directly with ARGB/LUT buffers and copy data around. Those could be good candidates for benchmarking CopyMem() patches in real life.
This is true. With exec.library CopyMem() patched, the overhead of using the library function is less than most compiler memory copy functions (for example the SAS/C copy routine). Some programmers like NovaCoder take advantage of this. GCC 3.4 may use exec.library CopyMem() for C memcpy(). AWeb uses CopyMem() for screen updates and while scrolling where there is a noticeable difference in scrolling speed on a 68060 (and probably 68040) with CopyMem() patched. AmigaOS and MUI use CopyMem() a lot so patching should free some CPU cycles as well.
-
If you do similar things on both then how much faster does it feel?
RAM speed hasn't kept up with CPU speed, so you can't expect it to be 60 times quicker anyway.
OMG guys. You all have about the squarest sense of humor, ever. I've been working on computer hardware for 30 years, of course I know that. I was trying to make a joke! *facepalm*
But I'll be damned if Wordsworth on my Amiga doesn't feel faster than OpenOffice on the Linux box. Of course you can say just one word to that: Java. HA! :D
-
** NEWS UPDATE **
CMQ&S040 v1.6 released
v1.6 minor change
- source address compare code misqualified Move16 on 8 byte offset
(This is fixed now but the 4 byte offset still doesn't work for some reason)
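For anyone wondering what the "source address compare code" is about: Move16 always moves whole 16 byte lines, so before the Move16 path can be taken such a patch has to check that source and destination can reach 16 byte alignment at the same time. A hypothetical sketch of that kind of test (labels invented, a0 = source, a1 = destination):
move.l a0,d1
eor.l a1,d1 ; compare the low four address bits of source and destination
and.l #15,d1 ; if they differ, the two can never be 16 byte aligned together,
bne.s movel_copy ; so fall back to the ordinary MOVE.L copy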
-
** NEWS UPDATE **
CMQ&S040 v1.6 released
v1.6 minor change
- source address compare code misqualified Move16 on 8 byte offset
(This is fixed now but the 4 byte offset still doesn't work for some reason)
The problem is - if something breaks, people are rarely aware or even able to relate that to the patch. As I said, MOVE16 *may* work fine on the CPU memory directly on the turbo board, but may fail when going over Zorro, or may be at least slower.
Now think again: How many people will consider your patch faulty if some program creates graphics defects? How many people will benchmark the copy operation to *rtg memory*? Actually, *did you* benchmark? Did you benchmark on every possible hardware combination? I can only assure you once more that it's slower on my A2000.
-
The problem is - if something breaks, people are rarely aware or even able to relate that to the patch. As I said, MOVE16 *may* work fine on the CPU memory directly on the turbo board, but may fail when going over Zorro, or may be at least slower.
The only solution to that problem is to run something that actually tests every single memory type in your computer and tells you whether it worked and what speed it was.
Then ideally it would be able to create a configuration so that certain types of memory could be excluded etc. Effectively a CopyMem construction kit.
-
Some people will spend time doubling the speed of a routine that takes 100ms and is only ever run once.
IIRC matthey logged copymem/copymemquick calls on an Amiga with >100MB of RAM and ran out of memory in 1 minute!
Do you have any benchmarks of real software before and after installing the patch?
MOVE16 doesn't appear to be safe on an MMU-less 040 as you can't use the workaround in the errata, although it's arguable that an MMU-less 040 is safe in an Amiga at all (yet they seem to exist).
Testit is really not a good program for testing Move16 performance (Of course it was written for 020 and earlier CPUs). I can run CMQ&S040 before Setpatch and any MMU code is installed. I can execute the s-s (startup-sequence) which then loads Setpatch and the MMU code.
The TBI line isn't a solution, it completes the burst and then throws away the extra results. If you write and the data isn't in the cache it will try to burst read the cache line and throw that away too.
http://amigadev.elowar.com/read/ADCD_2.1/AmigaMail_Vol2_guide/node0161.html
WTF? TBI doesn't complete the burst, it TERMINATES the burst! Throws away the extra results? What extra results are there? 4 longwords requested = 4 longwords completed. FYI, the cache control logic really doesn't care if the 4 longwords were transferred in a burst or non-burst cycle.
-
You know what, I'm only going on what I read. It's not new though.
http://www.programd.com/2_f594b4220be791d2_1.htm
-
Breaking this down into layman's terms, would you say this version is faster than, not as fast, or equal to this version:
http://aminet.net/package/util/boot/CopyMem
Since it seems like they both rely on Move16?
That's a very general question to ask, but a question which has very specific and qualified answers.
Faster in which category? Best, average, or worst case copies? Large, medium, or small size copies? Faster on 16 byte, longword, word or byte copies? Faster on aligned or mis-aligned copies? Faster on 020, 030, 040 or 060?
Any CMQ patch can be optimized to give better performance for a specific category but that will reduce its performance in another category.
-
Any CMQ patch can be optimized to give better performance for a specific category but that will reduce its performance in another category.
Actually, I would be more curious to hear about any application that profits from such a patch. For me, uses of CopyMemQuick() are too rare to make any measurable difference in everyday usage. There may be exceptions, as always.
I would rather say that an application that critically depends on memory copies implements the copy itself, without going through the Os, as there are many other factors only the calling program can know. For example, a "move" moves into and out of the cache. A move16 does not. Is that good or bad? MOVE16 doesn't "pollute" the cache; move already fills the cache with the target data. Whether that is something you want or do not want cannot be distinguished by CopyMemQuick(). It is something only the calling program can know - and hence, only the calling program can select the optimal strategy. CopyMemQuick() is the "Ford Escort" you may select if it is "fast enough", so it's usually not worth the trouble patching into this call, even more so as it is rarely used.
-
I would rather say that an application that critically depends on memory copies implements the copy itself, without going through the Os, as there are many other factors only the calling program can know. For example, a "move" moves into and out of the cache. A move16 does not. Is that good or bad? MOVE16 doesn't "pollute" the cache; move already fills the cache with the target data. Whether that is something you want or do not want cannot be distinguished by CopyMemQuick(). It is something only the calling program can know - and hence, only the calling program can select the optimal strategy. CopyMemQuick() is the "Ford Escort" you may select if it is "fast enough", so it's usually not worth the trouble patching into this call, even more so as it is rarely used.
Remember our OS blitting routine argument? You just stated the reason for writing one's own blit routine: A one size fits all routine isn't always the best solution.
-
Remember our OS blitting routine argument? You just stated the reason for writing one's own blit routine: A one size fits all routine isn't always the best solution.
Not exactly. The question is "what is the solution you look for", and "what can the Os do for you", and "is a patch worth doing", and is "not using the Os" worth it. Each decision has advantages and drawbacks.
In case of doubt: Avoid a patch, especially if the average savings are negligible. In case of doubt: Use the Os for the job, unless you get substantial savings doing otherwise.
What happens now in the average program? If you don't care much, you probably pick memcpy() from the standard library or CopyMemQuick(). The former may or may not use the Os - it is rather inlined. If it matters much, you probably have your own routine.
For the blitter, you get however a substantial disadvantage from not using the Os: If you try to implement a graphics primitive, it might simply not work on an RTG system if you don't use the Os. Is it worth not using the Os? Typically not, because you "shoot yourself in the foot".
Thus, the situation between "patch" and "program", "copy mem" and "blitter" are not quite as symmetric as you may want to present them.
-
What's the big deal, it's just one out of hundreds of patches around. It is entirely optional to add patches and updates.
-
For the blitter, you get however a substantial disadvantage from not using the Os: If you try to implement a graphics primitive, it might simply not work on an RTG system if you don't use the Os. Is it worth not using the Os? Typically not, because you "shoot yourself in the foot".
Whether it's worth it or not depends entirely on one's requirements. The OS blit routine is generic, and therefore unsuitable for fast, non-generic blits, even when the nature of the blits is very simple (you can see this clearly when you look at the function call). The right solution for getting both maximum performance on native screen modes and have GFX card compatibility is to simply implement both methods.
-
Whether it's worth it or not depends entirely on one's requirements. The OS blit routine is generic, and therefore unsuitable for fast, non-generic blits, even when the nature of the blits is very simple (you can see this clearly when you look at the function call). The right solution for getting both maximum performance on native screen modes and have GFX card compatibility is to simply implement both methods.
And from where do you know that you have a gfx card in the system? And is it worth to implement both methods? For rendering graphics, it *usually* doesn't matter - not worth spending bytes on this decision. For implementing an I/O driver, the situation might again be different (though it probably shouldn't copy data in the first place).
Yes, as usual, it depends on the situation, but I currently cannot come up with a situation where I would need to blit something and not use the Os, or where I would need to copy a lot of memory and the simple CopyMemQuick() interface would still be sufficient.
For example, we had the problem in P96 to copy memory quickly around (from the board to an off-board buffer, and reverse), but CopyMemQuick() is not even closely sufficient for that as it was necessary to copy "rectangular" memory regions with a different "modulo" factor from A to B. CopyMemQuick() cannot do that.
-
What's the big deal, it's just one out of hundreds of patches around. It is entirely optional to add patches and updates.
There are two answers I can give, a generic one and one that is specific to this series of patches. The generic one is that I have a problem with the "patch culture" on the Amiga. There are so many bad patches around that actually break interfaces that it is hard for the average user to maintain a stable system. Programs like MCP really *broke* certain functions, and provided functionality that was available in the Os anyhow - you just had to use them. The same goes for this program, a small improvement (if at all) for a patch, probably not worth thinking about.
Or to put it differently: The ease with which the average user can install a patch bears no relation to the complexity of implementing a *correct* patch, and the average quality of patches is pretty "average".
The specific one is that MOVE16 is not a good instruction to use on the Amiga. Problem is that MOVE16 runs a burst cycle, even into memory or target regions that are marked as "cache-inhibited". The problem is now that it depends on the well-behavedness of the turbo board to detect this case and abort the burst. Given the "rather average" quality of some expansions and extensions, I would not be surprised that this actually doesn't always work as it should. Indeed, if I test this on my A2000, *not* trying to burst provides a small but measurable speed advantage over trying to initiate the burst.
-
IMHO I love that people like SpeedGeek and Cosmos are taking on these "micro optimizations" of old Amiga code. I know other people's mileage may vary, but I'm running a ton of their patches on my A2000, and sitting right next to a 2000MHz PC running the latest version of Lubuntu, my 33MHz Amiga still feels like it flies. :)
You can make them faster by turning off things like pop-up info and prefetch.
You could be limited by bloated programs.
-
There are two answers I can give, a generic one and one that is specific to this series of patches. The generic one is that I have a problem with the "patch culture" on the Amiga.
The "patch culture" is what makes Amiga's the most fun! You can customize and tune it to make it your own. No two Amiga's will ever be the same. Or should we all just go back to running stock 3.1? My other favorite hobby is working on old cars. Replacing the carburetor with a higher flow model. Replacing the stock exhaust manifold with some custom headers. Isn't that the same as what people are doing with their Amiga's? Or would you say everyone should keep their cars exactly stock, as well?
Oh well, to each their own! :D
-
Replacing the stock exhaust manifold with some custom headers. Isn't that the same as what people are doing with their Amigas?
You realise and take responsibility that your car could explode killing you and your entire family if you start changing things.
If you're running any patches then don't expect to be able to raise any bug reports against any software you run. Also don't release any software if you've only tested it while running unofficial patches.
-
The "patch culture" is what makes Amiga's the most fun! You can customize and tune it to make it your own.
What you don't understand is that this makes developing software for the Amiga needlessly hard and resource intensive. The problem is that I cannot simply go along and implement a program against the Os interfaces. It happened often enough that something broke at user side, and not because of a software bug, but because of a patch.
Thus, patches create additional maintenance issues for any software developer, frustration and cost, in the end both for the user and the program author. Patches prevent "isolation" between software projects - they create dependencies between the patch author and the software author.
-
The "patch culture" is pretty much what keeps this platform alive and moving forward still.
-
You make it sound as if there's much software developed, but really there isn't.
-
You realise and take responsibility that your car could explode killing you and your entire family if you start changing things.
If you're running any patches then don't expect to be able to raise any bug reports against any software you run. Also don't release any software if you've only tested it while running unofficial patches.
Bit of an extreme example, doncha think? ;)
And in 30 years of working on computers, I think the only time I ever emailed someone with a legitimate software bug report was with Tales Of Gorluth, which has a "load game" function, but nowhere is it documented how to actually save the game in progress. Still haven't heard back from that one. Oh well, maybe I should've asked them in German! ;)
I'm just gonna say, there are always people who think a certain way, and always people who think differently. Like they say about opinions and the Internet. Continuing my car metaphor, if you were to go to a classic car forum you could see people arguing all day long about the merits of one spark plug brand over another. Or I'm sure if you went to an Atari ST forum you'd find people arguing about something similar to the arguments here. Hell, I bet if you went to a forum about kitchen utensils, you'd find people arguing about one type of mixer or another. No one has ever said "that Internet argument fundamentally changed my way of thinking", nor will they ever. So, keep on keepin' on. Hate on patches or use them religiously, I think the Amiga is more fun because you can customize it how you want. Some people don't like that, and for them there are also the boring, generic import cars. ;)
-
You make it sound as if there's much software developed, but really there isn't.
But is that because or despite the patchery? Besides, I wouldn't call making the average program 1% faster "keeping the platform alive". It sounds like a needless waste of talent to me - there are better programs that could be written in the same time, providing more benefits to the average user.
-
But is that because or despite the patchery?
I think the answer to your question is actually $$$$$$. Or more precisely, the near lack of any for doing development on this platform. ;)
-
And from where do you know that you have a gfx card in the system?
1. Ask the user.
2. Icon tool type and have two icons.
And is it worth to implement both methods?
Depends on the software. How much code are we talking about anyway? Two, maybe three kb extra? Hardly a waste if it means more users can enjoy the software.
For rendering graphics, it *usually* doesn't matter - not worth spending bytes on this decision.
It matters for the case I'm talking about:
2x 16 pixel wide background tile.
2x 16 pixel wide sprite mask.
2x 16 pixel wide sprite data.
2x 16 pixel wide second sprite mask.
2x 16 pixel wide second sprite data.
2x 16 pixel wide status gfx mask.
2x 16 pixel wide status gfx data.
That's seven sources. With a handwritten routine you can do this:
move.l (a0)+,d0 ; background tile longword
and.l (a1)+,d0 ; cut the hole with the first sprite mask
or.l (a1)+,d0 ; insert the first sprite data
and.l (a2)+,d0 ; second sprite mask
or.l (a2)+,d0 ; second sprite data
and.l (a3)+,d0 ; status gfx mask
or.l (a3)+,d0 ; status gfx data
Do that twice, transpose, write to chipmem. After that you can unroll to use the pipeline on 20+ and get the transposes almost for free. I don't see how that's going to be anywhere near as fast with the OS blit function, so in this case it's crystal clear that it matters, because it lowers the CPU requirements.
Yes, as usual, it depends on the situation, but I currently cannot come up with a situation where I would need to blit something and not use the Os
I have another example. I wrote a simple real-time memory viewer. It opens a single bit plane 640x512 screen and blits 8x8 chars to the screen with my own code (which contains some optimized transposes from kalms' c2p routines). Very fast, and I highly doubt the OS can match that speed. It's important that such a program is fast because you're also running the program you're working on.
For example, we had the problem in P96 to copy memory quickly around (from the board to an off-board buffer, and reverse), but CopyMemQuick() is not even closely sufficient for that as it was necessary to copy "rectangular" memory regions with a different "modulo" factor from A to B. CopyMemQuick() cannot do that.
That's the whole reason. Something doesn't run at sufficient speed, or you know this is going to happen, or simply want to reduce CPU usage as much as possible, so you write your own code. Nothing wrong with that.
This is Amiga land after all. Lots of not so fast < 68060s out there, and you can do more on those lower end machines if your code is faster.
-
The specific one is that MOVE16 is not a good instruction to use on the Amiga. Problem is that MOVE16 runs a burst cycle, even into memory or target regions that are marked as "cache-inhibited". The problem is now that it depends on the well-behavedness of the turbo board to detect this case and abort the burst. Given the "rather average" quality of some expansions and extensions, I would not be surprised that this actually doesn't always work as it should. Indeed, if I test this on my A2000, *not* trying to burst provides a small but measurable speed advantage over trying to initiate the burst.
There you go again, sounding the Burst warning alarm system you invented. I've tried to explain this many times (but you still don't get it). Burst is just an optional feature which under best case conditions can improve performance, but there are also worst case conditions where it reduces performance.
The CPU may only request a Burst cycle but the hardware (memory controller logic) makes the final decision on when (if ever) any Burst cycle will happen.
-
One question:
If you compare the time needed to develop the memcopy with the time spent talking about / defending it here, how does this time compare?
-
The CPU may only request a Burst cycle but the hardware (memory controller logic) makes the final decision on when (if ever) any Burst cycle will happen.
And when you rely on this then it's slower than not requesting the burst in the first place. Does Zorro III actually support it?
-
One question:
If you compare the time needed to develop the memcopy with the time spent talking about / defending it here, how does this time compare?
Seems to me it's not so much about developing something, it's more about making sure that what is developed has a positive impact and no side-effects. Measure twice, cut once ;)
However, this somewhat sober and "not much fun" side of system software development doesn't seem to be much in favour here. More or less, this speaks of the Amiga in its current form as a hobby.
Nothing wrong with computers as hobbies, or the fun of tinkering with the operating system. Spoilsports like Thomas and me do seem to have the engineering side of the operating system patches in mind, because that's what a lot of software builds upon, and it's sadly too easy to break things and never quite find out what actually caused the problems. If you're playing with the fundamentals of the operating system there comes a bit of responsibility with it, and that can't always be fun.
I'm not sure if this has been mentioned before, but the operating system itself, as shipped, hardly ever uses CopyMem() or CopyMemQuick() in a way which would benefit from optimization. In ROM CopyMem() is used to save space, and it usually moves only small amounts of data around, with the NCR scsi.device and ram-handler being the exceptions. The disk-loaded operating system components tend to prefer their own memcpy()/memmove() implementations, and only programs which were written with saving disk space in mind (e.g. the prefs editors) use CopyMem() to some extent. Again: only small amounts of data are being copied, which in most cases means that you could "optimize" CopyMem() by plugging in a short unrolled move.b (a0)+,(a1)+ loop for transfers shorter than 256 bytes.
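Just to picture the small-copy path meant here - a minimal sketch, not actual Os code, and not yet unrolled (a0 = source, a1 = destination, d0 = byte count below 256):
small_copy:
bra.s .entry
.loop:
move.b (a0)+,(a1)+ ; one byte per turn is fine at this size,
.entry:
dbra d0,.loop ; the call overhead dominates anyway
rts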
I have no data on how third party applications use CopyMem()/CopyMemQuick(), but if these are written in 'C' it's likely that they will use the memcpy()/memmove() function which the standard library provides, and that typically isn't some crude bumbling implementation. However, it might benefit from optimization.
Now if you wanted to make a difference and speed up copying operations that are measurable and will affect a large number of programs, I'd propose a project to scan the executable code loaded by LoadSeg() and friends and replace the SAS/C, Lattice 'C' and Aztec 'C' statically linked library implementations of their respective memcpy()/memmove() functions with something much nicer. That would not be quite the "low-hanging fruit" of changing the CopyMem()/CopyMemQuick() implementation, but it might have a much greater impact.
-
The CPU may only request a Burst cycle but the hardware (memory controller logic) makes the final decision on when (if ever) any Burst cycle will happen.
Exactly. But you silently assume that there is a memory controller logic, and that this memory controller logic is smart enough to pick the right decisions at all times. In fact, you can get away without ever touching the burst. RAM would be on the Turbo card anyhow, chip ram has to be cache inhibited, and the rest of I/O space has to be cache-inhibited as well. Cache-inhibited accesses do not burst, hence no extra logic required. Or almost.
IOW, you rely on the hardware to be well-behaved, and that the vendor implemented extra logic just for a corner case. I really wonder where you take your confidence from. All that I learned over the years was that whenever there was a chance to cut the budget, hardware vendors took it. Here you have one...
Take it as you like, but I call it "defensive programming".
-
Why not a patch to remove CopyMem() and CopyMemQuick() all together, and then see what breaks ;)
-
1. Ask the user.
2. Icon tool type and have two icons.
You are already demanding too much from the average user. Does it make a measurable difference? Does another option provide an advantage? Or does it confuse the user?
Depends on the software. How much code are we talking about anyway? Two, maybe three kb extra? Hardly a waste if it means more users can enjoy the software.
How much testing are we talking about? The more decisions you have in the code, the easier it is to break. In reality, for any serious sized program, I prefer to have the minimum number of options to perform a given task - for example rendering something on the screen. I can test that once, and then rely on the correctness of the Os (hopefully, with the given patches around, this is a somewhat arbitrary decision nowadays). I'm not talking about the "implemented in two weeks" program. Speed is not the only factor in how to enjoy software. What about ease of use and correctness? There are many factors that play into making such a decision.
Do that twice, transpose, write to chipmem. After that you can unroll to use the pipeline on 20+ and get the transposes almost for free. I don't see how that's going to be anywhere near as fast with the OS blit function, so in this case it's crystal clear that it matters, because it lowers the CPU requirements.
How much time of the overall running time of the program is spent in that copy? How much development time goes into writing that? How much into testing? Would a user really bother? You are giving me all factors that make the "coder enjoy the development", but probably no argument concerning the overall "quality of experience" of the resulting program.
I cannot really give you a single "rule of thumb" of what is correct and what isn't. I would probably try first to use the Os. Then check whether the program satisfies my needs. If I see any lags, I benchmark, find where the bottleneck is, and optimize there. If that means that I have to re-implement parts of what I could do with the Os, so might it be, but that's rarely the case.
I have another example. I wrote a simple real-time memory viewer. It opens a single bit plane 640x512 screen and blits 8x8 chars to the screen with my own code (which contains some optimized transposes from kalms' c2p routines). Very fast, and I highly doubt the OS can match that speed. It's important that such a program is fast because you're also running the program you're working on.
Maybe the Os wouldn't match the speed, but would it matter, actually? If I have a memory viewer (e.g. MonAm2, or COP), then I don't mind whether the screen updates faster than I type (or view). Actually, I would raise a couple of more important issues, as in "does it cooperate well with the rest of the system", "does it know not to touch I/O spaces to avoid interaction with the hardware", "can it print the memory contents to a printer and make a hardcopy". You have a very single-sighted view on the qualities and requirements of the software, where in real-life a lot of other aspects play a role, too. Whether the output of the program is twice as fast as that of a competing program is in most use-cases not important.
Look, I understand your joy writing such a program, but in reality, users probably have other needs you didn't take into account.
That's the whole reason. Something doesn't run at sufficient speed, or you know this is going to happen, or simply want to reduce CPU usage as much as possible, so you write your own code. Nothing wrong with that. This is Amiga land after all. Lots of not so fast < 68060s out there, and you can do more on those lower end machines if your code is faster.
For P96, the situation was really, "we need to emulate the blitter on even not so fast systems", and "CopyMemQuick() doesn't even provide the interface for doing what needs to be done", hence it was necessary to come up with something in assembler that should better be fast - because you can see the difference when moving windows around. In my average C program, if I copy a string around, I use memcpy(), the built-in compiler primitive from the standard C library, simply because it doesn't make any difference.
The situation was different, the requirements were different, the bottleneck was observed and benchmarked, hence a solution for the problem was required.
-
You are already demanding too much from the average user.
I don't see how providing two icons, one for native and one for GFX card, or having something like a screen mode requester is demanding too much. If someone can't be bothered with that, then they have issues.
Does it make a measurable difference?
Oh, come on! The example I gave makes it crystal clear that it would make a difference.
How much testing are we talking about?
Not much, because we're talking about some simple blit routines here. It's not rocket science.
How much time of the overall running time of the program is spent in that copy?
That particular blit happens for half the screen at about eight frames per second (320x256x7 bpls). The other half is similar, but you have only three sources instead of seven. The faster this runs, the better.
How much development time goes into writing that?
This kind of trivial code is very easy to write in a close to optimal way. Obviously, it's already written.
Would a user really bother?
They would get the option of running native, or if I'm going to implement it, GFX card. What do they have to bother with? They double click an icon and that's it.
You are giving me all factors that make the "coder enjoy the development", but probably no argument concerning the overall "quality of experience" of the resulting program.
Part of the quality of the experience comes from making sure people can actually play the game properly on a 25 mhz 68030 with AGA and some fastmem.
Maybe the Os wouldn't match the speed, but would it matter, actually? If I have a memory viewer (e.g. MonAm2, or COP), then I don't mind whether the screen updates faster than I type (or view). Actually, I would raise a couple of more important issues, as in "does it cooperate well with the rest of the system", "does it know not to touch I/O spaces to avoid interaction with the hardware", "can it print the memory contents to a printer and make a hardcopy".
I wrote this memviewer for my own needs, and wouldn't release it in its current state. The speed is a requirement, because I want to be able to use it with heavier software without slowing things down too much. Not to mention that it's realtime, and the screen is updated once per VBL.
You have a very single-sighted view on the qualities and requirements of the software, where in real-life a lot of other aspects play a role, too. Whether the output of the program is twice as fast as that of a competing program is in most use-cases not important.
The way I see it software must be well-written (this includes maintainability), functional, easy to use and fast. The reason why I seem focused on speed, is because of the target platform I'm interested in: As close to a 68020 with some fastmem as I can get it without making any concessions (my reason for insisting on ASM has nothing to do with this).
-
They would get the option of running native, or if I'm going to implement it, GFX card. What do they have to bother with? They double click an icon and that's it.
Part of the quality of the experience comes from making sure people can actually play the game properly on a 25 mhz 68030 with AGA and some fastmem.
Ah, you're talking about a game. That's yet another business. If the Os doesn't give you the game speed you need, then this is a justification of course. There is *some* support for moving objects in graphics, and that's even supported natively by P96, but admittedly, the Bobs support of gfx is pretty much broken.
-
Ah, you're talking about a game. That's yet another business. If the Os doesn't give you the game speed you need, then this is a justification of course. There is *some* support for moving objects in graphics, and that's even supported natively by P96, but admittedly, the Bobs support of gfx is pretty much broken.
The requirements for the game are that I can have 320 animated tiles and 320 animated sprites (20x16 tile positions), all at the same time. All of these are 16x16 pixel aligned (sprites are 24 pixels high, and only one can move freely because it's turn based). This all has to run at around eight fps (for the animations) and leave enough CPU time to handle the AI and the 28 khz 14 bit stereo audio (stereo music with sound effects).
However, I would write my own text blitting routine if I were to write a text editor, for example. The OS simply uses the blitter, hence the reason for FBlit and FText making a real difference (also, double scan modes).
It's basically about how much effort you think is worthwhile to put into writing optimized custom code for things. It's also a hobby, and while you should of course try to actually finish software, it's also about writing the software the way you want (without making a mess). In a pro environment it's probably quite different.
-
However, I would write my own text blitting routine if I were to write a text editor, for example. The OS simply uses the blitter, hence the reason for FBlit and FText making a real difference (also, double scan modes).
That would suck badly if antialiased text were introduced to AmigaOS.
But of course, there won't be a new AmigaOS. Hence patches are "future safe". (Not bringing "NG" into the discussion here.)
-
I'm not sure if this has been mentioned before, but the operating system itself, as shipped, hardly ever uses CopyMem() or CopyMemQuick() in a way which would benefit from optimization. In ROM CopyMem() is used to save space, and it usually moves only small amounts of data around, with the NCR scsi.device and ram-handler being the exceptions. The disk-loaded operating system components tend to prefer their own memcpy()/memmove() implementations, and only programs which were written with saving disk space in mind (e.g. the prefs editors) use CopyMem() to some extent. Again: only small amounts of data are being copied, which in most cases means that you could "optimize" CopyMem() by plugging in a short unrolled move.b (a0)+,(a1)+ loop for transfers shorter than 256 bytes.
It is true that most AmigaOS calls of exec.library CopyMem() are small to medium size but there are *many* calls. It's not efficient to use CopyMem() for small copies because of the library JSR+JMP overhead, although CPU optimized code can reduce the overall cost to be close to that of non-library code for all but the smallest copies. AmigaOS CopyMem() uses a MOVE.B (A0)+,(A1)+ loop for small copies and a MOVEM.L loop for large copies. It's actually the MOVEM.L loop that is most inefficient because it is only good for the 68000-68030 with large copies. An unrolled MOVE.L (A0)+,(A1)+ loop would be significantly better for the 68000-68060 in most cases.
tiny static size copy: use '=' in C
small size copy: use quick loop
medium size copy: use unrolled loop (the 68060 doesn't benefit from unrolling in this case)
large size copy: use MOVEM.L loop for 68000-68030, use unrolled MOVE16 loop for 68040-68060
There is a trade-off with different memory copy techniques as SpeedGeek has mentioned. This applies to the exec.library CopyMem() as well as the C memcpy() and memmove() functions. Most memcpy()/memmove() calls are small and this is why vbcc uses inlined quick loops to get to work as fast as possible (after minimal alignment). SAS/C uses a subroutine call (BSR+RTS) to a poorly optimized unrolled loop with no aligning and a costly jump table at the end for the 68040+. The SAS/C memcpy() may be faster in some cases than the vbcc memcpy() for large aligned copies. Sadly, the SAS/C memcpy() probably beats the exec.library CopyMem() for medium to large copies on the 68040+.
With vbcc, it is best for speed to use:
tiny static size copy: use '=' in C
small size copy: use C memcpy() and memmove()
medium size copy: use CPU optimized exec.library CopyMem() and CopyMemQuick()
large size copy: use CPU optimized exec.library CopyMem() and CopyMemQuick()
If the exec.library CopyMem()/CopyMemQuick() used unrolled MOVE.L copy loops then we could be in good shape without patching. Patching for the uncommon large copies would become optional. MOVE16 does have the advantage of not flushing the DCache on large copies although it's questionable whether this is common enough and bug free enough to be standard.
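To picture the two loop styles being compared, a hedged sketch (register choice and chunk size are only for illustration; d0 holds the chunk count minus one in both cases):
.movem_loop:
movem.l (a0)+,d1-d7/a2-a6 ; read 48 bytes into registers - fine on 68000-68030, poor on 68040/68060
movem.l d1-d7/a2-a6,(a1) ; write them back out (MOVEM cannot post-increment on writes)
lea 48(a1),a1
dbra d0,.movem_loop
.movel_loop:
move.l (a0)+,(a1)+ ; unrolled MOVE.L: 16 bytes per pass with no register save/restore traffic
move.l (a0)+,(a1)+
move.l (a0)+,(a1)+
move.l (a0)+,(a1)+
dbra d0,.movel_loop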
The new version of vbcc 0.9d was recently released by the way:
http://sun.hasenbraten.de/vbcc/
With SAS/C, it is best for speed to use:
tiny static size copy: use '=' in C
small size copy: use C memcpy() and memmove() for 68000-68030, use CPU optimized exec.library CopyMem() and CopyMemQuick() for 68040+
medium size copy: use CPU optimized exec.library CopyMem() and CopyMemQuick()
large size copy: use CPU optimized exec.library CopyMem() and CopyMemQuick()
Without patching or changing exec.library CopyMem() and CopyMemQuick(), the available options are not good. Most 68040-68060 memory copies will be significantly slower than what is possible. We want programmers to be able to use compilers and the AmigaOS without wasting time and code creating faster re-implementations of functions. This is what ThoR seems to ignore. He wants to stop the patching chaos but ignores the reason for the patching and the solution.
Now if you wanted to make a difference and speed up copying operations that are measurable and will affect a large number of programs, I'd propose a project to scan the executable code loaded by LoadSeg() and friends and replace the SAS/C, Lattice 'C' and Aztec 'C' statically linked library implementations of their respective memcpy()/memmove() functions with something much nicer. That would not be quite the "low-hanging fruit" of changing the CopyMem()/CopyMemQuick() implementation, but it might have a much greater impact.
I considered this but decided it was better to optimize compiler link lib code next and vbcc was the easiest place to start ;).
-
However, I would write my own text blitting routine if I were to write a text editor, for example. The OS simply uses the blitter, hence the reason for FBlit and FText making a real difference (also, double scan modes).
This, however, is a pretty bad idea. The Os routine is quite ok, and there is little to be gained if your text editor should support arbitrary fonts (and yes, that's really a desired and useful feature given that you can adjust the screen size and hence the resolution).
The Os 1.3 Text() was a rather poor implementation that blitted glyph by glyph, but from 2.0 on graphics is smart enough to place the text into an off-screen buffer first and blit from there. The function used there is quite optimal given its genericity, and optimizations are likely only possible if you aim at specific font sizes only, e.g. 8x8 glyphs as for the topaz.font. However, this font is typically too small for today's applications.
For the record, there was a patch for 1.3 that optimized Text() for topaz.8 only ("FastFonts") and a similar patch by myself that optimized topaz.8 (8x8) and topaz.9 (10x9) only. Both of them are pretty much obsolete by today's standards due to their inability to support arbitrary fonts or styles.
-
Without patching or changing exec.library CopyMem() and CopyMemQuick(), the available options are not good. Most 68040-68060 memory copies will be significantly slower than what is possible. We want programmers to be able to use compilers and the AmigaOS without wasting time and code creating faster re-implementations of functions. This is what ThoR seems to ignore. He wants to stop the patching chaos but ignores the reason for the patching and the solution.
To find a solution, one first has to identify the problem. And that's exactly what I do not see here. So far, nobody has yet mentioned a real-world problem (e.g. a program, a series of programs, a particular use case) where the current CopyMemQuick() is the bottleneck, and not fast enough to address the needs of the user. I would rather say that if memory copy is your bottleneck, there is probably something wrong with your algorithm requiring you to copy so much data in the first place.
But anyhow - I would have little problem exchanging it should there ever be a new version of exec, but as the situation currently is, I consider the option of a patch for an otherwise bug-free Os function less desirable than the small speed impact (if at all) of CopyMemQuick() as we have it now.
-
Exactly. But you silently assume that there is memory controller logic, and that this memory controller logic is smart enough to make the right decisions at all times. In fact, you can get away without ever touching the burst. RAM would be on the Turbo card anyhow, chip ram has to be cache inhibited, and the rest of the I/O space has to be cache-inhibited as well. Cache-inhibited accesses do not burst, hence no extra logic required. Or almost.
IOW, you rely on the hardware to be well-behaved, and on the vendor having implemented extra logic just for a corner case. I really wonder where you get your confidence from. All I learned over the years was that whenever there was a chance to cut the budget, hardware vendors took it. Here you have one...
Take it as you like, but I call it "defensive programming".
Yes, I can implicitly (and correctly) make the assumption the Accelerator card logic disables Burst by default or permanently disables it for cards which don't support it (It could be memory controller logic, glue logic, PLD logic or even a pull down/up resistor). Otherwise, you won't even be able to boot your Amiga. It's as simple as that.
Exec tries to enable the instruction cache in early startup. Now, what would happen when the CPU tries to run a Burst cycle to the Kickstart ROMs, Chip RAM or the ZorroII bus with Burst enabled and none of the above support Burst?
To find a solution, one first has to identify the problem. And that's exactly what I do not see here. So far, nobody has yet mentioned a real-world problem (e.g. a program, a series of programs, a particular use case) where the current CopyMemQuick() is the bottleneck and not fast enough to address the needs of the user. I would rather say that if memory copy is your bottleneck, there is probably something wrong with an algorithm that requires copying so much data in the first place.
But anyhow - I would have little problem exchanging it should there ever be a new version of exec, but as the situation currently is, I consider the option of a patch for an otherwise bug-free Os function less desirable than the small speed impact (if any) of CopyMemQuick() as we have it now.
One of many examples from Aminet (Vbak2091):
INTRODUCTION ZorroII boards can only reach the lower 16MB of address space. So DMA SCSI controllers must find another way to transfer data to expansion RAM. Some of them (especially the A2091) do a very bad job in this situation. In an A4000/40 transfer rates may drop to 50KB/s. This program patches the (2nd.)scsi.device to use MEMF_24BITDMA RAM as a buffer followed (in case of CMD_READ) by CopyMem(). It was developed with the A4000/A2091 combination in mind, but should work with other configurations, too (see REQUIREMENTS). Some people reported good results with GVP controllers.
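For what it's worth, the bounce-buffer trick that readme describes boils down to something like the sketch below. The names (DoDMARead etc.) are made up and MEMF_24BITDMA needs a V39+ exec; the real Vbak2091 patch obviously hooks into scsi.device rather than calling a helper like this:
/* Sketch of the bounce-buffer idea described above. "DoDMARead" is a
 * made-up callback standing in for the controller's DMA transfer. */
#include <exec/memory.h>
#include <proto/exec.h>

LONG ReadViaBounceBuffer(APTR finalDest, ULONG length,
                         LONG (*DoDMARead)(APTR buf, ULONG len))
{
    /* buffer the DMA engine can actually reach (lower 16MB) */
    APTR bounce = AllocMem(length, MEMF_24BITDMA | MEMF_PUBLIC);
    LONG error;

    if (bounce == NULL)
        return -1;                          /* no DMA-capable memory left */

    error = DoDMARead(bounce, length);      /* DMA lands in the low 16MB  */
    if (error == 0)
        CopyMem(bounce, finalDest, length); /* CPU copies it upstairs     */

    FreeMem(bounce, length);
    return error;
}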
-
The Os routine is quite ok
No, it's not, hence the reason FBlit+FText makes a real difference.
but from 2.0 on graphics is smart enough to place the text into an off-screen buffer first and blit from there.
Which is slow, because you get additional memory accesses. Far better to do everything in registers, write to chipmem and be able to use the CPU pipeline on 68020+.
optimizations are likely only possible if you aim at specific font sizes only, e.g. 8x8 glyphs as for the topaz.font.
You can write a properly optimized font renderer for any normal text editor font size. You can also take syntax coloring into account and not write all bit planes for each character.
-
The Os 1.3 Text() was a rather poor implementation that blitted glyph by glyph, but from 2.0 on graphics is smart enough to place the text into an off-screen buffer first and blit from there.
Maybe smart if this works on a "per cliprect" basis, but does it?
Otherwise for things like text output in hidden simple refresh windows (like output in a shell window while compiling something, with the source code text editor in the front hiding all or most of it) it can do a lot of unnecessary work in the off-screen buffers.
Similar for long text strings where big parts may end up being clipped away. Like maybe in a listview gadget.
-
Maybe smart if this works on a "per cliprect" basis, but does it?
Actually, it is a single buffer. Manually clipping the text before rendering it to screen would complicate matters a lot. Clipping is done in BltTemplate() of the graphics library once rendering is complete.
Otherwise for things like text output in hidden simple refresh windows (like output in a shell window while compiling something, with the source code text editor in the front hiding all or most of it) it can do a lot of unnecessary work in the off-screen buffers.
I wouldn't be so sure. Look, you have to clip at some point. You can either clip while rendering the glyphs (which is what 1.3 did) or clip only once. Given that the complexity of the clipping is pretty high compared to rendering the text itself, it is probably better to "do the additional work" because it results in a much simpler algorithm. I believe the right approach is to optimize for the *typical* case, and the typical case is that the window you render text to is front-most, thus no clipping done.
Similar for long text strings where big parts may end up being clipped away. Like maybe in a listview gadget.
Actually, the typical ASL/Reqtools requester isn't *that* stupid. I don't know how MUI works, but the system requesters only render those lines that are actually visible on the screen and not those that are clipped away completely.
-
No, it's not, hence the reason FBlit+FText makes a real difference.
How much, and is that due to FBlit? How does that work on graphics cards?
Which is slow, because you get additional memory accesses. Far better to do everything in registers, write to chipmem and be able to use the CPU pipeline on 68020+.
Well, there isn't really much chance to avoid memory accesses. You can probably get away rendering in fast ram for graphics cards in first place and then copy directly to the screen, but in one way or another, you need to fiddle all the bits in the right places to begin with, and there isn't much to be optimized *unless* you restrict yourself to some "nice" font sizes. Optimizing topaz.8 is pretty easy and you can double the speed of the Os, but that's really the exception.
You can write a properly optimized font renderer for any normal text editor font size. You can also take syntax coloring into account and not write all bit planes for each character.
Actually, all this bit-plane handling is pretty much obsolete in the first place (I mean, custom-chip graphics), but leave this as it is: Rendering only a single bitplane is pretty dangerous for an Os function because it cannot know what else is on the screen. For the program, it may be possible (I believe ViNCed even does that, but my memory is fading) - but you don't need a new Os function for that, nor do you need to write your own renderer. You can just set the rastport flags.
-
How much, and is that due to FBlit?
Don't have any numbers, but text rendering is definitely faster with FBlit+FText on my system. I use 688x564 double scan screen modes, and these patches improve things quite a bit. FrexxEd in 16 colors benefits a lot from this. Goes from annoying to use in 16 colors to working perfectly fine.
How does that work on graphics cards?
It doesn't, because it patches the OS blitter functions with CPU based functions. You don't need FBlit for graphics cards anyway, so it doesn't really matter.
Well, there isn't really much chance to avoid memory accesses.
Of course, but instead of reading font data and writing to a buffer which you later have to copy to chipmem, you can do the work in registers, write those to chipmem directly and utilize the pipeline on 68020+.
You can probably get away rendering in fast ram for graphics cards in first place and then copy directly to the screen
For graphics cards you might not have to bother with anything. I wouldn't optimize for graphics cards anyway. Not interested, and probably always faster than native anyway.
but in one way or another, you need to fiddle all the bits in the right places to begin with, and there isn't much to be optimized *unless* you restrict yourself to some "nice" font sizes.
There's plenty of room for optimizing, and you don't have to restrict yourself to nice font sizes at all. The optimizations come from doing the work in registers and using the pipeline when doing chipmem writes.
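To make the "do the work in registers" idea concrete, here is a rough sketch in C; the byte-aligned 8-pixel-wide font layout and the names are my assumptions, not how FBlit or any real renderer is structured:
/* Four glyph columns are packed into one 32-bit value and written to the
 * bitplane with a single move, with no intermediate buffer in between. */
#include <exec/types.h>

void RenderFourGlyphs8(const UBYTE *glyphs[4], UWORD height,
                       ULONG *plane, ULONG planeWidthInLongs)
{
    UWORD row;

    for (row = 0; row < height; row++)
    {
        /* build one longword in a register: four adjacent 8-pixel glyphs */
        ULONG word = ((ULONG)glyphs[0][row] << 24) |
                     ((ULONG)glyphs[1][row] << 16) |
                     ((ULONG)glyphs[2][row] << 8)  |
                      (ULONG)glyphs[3][row];

        *plane = word;              /* one 32-bit chip-mem write per row */
        plane += planeWidthInLongs; /* step to the next raster line      */
    }
}
A real renderer of course also needs shifted (non-byte-aligned) positions, wider fonts and styles, which is exactly where the extra work goes.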
Actually, all this bit-plane handling is pretty much obsolete in the first place (I mean, custom-chip graphics)
Yes, but it's what you have to deal with when writing native Amiga software. Native chipset is also the most important to get fast, because it needs extra speed the most. This is one of the reasons why I'm not concerned with graphics cards: They just don't need optimizing as much as the chipset does.
but leave this as it is: Rendering only a single bitplane is pretty dangerous for an Os function because it cannot know what else is on the screen.
That's one of the reasons why I would write my own font blitting routine.
-
Of course, but instead of reading font data and writing to a buffer which you later have to copy to chipmem, you can do the work in registers, write those to chipmem directly and utilize the pipeline on 68020+.
Just to give you an idea what I'm talking about: There are fonts that are wider than 32 pixels and higher than 32 pixels, thus no chance to put everything into registers. The Os also takes care of making the font bold, italic, underlined or any combination thereof.
-
There are fonts that are wider than 32 pixels and higher than 32 pixels
And why would anyone use those for text editors on native screens? Typical font sizes are much smaller for text editing.
thus no chance to put everything into registers.
Sure you can, because you don't have to copy whole characters to registers. You only copy parts of characters.
The Os also takes care of making the font bold, italic, underlined or any combination thereof.
You can do that in your own font renderer, too.
When dealing with native graphics there are quite a few ways to get things to run faster. You have to decide for yourself if it's worth doing or not. In my opinion it is.
-
And why would anyone use those for text editors on native screens? Typical font sizes are much smaller for text editing.
I would rather say that this depends on the screen resolution and on the eyes of the user. At least, I wouldn't base an optimization on this unless I also had a fallback mode that allows arbitrary fonts.
When dealing with native graphics there are quite a few ways to get things to run faster. You have to decide for yourself if it's worth doing or not. In my opinion it is.
Well, as you wish. I personally would pick an editor on other qualities, though.
-
I would rather say that this depends on the screen resolution and on the eyes of the user. At least, I wouldn't base an optimization on this unless I also had a fallback mode that allows arbitrary fonts.
Just saying that it seems odd that you'd use fonts wider than 32 pixels for text editing, that's all. And no, I don't have hawk eyes (glasses) ;)
Well, as you wish. I personally would pick an editor on other qualities, though.
You're implying that I only look at one thing. In fact, for existing text editors the only speed requirement I have is that the editor is fast enough. FrexxEd is an example of this. It's not the fastest (not by a long shot), but it's undoubtedly one of the most powerful editors in Amiga 68k land (makes CygnusEd look like Ed).
If I were to write my own editor, then I'd go for speed as well as power and ease of use, and try to write a tidy, clean program that's maintainable (but yeah, in asm). I know how to get speed, so why not put that into the software I might write? This is especially important on low end 68k. Not much CPU time available, so getting good speed without sacrificing features is important.
-
If I were to write my own editor, then I'd go for speed as well as power and ease of use, and try to write a tidy, clean program that's maintainable (but yeah, in asm). I know how to get speed, so why not put that into the software I might write? This is especially important on low end 68k. Not much CPU time available, so getting good speed without sacrificing features is important.
Well, here is my math. How much time does an average editor spend in rendering text, compared to waiting for my input? My personal guess is that it doesn't really matter that much in real world applications, unless the editor is "brain dead". For example, the editor of Microsoft Basic (aka AmigaBasic) was brain dead and too slow for any reasonable work, but everything else I remember was simply fast enough, including "Ed", and all of them used the plain simple Os routines.
As far as my editor choices are concerned, I'm still using GoldEd on the Amiga, mostly because it can be customized to the very end. It runs here compiler, linker, debugger, configuration editor, jumps to errors, between sources... It is considerably more powerful than CED. Ok, Ced is certainly a nice editor, but not quite on par with GED.
For unix, it's emacs. Actually, more an operating system written in Lisp with an editor front-end.
-
Well, here is my math. How much time does an average editor spend in rendering text, compared to waiting for my input?
It's about the scrolling (line and page). The scroll speed depends on the combination of text rendering speed, syntax coloring system speed and scroll routine speed. If it's too slow, then the editor becomes uncomfortable to use. Especially in hires double scan modes with 16 colors. That's why speed is important.
-
CygnusEd has the option of using OS routines, and on native chipset that is a major slowdown. Ditto for MuchMore iirc.
-
I think that might be a perception bias. I have a 2.5 GHz Windows 8.1 laptop, and if Commodore had anything that felt this quick they wouldn't have gone bankrupt.
IMHO, the corrupt heads of Commodore - and I mean Irving Gould (Chairman) & Mehdi Ali (president of Commodore) - would have ruined ANY company in their "Need for Greed."
-
It's about the scrolling (line and page). The scroll speed depends on the combination of text rendering speed, syntax coloring system speed and scroll routine speed. If it's too slow, then the editor becomes uncomfortable to use. Especially in hires double scan modes with 16 colors. That's why speed is important.
There are other alternatives, though. You don't need to scroll every line. Buffer scroll commands or output commands, interpret several commands at once and use jump scrolling then. ViNCEd does that to avoid slowing down the output, i.e. while it prints its data, it already buffers new incoming commands, then executes all at once without scrolling through each of them individually.
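As a rough illustration of the jump-scrolling idea - count pending lines and move the raster once instead of per line - something along these lines would do; the names are invented and ViNCEd's real logic is far more involved:
/* Pending lines are only counted; the raster is moved once when the
 * output buffer runs dry. */
#include <graphics/rastport.h>
#include <proto/graphics.h>

struct JumpScroller {
    struct RastPort *rp;
    WORD lineHeight;        /* font height in pixels          */
    WORD pendingLines;      /* lines queued but not yet shown */
    WORD xmax, ymax;        /* extent of the text area        */
};

void QueueLine(struct JumpScroller *js)
{
    js->pendingLines++;     /* just remember it, don't scroll yet */
}

void FlushScroll(struct JumpScroller *js)
{
    if (js->pendingLines > 0)
    {
        /* one blit moves everything up by N lines at once */
        ScrollRaster(js->rp, 0, js->pendingLines * js->lineHeight,
                     0, 0, js->xmax, js->ymax);
        js->pendingLines = 0;
    }
}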
-
There are other alternatives, though. You don't need to scroll every line. Buffer scroll commands or output commands, interpret several commands at once and use jump scrolling then. ViNCEd does that to avoid slowing down the output, i.e. while it prints its data, it already buffers new incoming commands, then executes all at once without scrolling through each of them individually.
Yes, of course you don't scroll ten lines individually when you want to scroll ten lines. The software creates ten lines' worth of space with one copy operation and prints ten lines; otherwise you get ten screen copy operations, which is very slow. However, if those lines are printed slowly, then it's still going to be slow (especially bad for page up and page down when you have lots of long lines).
I understand that you prefer using the OS for things. It's less work and it's easier to get things to run on graphics cards and what not. On the peecee that's usually fine, but on Amiga hardware you can do better if you write your own optimized code. You just have to put in the extra effort that's required to do it properly, and when you do, you'll end up with software that runs better on lower end machines, and I think that's important.
-
Well, here is my math. How much time does an average editor spend in rendering text, compared to waiting for my input? My personal guess is that it doesn't really matter that much in real world applications,
I've heard that argument before and I don't buy it. If it's easily possible to write code that can render text faster then you should do that, because there are plenty of situations where an average editor is too slow. Like if you're running something reasonably intensive in the background.
Just being fast enough when nothing else is running isn't fast enough.
Sure we need it all to be standardised and consistent so it makes it easy to write software, but that should be doable.
-
I've heard that argument before and I don't buy it. If it's easily possible to write code that can render text faster then you should do that, because there are plenty of situations where an average editor is too slow. Like if you're running something reasonably intensive in the background.
If you are running something CPU intensive in the background, like compiling a large project with GCC, all you need is a good scheduler.
-
I've heard that argument before and I don't buy it. If it's easily possible to write code that can render text faster then you should do that, because there are plenty of situations where an average editor is too slow. Like if you're running something reasonably intensive in the background.
Just being fast enough when nothing else is running isn't fast enough.
Sure we need it all to be standardised and consistent so it makes it easy to write software, but that should be doable.
I agree. I like the idea of using the OS but it needs to provide reasonably optimal functions. Is aligning the destination and using an unrolled MOVE.L loop too much to ask for CopyMem()/CopyMemQuick() when it is competitively the fastest for the 68000-68060? Would it be a bad thing if Olsen sold more copies of Roadshow because the memory copying bottleneck was reduced? We need to improve and use Amiga profilers but memory copying is a CPU intensive task that is easily improved. The Amiga philosophy has always been about efficiency and not just replacing the CPU with a faster one.
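For illustration, here is roughly what "align the destination, then copy unrolled longwords" amounts to, written in C for readability; a real CopyMem() replacement would of course be hand-written 68k code and would also handle overlap rules and an optional MOVE16 path above some size threshold:
#include <exec/types.h>

void CopyAlignedUnrolled(const UBYTE *src, UBYTE *dst, ULONG size)
{
    const ULONG *ls;
    ULONG *ld;

    /* copy single bytes until the destination is longword aligned */
    while (size > 0 && ((ULONG)dst & 3)) {
        *dst++ = *src++;
        size--;
    }

    /* main loop: 16 bytes per round, much like an unrolled run of
     * MOVE.L instructions (assumes a 68020+ where misaligned longword
     * reads are legal, or that source and destination share alignment) */
    ls = (const ULONG *)src;
    ld = (ULONG *)dst;
    while (size >= 16) {
        ld[0] = ls[0]; ld[1] = ls[1]; ld[2] = ls[2]; ld[3] = ls[3];
        ls += 4; ld += 4; size -= 16;
    }

    /* whatever is left over */
    src = (const UBYTE *)ls;
    dst = (UBYTE *)ld;
    while (size > 0) {
        *dst++ = *src++;
        size--;
    }
}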
If you are running something CPU intensive in the background, like compiling a large project with GCC, all you need is a good scheduler.
The 68k frontend for vbcc, vc, had the task priority lowered for better multi-tasking. Editing is now practical while compiling which is very convenient.
I believe 68k GCC will use the current shell process priority (ChangeTaskPri).
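The priority trick itself is only a line or two against exec, something like the sketch below; the value -1 is just an example, not a claim about what vc actually sets:
#include <proto/exec.h>

void DropOwnPriority(void)
{
    /* drop this task's priority so an editor at priority 0 stays
     * responsive while the heavy work grinds away in the background */
    SetTaskPri(FindTask(NULL), -1);
}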
-
The Amiga philosophy has always been about efficiency and not just replacing the CPU with a faster one.
Wrong... When Amiga/Commodore was at its height, everyone needing better performance replaced the CPU with a faster one. Now the 68k CPU family has ceased and the best CPU available is 20 years old.
-
I've heard that argument before and I don't buy it. If it's easily possible to write code that can render text faster then you should do that, because there are plenty of situations where an average editor is too slow.
The problem then really is that as soon as something in the Os gets updated, you'll break things, or the renderer no longer works correctly. Besides, the Os routine isn't exactly slow either. Unlike the 1.3 version.
So yes, one can write stupid editors, but it doesn't require the Os for them to be slow (AmigaBasic is an example). But one can do pretty well with the Os, without compromising speed or compatibility.
-
I agree. I like the idea of using the OS but it needs to provide reasonably optimal functions. Is aligning the destination and using an unrolled MOVE.L loop too much to ask for CopyMem()/CopyMemQuick() when it is competitively the fastest for the 68000-68060? Would it be a bad thing if Olsen sold more copies of Roadshow because the memory copying bottleneck was reduced?
You are here mixing a couple of things that do not really belong together. First of all, CopyMemQuick() is reasonably optimal. Yes, one can improve it here or there, but it's surely better than the compiler-generated byte-wise copy - which also has its justification for short string moves, though.
I don't know how Roadshow could or would depend on CopyMemQuick(). Actually, the trick is to avoid the copy in the first place, and if that is not possible for one reason or another, it is still possible to implement something yourself. A memory copy is a reasonably trivial operation and trivial to do if it needs to be done, and it can then be tuned towards your precise requirements, unlike the Os function. But if your program spends a lot of time copying data around, you likely have a design problem somewhere.
-
If you are running something CPU intensive in the background, like compiling a large project with GCC, all you need is a good scheduler.
A good scheduler would be nice, however it doesn't solve the problem. If your text editor was written by someone who thinks it doesn't matter if it needs 100% cpu as long as it's fast enough to keep up with you typing, the scheduler will allow the text editor to be responsive but now your GCC builds get no CPU time at all.
There is a point where optimising further makes no sense, but giving up when something is just barely fast enough to keep up with a slow typist is not that point.
As for a good scheduler, it would be nice if this was open sourced.
http://aminet.net/package/util/misc/Executive
At least it's now free http://aminet.net/package/util/misc/Executive_key
The problem of course is it's another patch....
-
CygnusEd has the option of using OS routines, and on native chipset that is a major slowdown. Ditto for MuchMore iirc.
CygnusEd's own custom display update routines bypass several layers of operating system routines which need to be able to handle any case of moving and clearing the screen contents. By comparison CygnusEd can restrict itself to dealing with just one single bit plane and there is no need to handle clipping or occlusion. Actually, if you are using the topaz/8 font then CygnusEd can even bypass the operating system's text rendering operations altogether.
Well, this is how it can work out if you know exactly which special case you need to cater for. If you have to have a general solution you'll always end up making sacrifices with regards to performance and resource usage.
-
I agree. I like the idea of using the OS but it needs to provide reasonably optimal functions. Is aligning the destination and using an unrolled MOVE.L loop too much to ask for CopyMem()/CopyMemQuick() when it is competitively the fastest for the 68000-68060? Would it be a bad thing if Olsen sold more copies of Roadshow because the memory copying bottleneck was reduced?
Roadshow is a peculiar case. Because incoming and outgoing data needs to be copied repeatedly, the copying operation better be really, really well-optimized. For Roadshow I adapted the most efficient copying routine I could find, and this is what accounts for Roadshow's performance (among a few other tricks).
Because the TCP/IP stack needs to handle overlapping copying operations it would not have been possible to use CopyMem(), which is not specified to support it.
There is also special case copying code in Roadshow in order to support the original Ariadne card which, due to a hardware bug, handles single byte writes to its transmit buffer incorrectly (a single byte write operation is treated like a word write operation with a random MSB value). This is what the S2_CopyFromBuff16 SANA-II R3 command is for, in case you always wondered why this oddball command is part of the standard ;)
A general, efficient copying function has its merits, but it also ought to be sufficiently general in operation. Neither CopyMem() nor its subset CopyMemQuick() will handle overlapping copy operations, and neither even flags an error condition if it cannot do what the caller asks of it (they show "undefined behaviour" instead).
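To make the difference concrete: the guarantee a network stack needs here is essentially what memmove() gives and memcpy()/CopyMem() do not, i.e. picking the copy direction based on how the regions overlap. A minimal sketch:
#include <exec/types.h>

void SafeCopy(const UBYTE *src, UBYTE *dst, ULONG size)
{
    if (dst < src) {
        /* forward copy is safe when the destination lies lower */
        while (size--)
            *dst++ = *src++;
    } else if (dst > src) {
        /* otherwise copy backwards, starting at the end, so no byte is
         * overwritten before it has been read */
        src += size;
        dst += size;
        while (size--)
            *--dst = *--src;
    }
    /* dst == src: nothing to do */
}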
The lack of support for overlapping copy operations is likely a deliberate design choice. If you read the addendum to the original exec.library AutoDocs, it mentions that a future version of CopyMem() might use hardware acceleration to perform the operation. To me it's not quite clear what was meant by that. It could mean that somebody was thinking about putting the blitter to use, if available, to move the data around (which could have been twice as fast as copying the data using the CPU on the original 68000 Amiga design; you would have to wait for the blitter to become ready for use, and you'd have to wait for it to complete its work, which taken together may have nullified any possible speed advantage over using the CPU straight away). Or it could mean that somebody was considering adding DMA-assisted memory copying functionality to the Amiga system design, like the 1985 Sun workstations reportedly had.
Now there is an interesting question which I'd like to ask Carl Sassenrath :)
-
A good scheduler would be nice, however it doesn't solve the problem. If your text editor was written by someone who thinks it doesn't matter if it needs 100% cpu as long as it's fast enough to keep up with you typing, the scheduler will allow the text editor to be responsive but now your GCC builds get no CPU time at all.
So what? GCC gains CPU time again when you stop typing.
There is a point where optimising further makes no sense, but giving up when something is just barely fast enough to keep up with a slow typist is not that point.
I probably would optimize it as much as possible, but rationally it makes no sense. Most developers optimize code only for fun, to see how fast it can run, but it rarely gives anything back. In commercial software development such optimizations are often money pits...
...unless you are going to add IntelliSense-like features to 68k class systems. Then you might want to skip OS routines to cram more features into your software package.
As for a good scheduler, it would be nice if this was open sourced.
http://aminet.net/package/util/misc/Executive
At least it's now free http://aminet.net/package/util/misc/Executive_key
The problem of course is it's another patch....
It is very good but it works against documented behaviour, so some software may just lock up. But multitasking was so much smoother on my A1200 that I didn't care. I recall the utilities were written in C, but the scheduler is pure 68k asm.
-
CygnusEd's own custom display update routines bypass several layers of operating system routines which need to be able to handle any case of moving and clearing the screen contents. By comparison CygnusEd can restrict itself to dealing with just one single bit plane and there is no need to handle clipping or occlusion. Actually, if you are using the topaz/8 font then CygnusEd can even bypass the operating system's text rendering operations altogether.
For the records: I took the time and checked what ViNCEd does (the 3.9 Shell console). Actually, it does have the raster-scroll optimization as well, however, if I look at the sources nowadays and see how I had to "jump in circles" to get this done correctly, I would really not recommend doing so anymore.
Here, every line has a "raster mask" that describes which bitplanes it uses, and each window has a raster mask, too. When scrolling, not only does the raster mask have to be taken into account, but also whether the user has installed a custom border (probably drawing something in the console with other pens I'm not aware of) and whether the stuff is running on a graphics board, disabling the masking because it makes things slower, not faster. Then, one has to take into account whether the user has marked a block, changing the raster due to the inversion of the marked text, and so on...
Anyhow, it's quite an amount of code that went into this, and quite an amount of debugging, too. Looking at this today makes it a maintenance nightmare because changing the contents of the screen requires updating a lot of flags and settings to have everything consistent. No, I really won't do it like this anymore anytime...
There is no particular optimization for topaz.8 in ViNCEd, though. Back then, I had my own "FastFonts" in the system which optimized both topaz.8 and topaz.9 (unlike the CBM version which claimed to do the latter, but never did), only for solid, non-underline, non-italic, non-bold. All that stopped making sense, already with Workbench 2.0 and its better Text() (Yes, ViNCEd is really *that* old, it goes back to Workbench 1.3).
If you see how much code sits between calling Text() and making the graphics appear on the screen with a graphics card, you would know why. Actually, I believe the "planar to chunky" conversion within BltTemplate() that lies beyond Text() is probably the major contributor.
Thus, if you would want to optimize Text() nowadays, you would be rather well-advised to change the internal representation of fonts right away: as soon as you open one, change it from planar to chunky and have a chunky-optimized text renderer in the first place. That makes a lot more sense than trying to squeeze out the bits from a couple of bitplanes or bit-shifts on a rather obsolete graphics representation.
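To illustrate what "change it from planar to chunky when you open the font" could look like, here is a rough sketch; the row layout and buffer handling are my own assumptions, not how any shipping Text() replacement stores its fonts:
/* Convert a glyph from planar (1 bit per pixel) to chunky (1 byte per
 * pixel) once, at font-open time, so the renderer never shifts planar
 * bits at draw time. */
#include <exec/types.h>

void GlyphPlanarToChunky(const UBYTE *planarRows, UWORD width, UWORD height,
                         UBYTE *chunky)   /* width*height bytes out */
{
    UWORD bytesPerRow = (UWORD)((width + 7) / 8);
    UWORD x, y;

    for (y = 0; y < height; y++) {
        const UBYTE *row = planarRows + (ULONG)y * bytesPerRow;
        for (x = 0; x < width; x++) {
            /* test bit x of this row, most significant bit first */
            UBYTE set = (UBYTE)((row[x >> 3] >> (7 - (x & 7))) & 1);
            *chunky++ = set ? 0xFF : 0x00;   /* pen vs. background */
        }
    }
}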
-
For the records: I took the time and checked what ViNCEd does (the 3.9 Shell console). Actually, it does have the raster-scroll optimization as well, however, if I look at the sources nowadays and see how I had to "jump in circles" to get this done correctly, I would really not recommend doing so anymore.
What CygnusEd does is not limited to picking a single bit plane to render into, and which should be scrolled: it directly talks to the blitter itself to perform the update operation, bracketed between LockLayers()/OwnBlitter() and DisownBlitter()/UnlockLayers(), calling WaitBlit() before hitting the blitter registers.
The parameters which are fed into the blitter registers are precalculated in 'C', and only the hardware access itself is written in assembly language.
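From memory, the bracketing described above looks roughly like this for a plain A-to-D copy blit of one plane; the register values follow the hardware reference manual as I recall it, so treat it strictly as a sketch and not as CygnusEd's actual code (it also assumes graphics.library and layers.library are already open and that both planes live in chip RAM):
#include <hardware/custom.h>
#include <graphics/layers.h>
#include <proto/graphics.h>
#include <proto/layers.h>

extern struct Custom custom;   /* custom chip register bank (amiga.lib) */

void CopyBlitOnePlane(struct Layer_Info *li,
                      UWORD *srcPlane, UWORD *dstPlane,
                      UWORD widthWords, UWORD heightLines)
{
    LockLayers(li);            /* keep layers.library from moving things */
    OwnBlitter();              /* take the blitter away from the OS      */
    WaitBlit();                /* any previous blit must have finished   */

    custom.bltcon0 = 0x09F0;   /* use A and D, minterm D = A (copy)      */
    custom.bltcon1 = 0x0000;   /* ascending mode, no shifts              */
    custom.bltafwm = 0xFFFF;   /* no first/last word masking             */
    custom.bltalwm = 0xFFFF;
    custom.bltamod = 0;        /* source and destination are contiguous  */
    custom.bltdmod = 0;
    custom.bltapt  = srcPlane;
    custom.bltdpt  = dstPlane;
    /* writing BLTSIZE starts the blit: height in bits 15-6, width 5-0 */
    custom.bltsize = (UWORD)((heightLines << 6) | (widthWords & 0x3F));

    WaitBlit();                /* let it finish before handing it back   */
    DisownBlitter();
    UnlockLayers(li);
}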
Cool stuff indeed :)
-
So what? GCC gains CPU time again when you stop typing.
Why should you have to stop typing?
Also the text editor could use 100% cpu all the time and still fit the requirement of being "fast enough because it keeps up with your typing".
-
Thus, if you would want to optimize Text() nowadays, you would be rather well-advised to change the internal representation of fonts right away: as soon as you open one, change it from planar to chunky and have a chunky-optimized text renderer in the first place.
To optimise for modern hardware you'd probably want a much more complex system where you stored fonts in multiple different formats, potentially in the VRAM of each graphics card you are using the font on. So topaz might be stored in chip ram for OCS/ECS/AGA, and if you also have a couple of PCI graphics cards it's probably stored on those too.
-
Why should you have to stop typing?
Don't stop if there is still more text coming. However, you must be a pretty good coder to type so quickly that GCC never gets any CPU time.
Also the text editor could use 100% cpu all the time and still fit the requirement of being "fast enough because it keeps up with your typing".
That is just a multitasking-unfriendly editor. That is a different requirement.
-
I remember when compiling linux kernels on remote machines became so fast that atelnet (or whatever I was using) could no longer keep up with the output and locked up my A3000 (CSPPC/CVPPC), haha.
-
Great ideas and work, SpeedGeek !
Your patch gives me motivation to update the BVisionPPC monitor:
BVisionPPC v4.4
- section removed
- four 060 emulated fmovecr removed
- two realtime 68030 checking removed (what's that ?)
- one realtime FPU checking removed
- ugly internal SAS/C copyroutine replaced by a _LVOCopyMem call
There are still two ugly ugly ugly copy routines with move16 in this new version: ashamed of the slowness...
I'll email them to you, you will see...
:)
-
That is just a multitasking-unfriendly editor. That is a different requirement.
No it is exactly the same requirement as the text editor being able to use as much of your cpu as it wants as long as it can keep up with your typing. If you have other secret requirements that are obvious to you then you should specify them.
There may be software running in the background that needs to run at a high priority so that only 50% of cpu is left. The text editor needing 100% of cpu to keep up with typing is now unable to keep up with your typing.
I'm using quite extreme examples because it makes the maths easier, I understand that you can't practically optimise everything completely and there is an upper limit on the cpu resources available.
There are other vague issues, like not saying how big the document is when you measure if it's able to keep up. A text editor that is designed for editing large documents will be able to insert and delete text in the middle without CopyMem()'ing the rest of the document around every time you type a character. On Windows I regularly edit text files hundreds of megabytes in size.
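The usual way to avoid that wholesale shuffling is something like a gap buffer: keep the unused space at the cursor, so typing a character only touches the gap and not the rest of the document. A minimal sketch, not taken from any particular editor:
#include <string.h>

struct GapBuffer {
    char  *text;       /* the whole allocation           */
    size_t gapStart;   /* first unused byte (the cursor) */
    size_t gapEnd;     /* first used byte after the gap  */
};

/* move the gap so that it starts at logical position 'pos' */
static void MoveGap(struct GapBuffer *gb, size_t pos)
{
    if (pos < gb->gapStart) {
        size_t n = gb->gapStart - pos;
        memmove(gb->text + gb->gapEnd - n, gb->text + pos, n);
        gb->gapStart -= n;
        gb->gapEnd   -= n;
    } else if (pos > gb->gapStart) {
        size_t n = pos - gb->gapStart;
        memmove(gb->text + gb->gapStart, gb->text + gb->gapEnd, n);
        gb->gapStart += n;
        gb->gapEnd   += n;
    }
}

/* typing a character never shuffles the rest of the document around */
int InsertChar(struct GapBuffer *gb, size_t pos, char c)
{
    if (gb->gapStart == gb->gapEnd)
        return 0;                 /* gap full; a real editor regrows it */
    MoveGap(gb, pos);
    gb->text[gb->gapStart++] = c;
    return 1;
}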
-
No it is exactly the same requirement as the text editor being able to use as much of your cpu as it wants as long as it can keep up with your typing. If you have other secret requirements that are obvious to you then you should specify them.
I thought it was obvious to you that Amiga software should not use busy loops when doing nothing.
There may be software running in the background that needs to run at a high priority so that only 50% of cpu is left. The text editor needing 100% of cpu to keep up with typing is now unable to keep up with your typing.
It is a matter of evaluating ROI (return on investment). If it is slow for everyone, then it might be necessary to optimize. But if it is slow only for one or a few users, maybe I would spend my time elsewhere.
I'm using quite extreme examples because it makes the maths easier,
Sure, I got that.
I understand that you can't practically optimise everything completely and there is an upper limit on the cpu resources available.
It is also a matter of how many resources you allocate to optimizations versus new features. You can have an ultimately optimized, fast text editor that is poor in features. Or you can have a feature-rich text editor that is too slow to use. Or you can try to find a balance between optimizations and features.
There are other vague issues, like not saying how big the document is when you measure if it's able to keep up. A text editor that is designed for editing large documents will be able to insert and delete text in the middle without CopyMem()'ing the rest of the document around every time you type a character. On Windows I regularly edit text files hundreds of megabytes in size.
If you choose your algorithm wisely, a text editor scales up to GB documents easily.
Interestingly, the original Ed found on Workbench 1.3 does not. On an Amiga 500 it could not handle even 50 kB text files very well. There was always a noticeable lag when inserting a new line.
-
Interestingly, the original Ed found on Workbench 1.3 does not. On an Amiga 500 it could not handle even 50 kB text files very well. There was always a noticeable lag when inserting a new line.
The original Ed managed its text buffer in a very peculiar manner. The entire text was stored in a single consecutive buffer, whose contents had to be moved around prior to inserting text, and after removing text.
Because of how the BCPL memory management worked out, the text was not managed by storing a list of pointers which then referenced the individual lines. Instead, the management data structures were interleaved with the text itself. If you looked at it, you would find that the whole text buffer would be broken down into individual lines, which would begin with a pointer to the next line, followed by the text itself (which would begin with a byte indicating how long the text is). This is one of the reasons why the size of the file managed by Ed is restricted.
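If I read that description right, each line record would have looked something like the following - purely an illustration of the layout, not Ed's actual BCPL declarations (the real link would have been a BCPL word pointer rather than a machine address):
#include <exec/types.h>

struct EdLine {
    struct EdLine *next;    /* link to the following line              */
    UBYTE          length;  /* how many characters follow              */
    char           text[1]; /* the line body, 'length' characters long */
};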
Because the text has to be shuffled around, and because there is another layer of management data structures which keeps track of where the first and the last line currently displayed ends up, which lines precede that, etc. every display update and every editing operation kicks off an avalanche of data structure maintenance operations.
The code is clearly optimized so as to minimize the impact of display updates, which was a sensible and necessary strategy back in the 1970s when you had one server connected to a bunch of terminals through slow serial links.
The scalability of the implementation is poor, though, and that's because the code is not optimized to minimize the impact of making changes in the document. Even reading and writing files is painfully slow because the editor has to extract the interleaved management/text data in order to store it to disk, and it has to interleave the management/text data when it reads documents from disk.
Here comes the fun part: the operating system which the old "Ed" was a part of was likely written using "Ed" itself. How can you tell? The "Ed" source code is larger than 64K, and the author broke it down into six individual files, presumably in order to keep it manageable. The longest line in that source code is about 105 characters, and almost every line is shorter than 80 characters. The same holds true for the dos.library code, which contains only a few files that are larger than 64K (by 1985 somebody must have used a different text editor, I suppose).
-
What an absolutely horrible editor that Ed thing :(
-
I thought it was obvious to you that Amiga software should not use busy loops when doing nothing.
When you make the statement that the ONLY thing that matters for the performance of a text editor is that it can keep up with typing speed, then it's actually not obvious at all. If you are making assumptions about what the statement really means, then you're agreeing with something without actually understanding the point it's making.
If there are loads of little implied things that we have to know, then making the statement in the first place is pointless, as it's trying to say there is only one thing you have to know.
Ed isn't fast enough.
-
What an absolutely horrible editor that Ed thing :(
We can only judge the design from today's point of view. Nobody sets out to write a slow, restricted text editor. What drove the design must have made great sense at the time "Ed" was written, but it did not hold up. I suppose even by 1985 the limitations may have become hard to bear.
Or maybe the Amiga as it was back then may have been just as powerful as the minicomputers of the late 1970s on which the TRIPOS system, from which AmigaDOS and its shell, commands, etc. derive, was developed, and the constraints we see today did not seem like constraints at all back then. There must have been a reason why TRIPOS was picked for the Amiga, other than there was time pressure and not many other options were available then.
-
Remember when I discovered that by removing ed-startup, I got a whole lot more menu entries :) Actually, I like Ed, it seems related to vi. Personally I use vim all day long, which started out on Amiga, I still have a Fish disk somewhere with version 1.0.5 or something like that :)
-
What an absolutely horrible editor that Ed thing :(
It is not. If it were not for such limits (slowness and max text length), it does its job. I can't remember any of its commands anymore (perhaps esc+x, was it save and quit?) but I used it for a while for coding until I found DME.
@olsen
We can only judge the design from today's point of view. Nobody sets out to write a slow, restricted text editor. What drove the design must have made great sense at the time "Ed" was written, but it did not hold up. I suppose even by 1985 the limitations may have become hard to bear.
For coding its limitations are hard to bear, but it was still good enough to edit the startup-sequence or create small text files. After all, it was not easy to find a replacement text editor... such a thing had to be ordered by snail mail from shady PD distributors.
-
There must have been a reason why TRIPOS was picked for the Amiga, other than there was time pressure and not many other options were available then.
I think the main consideration was that it was quick to graft it onto exec. If they'd had to wait a long time then they may as well have developed CAOS.
It probably was the only 68000 operating system that they could have used.
-
Ed is a crappy editor, period. Why defend the damned thing? If it's so good, then how come no one really uses it for anything? The answer is because it sucks.
Just because some software is old doesn't mean that higher standards don't apply to it, and the platform it had to run on back then can do much better than that. Especially if you have an A500 with one megabyte of memory there's absolutely no reason whatsoever to use Ed.
Really, defending Ed? Why?
-
** 2ND NEWS UPDATE **
CMQ&S v1.6 released
v1.6 minor change
- fixed install code which could (but seldom ever did) trash a few bytes
of memory past the end of the patch
CMQ&S040 v1.7 released
v1.7 minor changes
- fixed install code which could (but seldom ever did) trash a few bytes
of memory past the end of the patch
- fixed 4 byte offset on Move16 compare code
-
Ed is a crappy editor, period. Why defend the damned thing? If it's so good, then how come no one really uses it for anything? The answer is because it sucks.
Just because some software is old doesn't mean that higher standards don't apply to it, and the platform it had to run on back then can do much better than that. Especially if you have an A500 with one megabyte of memory there's absolutely no reason whatsoever to use Ed.
Really, defending Ed? Why?
Why not? I could download DME and use it on my Amiga 500 but why bother? For quick text editing Ed does its job. It works, it is reliable, it is small and it is easy to use if you know the commands. Very simple: open text file, edit, save, quit.
Heck, I am even using Pico occasionally to edit some source code files or configs on Linux.
-
Actually, I like Ed, it seems related to vi.
When I used "vi" for the first time (that must have been around 1991-1992; I never used a Unix system before I went to university), I was puzzled by the fact that some of the control sequences were exactly the same as in "Ed".
Since TRIPOS seems to have had so much in common with early Unix, by way of imitation and reimplementation, you can't rule out that the design of "Ed" was shaped by "vi". Both were created in about the same time frame around 1978. Also, there's another odd "parallel evolution" in that what "Edit" is for "Ed", "ed" is for "vi".
-
When I used "vi" for the first time (that must have been around 1991-1992; I never used a Unix system before I went to university), I was puzzled by the fact that some of the control sequences were exactly the same as in "Ed".
Since TRIPOS seems to have had so much in common with early Unix, by way of imitation and reimplementation, you can't rule out that the design of "Ed" was shaped by "vi". Both were created in about the same time frame around 1978. Also, there's another odd "parallel evolution" in that what "Edit" is for "Ed", "ed" is for "vi".
There's actually more stuff like this. Look at the Aztec-C editor "Z" that came with one of the later versions. The same type of crude (aka "unusable") editor. "Ed" has a lot in common with "vi", though the v37 version finally got a menu (which improved usability by about 200%).
"Ed" was partially ok, good enough to modify the startup-sequence, but not really usable for anything beyond that. "vi" is pretty much the same, and I'm still scared that people use that (or vim) to work on projects, but who am I to judge... :wq
-
There's actually more stuff like this. Look at the Aztec-C editor "Z" that came with one of the later versions. The same type of crude (aka "unusable") editor. "Ed" has a lot in common with "vi", though the v37 version finally got a menu (which improved usability by about 200%).
Back in the early days of Amiga programming (that would have been 1987/1988 in my case) it was hard to find a decent programmer's editor.
I knew "Z" but quickly discarded it for being too obtuse. Funny that the Aztec 'C' documentation gave it such prominence, stressing the fact how compatible it was with "vi". I think the defining sentence in the documentation was "if you know vi, then you know Z", which works the other around, too, but not in Z's favour: I didn't have a clue what the documentation was talking about in the first place ("vi"? was that a roman numeral or something? and what does the number six have to do with text editors anyway?) and had to conclude that whatever the authors were so excited about probably wasn't for me.
My first 'C' programs were written using "Ed", until the programs became too large to endure the time it took for "Ed" to read and write them. At some point "Ed" even complained that the file was too large. I could take a hint: if the row of '@' characters "Ed" printed as it read a file was so long that it caused the screen to scroll it was high time to look for something else. The more '@' characters "Ed" printed, the slower it became, like it was climbing a steep hill and sweating & cursing with every step; I held out for more cartoon character swearing but "Ed" never even once admitted that it wanted to use one of "#$&%*", possibly because it was too well bred, coming from a posh British university. I now know that the original "Ed" prefers files to be not much larger than 10.000 characters. I could have used the often overlooked "size" parameter for "Ed", but then again who has that much patience in the long run?
Back then the next best text editor which I could find was on a Fish Disk with a number < 100, written by a French author (if I remember correctly). That too had its limitations. Don't get me started on "Microemacs" which, while it shipped on the Workbench disk, was barely usable either. And then there was "uEdit" (also found on a Fish Disk), which at the time appeared to me to be some sort of science fiction experiment gone terribly wrong. There were other Amiga text editors which I tried along the way. There was something called "SuperEd" which was not only fast, but also had a crash recovery feature (which I learned to appreciate). Then there was a strange editor which was ported over from the Atari ST which had a split screen feature that promised to be super wonderful: how odd that it only supported *vertical* split screen (and didn't have a crash recovery feature, which I quickly learned it ought to have had).
These were really tough times. Eventually, I was saved by discovering what still is my Amiga text editor of choice, and in fact would be *the* text editor of choice on any platform, if it were more portable than it is. "CygnusEd" for life ;)
"Ed" was partially ok, good enough to modify the startup-sequence, but not really usable for anything beyond that. "vi" is pretty much the same, and I'm still scared that people use that (or vim) to work on projects, but who am I to judge... :wq
I suppose "vi" is somewhere in the sweet spot of being quick to launch and (given enough available brain capacity) quickly allows you to commit keystroke sequences to muscle memory. Yes, it's a weird design, but so is the standard keyboard layout. If you learned touch-typing, it's amazing how well you can use that weird layout at great speed. It doesn't work quite so well with more heavy-weight editors such as the original "emacs".
-
Lol at Vim (68k AOS).
-
Don't get me started on "Microemacs" which, while it shipped on the Workbench disk, was barely usable either.
Don't you dare say anything more bad about (micro)emacs or I'll turn this thread into a classic emacs vs. vi war thread. And if you think the infighting on amiga.org is bad, I assure you, you ain't seen nothing yet...
:D
I suppose "vi" is somewhere in the sweet spot of being quick to launch and (given enough available brain capacity) quickly allows you to commit keystroke sequences to muscle memory. Yes, it's a weird design, but so is the standard keyboard layout. If you learned touch-typing, it's amazing how well you can use that weird layout at great speed. It doesn't work quite so well with more heavy-weight editors such as the original "emacs".
I've been told that the weird design was actually thought through: it was to reduce the risk, on mechanical typewriters, of the next letter getting stuck on the returning previous one.
-
I've been told that the weird design was actually thought through: it was to reduce the risk, on mechanical typewriters, of the next letter getting stuck on the returning previous one.
Well, there are two reasons listed on Wikipedia http://en.wikipedia.org/wiki/QWERTY#History_and_purposes
reducing jams and trying to distribute letters evenly.
-
Don't you dare say anything more bad about (micro)emacs or I'll turn this thread into a classic emacs vs. vi war thread. And if you think the infighting on amiga.org is bad, I assure you, you ain't seen nothing yet...
:D
On 68k AOS that's irrelevant, because the mighty FrexxEd destroys all :D
-
Don't you just love it (not) when Ed gets its knickers in a twist and trashes the top line after saving? Whenever I edit anything with Ed, I always make sure to leave the top line blank or use a ';' just in case it decides to trash it.
I like BED (Blacks Editor).
-
Why use Ed at all?
-
Back in the early days of Amiga programming (that would have been 1987/1988 in my case) it was hard to find a decent programmer's editor.
I knew "Z" but quickly discarded it for being too obtuse. Funny that the Aztec 'C' documentation gave it such prominence, stressing the fact how compatible it was with "vi". I think the defining sentence in the documentation was "if you know vi, then you know Z", which works the other around, too, but not in Z's favour: I didn't have a clue what the documentation was talking about in the first place ("vi"? was that a roman numeral or something? and what does the number six have to do with text editors anyway?) and had to conclude that whatever the authors were so excited about probably wasn't for me.
People seem to forget the history and how everything that wasn't assembler was related. We have BPTRs in dos.library which I believe came from the BCPL language?
BCPL -> B -> C
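As an aside on those BPTRs: BCPL addressed memory in 32-bit words, which is why dos.library still hands out pointers that have to be shifted left by two before a C program can use them. The NDK headers provide BADDR() for exactly this; a tiny sketch:
#include <dos/dos.h>
#include <dos/dosextens.h>

struct FileLock *LockToPointer(BPTR lock)
{
    /* a BPTR is the machine address divided by four */
    return (struct FileLock *)BADDR(lock);
}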
The Amiga was one of the first affordable computers to use C for most of the OS, and it was a common development environment. The 68000 chip made it easier to use a high level language which was popular on non-affordable hardware (the 68k is a cheaper successor to the VAX and PDP-11). This was another important choice in foresight by Jay Miner. The Amiga and Atari ST helped make C popular even though most computer people would think C came from the PC, where it was slow to catch on, or Unix, which is partially true but rare outside universities and a few big businesses at the time. Dennis Ritchie, Jay Miner, Carl Sassenrath and even RJ Mical were pioneers and innovators that few people know about today while Steve Jobs and Bill Gates get the glory for being good at marketing inferior products.
Why use Ed at all?
Because it is free (with AmigaOS), available and works. Ed was at one time not too bad. It has powerful ARexx support and the menus are configurable so maybe it was the FrexxEd of the day? I did a lot with Ed and ARexx but the vanishing 1st line bug and the slow speed finally killed it for me.
I went to CED 3.5 and then CED 4.20 where I am now. CED is fast and very powerful but not perfect either.
o I wish I could change the menus to be more style guide compliant like Ed ;).
o I wish all major bugs were fixed before moving to a paid upgrade. I shouldn't have to pay for bug fixes or upgrade to get bug fixes. CED 4.20 has 2 major bugs. Some files will not load, and this seems to have something to do with the path and file name of the file. The other is the tab size changing when using an ARexx script, which can be worked around by restoring the tab setting with ARexx after an ARexx script. These are very annoying bugs even though they don't cause data loss. I have installed the patch from Aminet which didn't fix the problem.
o I wish there was a 68020 compiled version. It's amazing that CED is as fast as it is when SAS/C uses a branch to a branch because there is no 32 bit branch on the 68000. A multiply or divide can take several times longer without 68020 MUL/DIV instructions. That SAS/C memory copy routine is less than spectacular also. Fortunately, the good algorithms are more important than optimal compiler code generation.
o I wish an "editor" wasn't so expensive to upgrade and the process easy (my CD has no serial number).
The Amiga has many good editors now like CED, GoldEd, FrexxEd and BED. There are better free editors on Aminet now than ED, sometimes with source code.
-
When I used "vi" for the first time (that must have been around 1991-1992; I never used a Unix system before I went to university), I was puzzled by the fact that some of the control sequences were exactly the same as in "Ed".
same here. I used 'ed' for editing text files on Amiga years before experiencing 'vi' on Slackware Linux. Even then I started using 'joe' which I preferred over 'vi' even if it was similarly familiar. 'joe' is similar to WordStar (or so I have heard - only used WS a few times)
For a while I was using xwpe under Linux, which was an excellent clone of Borland's 'turbo' IDE...
In my opinion CygnusED is the best on Amiga :)
-
Don't you just love it (not) when Ed gets its knickers in a twist and trashes the top line after saving? Whenever I edit anything with Ed, I always make sure to leave the top line blank or use a ';' just in case it decides to trash it.
I like BED (Blacks Editor).
Me too. I have used BED for about 20 years now. I have used it so long that using any other editor on the Amiga has become difficult for me.
I tried to contact the author a long time ago to get the source code, but unfortunately he was not willing to share.
-
Because it is free (with AmigaOS), available and works.
Didn't just about every version of the OS come with MEmacs?
I went to CED 3.5 and then CED 4.20 where I am now.
Yeah, CED. Used to be my favorite, until I wanted more features and found FrexxEd.
I wish an "editor" wasn't so expensive to upgrade and the process easy (my CD has no serial number).
You could use a free editor. What does Ced do that free editors don't?
The Amiga has many good editors now like CED, GoldEd, FrexxEd and BED.
Never did understand why people like GoldEd. I tried that thing once and ran away screaming.
In my opinion CygnusED is the best on Amiga :)
It depends on your needs and what you want. CygnusEd is a little on the simple side for me now. Just the other day I was thinking about NotePad++'s nice multi line editing feature, so I added it to FrexxEd with a simple script. Hard to beat that kind of power.
-
Didn't just about every version of the OS come with MEmacs?
Yes, but that didn't make it any better.
Never did understand why people like GoldEd. I tried that thing once and ran away screaming.
Because you can configure it to your liking. In fact, once you know how to handle it, you could configure it for an entire IDE. In my personal configuration, I compiled from it, installed compiler settings, makefiles, jumped to errors in the source file and much more. It was a very powerful editor, and for Amiga business, I still use it.
Version 4 was something I never liked, though, simply because it did not integrate into the Look and Feel of the Os. Dietmar apparently had the idea that he could do something "better than the Os". While some of the gadgets and handlings were indeed "better" in some sense in revision 4, they broke with the traditions of the AmigaOs. For that reason, I never bothered to upgrade. I was happy with revision 3.
Nowadays, I'm mostly happy with emacs, also customized, even though I never tried to customize it to the same extent as I had GoldEd customized.
-
Yes, but that didn't make it any better.
It still seems better than Ed at least. Then again, it's hard to be worse than Ed :D
Because you can configure it to your liking. In fact, once you know how to handle it, you could configure it for an entire IDE. In my personal configuration, I compiled from it, installed compiler settings, makefiles, jumped to errors in the source file and much more. It was a very powerful editor, and for Amiga business, I still use it.
What I hate about GoldEd is the fact that it has a weird editing model that I can't stand. I like things to work like Ced and Notepad++ (standard editing model). FrexxEd does that, and offers full programmability. You can play Tetris in that editor. I also hate how CubicIde uses Lisp as its scripting language. Terrible! FrexxEd uses FPL which is just a C interpreter.
The big drawback of FrexxEd is that its default setup isn't all that great.
-
I also hate how CubicIde uses Lisp as its scripting language. Terrible!
Well, it worked for Emacs ;)
Should you wake me up in the dead of the night, stressing that the fate of the world depended upon me instantly adding scripting language support to an application, I'd probably start yawning, make coffee and write a Lisp-like language interpreter.
With the exception of "Forth", there's probably no other type of programming language which is both robust and powerful, and as easy to implement. Whether this necessarily translates into a language which empowers the user or just succeeds in making his life harder is up for debate.
Sometimes it's enough just to make a system scriptable which wasn't scriptable before.
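To give a feel for why that is (a toy sketch of my own, not anything taken from a shipping product): a reader/evaluator for integer s-expression arithmetic fits in well under a page of C.
/* Toy sketch: parse and evaluate s-expressions such as (+ 1 2 (* 3 4)).
 * Integer arithmetic only, no symbols, no error handling - just enough to
 * show how small the core of a Lisp-like evaluator can be. */
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
static const char *p;                     /* read cursor into the source  */
static void skip(void)
{
    while (isspace((unsigned char)*p))
        p++;
}
/* Parse and evaluate in a single pass: either a decimal number, or
 * "(op arg arg ...)" where op is one of + - * /. */
static long eval(void)
{
    long acc, v;
    char op, *end;
    skip();
    if (*p != '(') {                      /* a plain decimal number       */
        acc = strtol(p, &end, 10);
        p = end;
        return acc;
    }
    p++;                                  /* consume '('                  */
    skip();
    op = *p++;                            /* the operator symbol          */
    acc = eval();                         /* first argument               */
    skip();
    while (*p != ')') {
        v = eval();
        switch (op) {
        case '+': acc += v; break;
        case '-': acc -= v; break;
        case '*': acc *= v; break;
        case '/': acc /= v; break;
        }
        skip();
    }
    p++;                                  /* consume ')'                  */
    return acc;
}
int main(void)
{
    p = "(+ 1 2 (* 3 4))";
    printf("(+ 1 2 (* 3 4)) = %ld\n", eval());   /* prints 15 */
    return 0;
}
Everything a real scripting language adds on top of this - symbols, variables, user-defined functions - grows out of the same skeleton, which is why the approach is so tempting when time is short.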
-
Well, it worked for Emacs ;)
Just that is reason enough for me to never use Emacs.
Should you wake me up in the dead of the night, stressing that the fate of the world depended upon me instantly adding scripting language support to an application, I'd probably start yawning, make coffee and write a Lisp-like language interpreter.
Or you could get Lua, and add that. Seems a lot easier than writing a script language from scratch. Not to mention that Lua is a lot nicer than Lisp.
With the exception of "Forth", there's probably no other type of programming language which is both robust and powerful, and as easy to implement. Whether this necessarily translates into a language which empowers the user or just succeeds in making his life harder is up for debate.
Lisp is probably quite usable once you're used to it. The question is whether you want to get used to it or not. I certainly don't.
Sometimes it's enough just to make a system scriptable which wasn't scriptable before.
Why not just add a nice language? Best choices for a script language seem to be Lua or a C interpreter.
Lua is easy, and easy to add if you're working in C. It's also pretty fast, works well on old systems like lower end 68k Amigas (68020/30), and very portable (SASC compiles it properly).
Adding a C interpreter is good, because many programmers know C. That's why FrexxEd's script system is so nice. If you know C, then you know FrexxEd's script language.
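To show what "easy to add if you're working in C" looks like in practice, here is a minimal embedding sketch against the stock Lua C API (the host function and the script are made up for illustration; building Lua itself for 68k with SAS/C is a separate exercise):
/* Minimal sketch of embedding Lua from C. */
#include <stdio.h>
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>
/* a C function exported to scripts: takes a string, returns its length */
static int host_strlen(lua_State *L)
{
    size_t len;
    luaL_checklstring(L, 1, &len);
    lua_pushinteger(L, (lua_Integer)len);
    return 1;                        /* one return value on the Lua stack */
}
int main(void)
{
    lua_State *L = luaL_newstate();  /* one interpreter instance          */
    luaL_openlibs(L);                /* load the standard libraries       */
    lua_register(L, "host_strlen", host_strlen);
    if (luaL_dostring(L, "print('length = ' .. host_strlen('FrexxEd'))"))
        fprintf(stderr, "script error: %s\n", lua_tostring(L, -1));
    lua_close(L);
    return 0;
}
That is essentially the whole integration cost: one state, the standard libraries, and one lua_register() call per function you expose to scripts.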
-
It still seems better than Ed at least. Then again, it's hard to be worse than Ed.
Well, look at Z. Or vi for that matter. This is worse. But anyhow, being better than Ed is really not quite a challenge. Yes, memacs was better, on a very low quality scale, though. I looked at it once, maybe twice, then put it away. I used ed for the startup-sequence. Worked. That's the best I can say about Ed.
What I hate about GoldEd is the fact that it has a weird editing model that I can't stand. I like things to work like Ced and Notepad++ (standard editing model). FrexxEd does that, and offers full programmability. You can play Tetris in that editor. I also hate how CubicIde uses Lisp as its scripting language. Terrible! FrexxEd uses FPL which is just a C interpreter.
I don't remember what was so particularly strange about it. Maybe I configured it to behave less strangely. I worked with it again over Christmas, for some old Amiga stuff, and it was still quite good. I wish emacs had been less of emacs and more of GED, but it's too late for that now.
I don't even remember which editor I used in the old days when I did a lot of work in assembly. Really a lot. I looked at Seka, and had to p*ke at its user interface (or lack thereof) and decided against this primitive beast (a good decision), then used the Data Becker "Profimat", which had a somewhat useful IDE, though a pretty limited assembler (not a good decision). Luckily, I decided against the GFA assembler (I also used GFA Basic quite a bit, fast but buggy) and bought DevPac (2.0 back then); I never regretted it, it was a decent choice. I believe I used the DevPac editor for quite a while, still a good thing. Then with Lattice C, I believe it was LSE, which had a couple of bugs, but still worked quite ok. Then came GoldEd, SAS/C and DevPac 3.0, again good investments. I guess I was never a particular fan of CED, but I already had good editors for what I needed. GED I used for almost everything: C, assembler, PasTeX. Except for the Startup-Sequence. That was still in the hands of "Ed" because GED was a bit too bulky.
-
Should you wake me up in the dead of the night, stressing that the fate of the world depended upon me instantly adding scripting language support to an application, I'd probably start yawning, make coffee and write a Lisp-like language interpreter.
Or you could get Lua, and add that. Seems a lot easier than writing a script language from scratch. Not to mention that Lua is a lot nicer than Lisp
Sometimes you don't get to choose, and there are overriding constraints which spell out in so many words why we can't always have nice things.
I've been in that position several times, and although I don't recommend the approach, it can make great sense to explore the boundaries set by the constraints and use that playing field to the best of your ability.
That can lead to wicked strange solutions which you'd rather not admit having cooked up in a moment of weakness, but then sometimes the overriding constraints are what guide your decision-making and not that nagging conscience of yours that keeps reminding you that the choices you are forced to make may not look so good in the long run. Being a programmer can suck.
Sometimes it's enough just to make a system scriptable which wasn't scriptable before.
Why not just add a nice language? Best choices for a script language seem to be Lua or a C interpreter.
You don't always get to choose. The last time I was really upset by the choices made in a scripting language design was when I had to write something in AppleScript to clean up my iTunes library. What a bizarre language. Why is Apple holding onto it, and its equally bizarre ecosystem? Because that scripting language, and all the ideas that went into its design, has been around for decades with nobody willing to admit that it doesn't hold up well.
Lua is easy, and easy to add if you're working in C. It's also pretty fast, works well on old systems like lower end 68k Amigas (68020/30), and very portable (SASC compiles it properly).
Adding a C interpreter is good, because many programmers know C. That's why FrexxEd's script system is so nice. If you know C, then you know FrexxEd's script language.
I think that Lua's a decent enough design, which is both powerful, well-documented and something newcomers can learn and apply. It's also embeddable with a small memory footprint. I once came close to using it in one of my applications, but then time constraints made me - wait for it - knock off one of those Lisp-like language interpreters instead (the fate of the world didn't exactly depend upon it, and if it did I didn't notice, but sometimes you just want to finish a project and not keep on tinkering).
As for using a 'C'-like language for the purpose of scripting, I can see the attraction for programmers who are already familiar with the language. For everybody else it's a long and arduous journey to even become competent in using the language, so I wouldn't want to force it upon anybody.
-
(SASC compiles it properly)
For the love of the Amiga, please DO NOT USE this compiler, or recommend it on the forum, at all...
Here is an example of what I found in the GRex Voodoo3 monitor:
JL_0_CC68
        move.l  a5,-(sp)
        move.l  d0,d1
        move.l  a0,a5
        bmi.b   JL_0_CC76           ; branch if d0 was negative
        moveq   #8,d0
        cmp.l   d0,d1
        ble.b   JL_0_CC7E           ; d1 <= 8 -> normal path
JL_0_CC76
        move.w  #-$0001,a0          ; -1 loaded via a0, then copied to d0
        move.l  a0,d0
        bra.b   JL_0_CC92
JL_0_CC7E
        asl.l   #2,d1
        lea     $2AC(a4),a0         ; the same effective address is
        lea     $2AC(a4),a1         ; computed twice, into a0 and a1
        move.l  (a0,d1.l),a0
        move.l  a5,(a1,d1.l)
        move.l  a0,d0
JL_0_CC92
        move.l  (sp)+,a5
        rts
It's AWFUL: DO NOT USE THIS COMPILER!!
:(
-
I used ed for the startup-sequence.
We've all done that at some point :lol: Ced is nice for stuff like that. Solid and fast (but a little on the basic side).
I don't even remember which editor I used in the old days when I did a lot of work in assembly. Really a lot. I looked at Seka, and had to p*ke at its user interface (or lack thereof) and decided against this primitive beast (a good decision), then used the Data Becker "Profimat", which had a somewhat useful IDE, though a pretty limited assembler (not a good decision). Luckily, I decided against the GFA assembler (I also used GFA Basic quite a bit, fast but buggy) and bought DevPac (2.0 back then); I never regretted it, it was a decent choice. I believe I used the DevPac editor for quite a while, still a good thing. Then with Lattice C, I believe it was LSE, which had a couple of bugs, but still worked quite ok. Then came GoldEd, SAS/C and DevPac 3.0, again good investments. I guess I was never a particular fan of CED, but I already had good editors for what I needed. GED I used for almost everything: C, assembler, PasTeX. Except for the Startup-Sequence. That was still in the hands of "Ed" because GED was a bit too bulky.
Back in the day I used to do everything in AsmOne. Now I do everything in FrexxEd, with Barfly for assembly language and SASC for C.
Sometimes you don't get to choose, and there are overriding constraints which spell out in so many words why we can't always have nice things.
Yes, but not when you're developing your own software from scratch.
Being a programmer can suck.
Yes, but not when you're doing it on a hobby basis. Then you get to do whatever you want in whatever way you want. You simply need the discipline to actually finish the project, or get it into a state where it can be released and used properly (after that you can keep working on it to make it better, but at least you already have something decent).
I think that Lua's a decent enough design, which is both powerful, well-documented and something newcomers can learn and apply. It's also embeddable with a small memory footprint. I once came close to using it in one of my applications, but then time constraints made me - wait for it - knock off one of those Lisp-like language interpreters instead (the fate of the world didn't exactly depend upon it, and if it did I didn't notice, but sometimes you just want to finish a project and not keep on tinkering).
What kind of time constraints cause you to have to make concessions like that?
As for using a 'C'-like language for the purpose of scripting, I can see the attraction for programmers who are already familiar with the language. For everybody else it's a long and arduous journey to even become competent in using the language, so I wouldn't want to force it upon anybody.
True, but it's still great for programmers editors because so many programmers know C. For something that's not related to programming I'd pick something else, too.
-
For the love of the Amiga, please DO NOT USE this compiler, or recommend it on the forum, at all...
Here is an example of what I found in the GRex Voodoo3 monitor:
It's AWFUL: DO NOT USE THIS COMPILER!!
:(
I'll keep using SASC because it's fine. Perhaps you should do some proper tests first instead of looking at one piece of code. Also, what's the alternative?
Anyway, 68k compilers in general won't produce top notch code. If you really want good code, then use assembly language.
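For what it's worth, my reading of the fragment Cosmos quoted (names invented, so treat this as a sketch rather than gospel) is a small table-update routine. Seeing it as C makes it easier to judge what a human, or a better optimizer, would have done differently, such as computing the $2AC(a4) address only once:
/* My reading of the quoted disassembly, with invented names: swap a new
 * pointer into a 9-entry table kept at offset $2AC off the base register,
 * returning the previous entry, or -1 if the index is out of range. */
long swap_slot(long index, void *ptr, void **table)   /* table = a4 + $2AC */
{
    void *old;
    if (index < 0 || index > 8)       /* the bmi / cmp #8 checks           */
        return -1L;
    old = table[index];               /* move.l (a0,d1.l),a0               */
    table[index] = ptr;               /* move.l a5,(a1,d1.l)               */
    return (long)old;
}
Whether the duplicated lea actually costs anything measurable is exactly the kind of thing a proper test, rather than one code snippet, would show.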
-
I think that Lua's a decent enough design, which is both powerful, well-documented and something newcomers can learn and apply. It's also embeddable with a small memory footprint. I once came close to using it in one of my applications, but then time constraints made me - wait for it - knock off one of those Lisp-like language interpreters instead (the fate of the world didn't exactly depend upon it, and if it did I didn't notice, but sometimes you just want to finish a project and not keep on tinkering)
What kind of time constraints cause you to have to make concessions like that?
I create, modify and debug programs both as part of my day job and as a hobby. The programming I do for fun happens in my somewhat limited spare time. In that situation you can weigh the benefits and drawbacks of one solution to a specific problem against a different solution by looking at how long it would take to implement it, and how well it would solve the problem at hand.
In the case of wiring up a Lua interpreter vs. plugging in some slightly grubby but well-tested Lisp-like language interpreter it was tempting to use the leverage which Lua would have provided, because it was a "real" language with variables, loops and all those shiny things that just might come in handy (or may never get used).
However, not all applications really need that power, and they still get the job done which they were intended for. So plugging in the Lisp-like language interpreter solved the problem at hand, with a minimum of implementation and testing effort. Lua would have provided a much more powerful and well-rounded solution, but I would have had to spend another day or two to get it working properly, and just maybe I wouldn't have used the flexibility and options which Lua would have provided anyway.
So, sometimes a "good enough" solution can beat a "good" or even "perfect" solution.
-
However, not all applications really need that power, and they still get the job done which they were intended for. So plugging in the Lisp-like language interpreter solved the problem at hand, with a minimum of implementation and testing effort. Lua would have provided a much more powerful and well-rounded solution, but I would have had to spend another day or two to get it working properly, and just maybe I wouldn't have used the flexibility and options which Lua would have provided anyway.
It depends on the software, sure, but for some software you shouldn't make such concessions.
For some software, 'limitless power' is part of the design goal. FrexxEd is a good example of that, and its script language is the main reason why it's so powerful.
-
For some software, 'limitless power' is part of the design goal.
Not really. It is "getting the job done with the resources available". In real life, the cost factor of software is *your* time, not *computer time*. Thus, if I can get something done in a high-level scripting language that satisfies the needs of my customers, and I take two days for that, then that's a better solution than working on the same project for a month in C even if the resulting C code would run probably at three times the speed. There are situations where this slow-down is not acceptable, of course, that depends on the problem. But the cases where you can justify for your clients a six-month development time for an Assembly program that runs 50% faster than a C code that could have been done in a month are pretty rare. Yes, I've seen such cases, but that's really the exception.
-
Not really. It is "getting the job done with the resources available". In real life, the cost factor of software is *your* time, not *computer time*. Thus, if I can get something done in a high-level scripting language that satisfies the needs of my customers, and I take two days for that, then that's a better solution than working on the same project for a month in C even if the resulting C code would run probably at three times the speed. There are situations where this slow-down is not acceptable, of course, that depends on the problem. But the cases where you can justify for your clients a six-month development time for an Assembly program that runs 50% faster than a C code that could have been done in a month are pretty rare. Yes, I've seen such cases, but that's really the exception.
I don't care about coding time: I just want PERFECT code...
Sometimes I can think for many days about one small routine to finally find its truth... Just like SpeedGeek with CopyMem & CopyMemQuick!
:)
-
Not really.
No, really. I'm not talking about all software, I'm talking about some software. In these specific cases the goal of the author/authors was to create software with 'limitless power'.
FrexxEd is an example of such software. From the manual:
What is FrexxEd?
================
FrexxEd is an advanced, highly customizable, extensible, real-time, zero
limitation, fully programmable, function driven full screen display editor
(not a word processor) for editing text files (even though it's possible to
edit any kind of file).
We say that FrexxEd is a "display" editor because normally the text being
edited is visible on the screen and is updated automatically as you type your
commands.
We call FrexxEd advanced because it provides facilities that go beyond simple
insertion and deletion.
"Customizable" means that you can change the definitions of FrexxEd commands
in many ways. For example, if you don't want FrexxEd to query if you kill a
modified buffer, you can simply tell it so. Another sort of customization is
rearrangement of the command set. For example, if you prefer the four basic
cursor motion commands (up, down, left and right) on keys in a diamond pattern
on the keyboard, you can have it.
"Extensible" and "fully programmable" means that you can go beyond simple
customization and write entirely new commands (programs in the FPL language).
FrexxEd is an "on-line extensible" system, which means that it is divided into
many functions that call each other, any of which can be redefined in the
middle of an editing session. Any part of FrexxEd can be replaced without
making a separate copy of all of FrexxEd. Many of the editing commands of
FrexxEd are written in FPL already; the exceptions could have been written in
FPL but are written in C for improved efficiency. Although only a programmer
can write an extension, anybody can use it when it's done.
We call it a "real-time" editor because the display is updated very
frequently, usually after each character or pair of characters you type. This
minimizes the amount of information you must keep in mind as you edit. (The
term 'real-time' is, according to some, not used in its right sense here, but
I think you all get my point!)
"Zero limitation" means that there are hardly no limits in amount or size in
FrexxEd. Your amount of primary memory is the biggest limitation.
Every keystroke in FrexxEd invokes a function. Most keystrokes invoke the
`Output()' command which inserts the string/character stored in your AmigaDOS
keymap for that key, but there is no real limit to what can be done with
merely a simple keystroke. If FrexxEd cannot already do it, it can be
programmed by the user to do it.
FrexxEd is not an every man text editor. It's for people with a large
customizable need, brains and more than a 512KB or 1MB floppy system.
FrexxEd is ShareWare, coded with the intention of giving the world a superb
editor for a low price.
editor to everyone for a low price.
-
I don't care about coding time: I just want PERFECT code...
Excellent, that's better business for me. What are you willing to pay for it as an hourly rate?
-
I don't care about coding time: I just want PERFECT code...
Sometimes I can think for many days about one small routine to finally find its truth... Just like SpeedGeek with CopyMem & CopyMemQuick!
Sure, we've all been there. If you can write something as fast as it can possibly be, it feels good. But if you ever get the ambition to work on something bigger, you will run out of time pretty quickly.
I try to balance optimising algorithms against overall code quality, and let the compiler worry about which registers to use, etc.
-
However, not all applications really need that power, and they still get the job done which they were intended for. So plugging in the Lisp-like language interpreter solved the problem at hand, with a minimum of implementation and testing effort. Lua would have provided a much more powerful and well-rounded solution, but I would have had to spend another day or two to get it working properly, and just maybe I wouldn't have used the flexibility and options which Lua would have provided anyway.
It depends on the software, sure, but for some software you shouldn't make such concessions.
For some software, 'limitless power' is part of the design goal. FrexxEd is a good example of that, and its script language is the main reason why it's so powerful.
My outlook on software quality and how to get it has changed over the years. When I started noodling around with BASIC on whatever home computer I could get my hands on, my curiosity was the driving force in getting stuff done. A couple of years down the line I got it into my head that working on a program actually is a task that can always be finished.
At times that even was true when somebody was willing to pay me money for the work I did, or when somebody was very keen on putting my work to good use. That's when you had to make sure that everything you promised or hoped for was accounted for and in the box, before you closed it, tied a bow around it and handed it over.
I've been programming for some 30-odd years now, both as a hobby and as a profession, and I couldn't help noticing that there were recurring patterns in the work I did. One important pattern is that your work is rarely finished, and that you will end up iterating on it. You'll invariably find bugs, understand your own work and working method better, understand the requirements of the project better than you did before, and with that insight will come the need to give the job another go, so as to make things better.
This is one of the key insights I gained: your choices, when it comes to designing and implementing software, may not be the best at the time you make them, but that is not the end of the story. You will return to your work, and this time it may improve. Even if it doesn't, then maybe the next iteration will be better.
With this insight you gain a different perspective on how you spend your time on the project. You begin to accept that you will be unable to make the best choices, and that the next best thing you can do is focus on the specific parts of the task which benefit most from your attention. This is where you'll discover that you have been making trade-offs all the time. Some code may be best in a state in which it's readable and not necessarily optimized for time or space. Some code may be best in a state in which it's optimized. Some code just doesn't benefit from any polishing at all. It turns out that some of the trade-offs you make don't look so good in hindsight, and off you'll go for another round of making better choices.
And that's about it: just because I pick one quirky scripting language that has trouble walking and chewing gum at the same time over an arguably superior alternative it doesn't have to stay that way forever.
I distrust the notion of perfect code or the perfect solution for a problem, as implemented by a program. Perfect code has no bugs and always solves the problem at hand. I've seen that, but the scope such perfect code covers is usually tiny, and if it isn't, it takes a crazy amount of work to produce it. I'm not in the business of producing that kind of work ;) Trying to get to the perfect solution, that I can agree with as part of a process. But you can only get very, very close (asymptotically close, for the mathematically inclined among us) to it and never quite reach that point. Close enough is good enough for me, as otherwise you'll spend your time chipping away at only one small part of the interesting stuff you might otherwise get a chance to explore instead.
-
To add a bit to what Olsen said, because I believe he missed an important point: in reality, you want to optimize code quality Q (measured in whatever units you want to measure it in), but you only have limited resources R for that. It can be as simple as only having a limited time on this planet, or as trivial as a customer demanding the task be ready by a deadline. Or let it be money you are willing to pay.
Olsen said correctly that the definition of Q is not quite as trivial as it may seem. Q is dependent on many things, and may even change over time as you rate quality differently as you learn more.
What I want to add is that the problem is not to optimize Q alone; it is to optimize Q under the constraint R, which is a different problem.
Mathematically, you solve such problems by adding the constraint with a Lagrange multiplier \lambda, and then solving that as an unconstrained problem. So, in reality, you want to optimize
J = Q - \lambda R
where \lambda is the price you put on your resources. If you only say that Q should be ideal, then that is identical to saying "\lambda = 0", or "you don't value your resources R at all". That is, as seen above, an unrealistic choice.
I personally want to make the best out of my life, and that's finding the maximum of J, and not of Q. That's simply because my R is bounded.
So much for today's math lesson. Yes, the older you get, the more you rate R.
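To make that concrete with a toy example of my own (not part of the math lesson above): suppose quality depends only on the time t you put in, and you have at most T hours available.
\max_{t}\; Q(t) \quad \text{subject to} \quad t \le T
\mathcal{L}(t,\lambda) = Q(t) - \lambda\,(t - T), \qquad \lambda \ge 0
Q'(t^{*}) = \lambda \quad \text{at an interior optimum}
In words: you stop polishing at the point where the marginal quality gained per extra hour has dropped to the value \lambda you place on an hour of your time - which is exactly the "the older you get, the more you rate R" observation.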
-
^^^ That is brilliant. But I still love @Cosmos and @SpeedGeek's dedication to the micro-optimizations of this code. It takes a special kind of OCD that I can appreciate in my job, as well. ;)
Kudos, all of you, especially for not letting this turn into a flame-fest! :pint: :pint:
-
** 3RD NEWS UPDATE **
No version change
- New 1024-8192 byte Block Size versions added to archive
(The new Block Size versions allow you to "Tune" the
MoveL vs. Move16 performance of your system).
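For what it's worth, the kind of test such a tunable block-size knob presumably sits in front of looks roughly like this (a sketch only, not SpeedGeek's actual code; BLOCKSIZE stands in for the 1024-8192 byte versions in the archive):
/* Sketch only - not the actual patch.  MOVE16 is only worth considering
 * when the copy is large and source and destination share the same offset
 * modulo 16, so both can be stepped onto a 16-byte boundary together. */
#include <stddef.h>
#define BLOCKSIZE 2048UL   /* hypothetical threshold; pick per system */
static int move16_candidate(const void *src, const void *dst, size_t len)
{
    return len >= BLOCKSIZE &&
           (((size_t)src ^ (size_t)dst) & 15) == 0;
}
/* When move16_candidate() says no, the copy falls back to ordinary
 * longword moves (or the original OS routine). */
Tuning BLOCKSIZE per system is then just finding the point where the MOVE16 loop starts to beat the longword loop on your particular RAM.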
-
"Little by little the bird makes its nest"
:)
-
Here are some very interesting and precise benchmarks about cycle penalties for Fast RAM accesses!
Test machine: Apollo 1260 with 68060@90
DataCache and Store Buffer enabled:
1)
addr read.l 0(a0) : 1060 us (19 cycles)
addr read.l 1(a0) : 1115 us (20 cycles) => +1
addr read.l 2(a0) : 1115 us (20 cycles) => +1
addr read.l 3(a0) : 1115 us (20 cycles) => +1
2)
addr read.w 0(a0) : 1060 us (19 cycles)
addr read.w 1(a0) : 1059 us (19 cycles) => +0
addr read.w 2(a0) : 1059 us (19 cycles) => +0
addr read.w 3(a0) : 1115 us (20 cycles) => +1
3)
addr write.l 0(a0) : 1170 us (21 cycles)
addr write.l 1(a0) : 1170 us (21 cycles) => +0
addr write.l 2(a0) : 1282 us (23 cycles) => +2
addr write.l 3(a0) : 1282 us (23 cycles) => +2
4)
addr write.w 0(a0) : 1170 us (21 cycles)
addr write.w 1(a0) : 1170 us (21 cycles) => +0
addr write.w 2(a0) : 1170 us (21 cycles) => +0
addr write.w 3(a0) : 1282 us (23 cycles) => +2
DataCache and Store Buffer disabled:
1)
addr read.l 0(a0) : 4985 us (89 cycles)
addr read.l 1(a0) : 6665 us (119 cycles) => +30
addr read.l 2(a0) : 5880 us (105 cycles) => +16
addr read.l 3(a0) : 6665 us (119 cycles) => +30
2)
addr read.w 0(a0) : 4986 us (89 cycles)
addr read.w 1(a0) : 5774 us (103 cycles) => +14
addr read.w 2(a0) : 4986 us (89 cycles) => +0
addr read.w 3(a0) : 5880 us (105 cycles) => +16
3)
addr write.l 0(a0) : 5102 us (91 cycles)
addr write.l 1(a0) : 5102 us (91 cycles) => +0
addr write.l 2(a0) : 5883 us (105 cycles) => +14
addr write.l 3(a0) : 6664 us (119 cycles) => +28
4)
addr write.w 0(a0) : 5102 us (91 cycles)
addr write.w 1(a0) : 5883 us (105 cycles) => +14
addr write.w 2(a0) : 5102 us (91 cycles) => +0
addr write.w 3(a0) : 5883 us (105 cycles) => +14
The measured instruction sequences (each bracketed by nops) were:
1)
nop
move.l (a0),d0
nop
2)
nop
move.w (a0),d0
nop
3)
nop
move.l d0,(a0)
nop
4)
nop
move.w d0,(a0)
nop
:)
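For anyone who wants to reproduce numbers like these: the usual way to take such timings on AmigaOS is with timer.device's EClock. The sketch below is mine, not the actual test program; the measured block would be one of the four sequences above, repeated often enough that the loop overhead disappears.
/* Sketch of an EClock-based timing harness (not the actual test program).
 * Opens timer.device to obtain TimerBase for ReadEClock(), times REPS
 * iterations of a measured block, and prints raw EClock ticks. */
#include <stdio.h>
#include <exec/types.h>
#include <devices/timer.h>
#include <proto/exec.h>
#include <proto/timer.h>
struct Device *TimerBase;                 /* needed by the ReadEClock stub */
#define REPS 100000UL
int main(void)
{
    static struct timerequest tr;         /* zeroed; no reply port needed
                                             when we only call ReadEClock  */
    struct EClockVal start, stop;
    static UBYTE buffer[64];              /* target of the measured access */
    volatile ULONG sink;
    ULONG freq, ticks, i;
    if (OpenDevice("timer.device", UNIT_ECLOCK,
                   (struct IORequest *)&tr, 0) != 0)
        return 20;
    TimerBase = tr.tr_node.io_Device;
    freq = ReadEClock(&start);            /* also returns ticks per second */
    for (i = 0; i < REPS; i++) {
        /* measured block - here a misaligned long read, the "1(a0)" case */
        sink = *(ULONG *)(buffer + 1);
    }
    ReadEClock(&stop);
    ticks = stop.ev_lo - start.ev_lo;     /* fine for runs < 2^32 ticks    */
    printf("%lu reps: %lu EClock ticks (EClock = %lu Hz)\n",
           (unsigned long)REPS, (unsigned long)ticks, (unsigned long)freq);
    CloseDevice((struct IORequest *)&tr);
    return 0;
}
Converting ticks to microseconds and dividing by REPS then gives per-iteration figures like those quoted above (the EClock runs at roughly 709 kHz on a PAL machine).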
-
Here are some very interesting and precise benchmarks about cycle penalties for Fast RAM accesses!
Test machine: GVP Turbo+ Jaws 1230 with 68030@40
DataCache enabled:
1)
addr read.l 0(a0) : 1424 us (14 cycles)
addr read.l 1(a0) : 1530 us (15 cycles) => +1
addr read.l 2(a0) : 1530 us (15 cycles) => +1
addr read.l 3(a0) : 1530 us (15 cycles) => +1
2)
addr read.w 0(a0) : 1424 us (14 cycles)
addr read.w 1(a0) : 1426 us (14 cycles) => +0
addr read.w 2(a0) : 1426 us (14 cycles) => +0
addr read.w 3(a0) : 1530 us (15 cycles) => +1
3)
addr write.l 0(a0) : 1540 us (15 cycles)
addr write.l 1(a0) : 1540 us (15 cycles) => +0
addr write.l 2(a0) : 2174 us (21 cycles) => +6
addr write.l 3(a0) : 2174 us (21 cycles) => +6
4)
addr write.w 0(a0) : 1530 us (15 cycles)
addr write.w 1(a0) : 1530 us (15 cycles) => +0
addr write.w 2(a0) : 1530 us (15 cycles) => +0
addr write.w 3(a0) : 2174 us (21 cycles) => +6
DataCache disabled:
1)
addr read.l 0(a0) : 2173 us (21 cycles)
addr read.l 1(a0) : 2813 us (28 cycles) => +7
addr read.l 2(a0) : 2813 us (28 cycles) => +7
addr read.l 3(a0) : 2813 us (28 cycles) => +7
2)
addr read.w 0(a0) : 2173 us (21 cycles)
addr read.w 1(a0) : 2173 us (21 cycles) => +0
addr read.w 2(a0) : 2173 us (21 cycles) => +0
addr read.w 3(a0) : 2813 us (28 cycles) => +7
3)
addr write.l 0(a0) : 1792 us (17 cycles)
addr write.l 1(a0) : 1792 us (17 cycles) => +0
addr write.l 2(a0) : 2419 us (24 cycles) => +7
addr write.l 3(a0) : 2419 us (24 cycles) => +7
4)
addr write.w 0(a0) : 1431 us (14 cycles)
addr write.w 1(a0) : 1792 us (17 cycles) => +3
addr write.w 2(a0) : 1792 us (17 cycles) => +3
addr write.w 3(a0) : 2419 us (24 cycles) => +10
:)