Author Topic: CopyMem Quick & Small released!  (Read 14177 times)

Offline Thorham

  • Hero Member
  • *****
  • Join Date: Oct 2009
  • Posts: 1150
Re: CopyMem Quick & Small released!
« on: January 02, 2015, 07:22:48 PM »
Quote from: Thomas Richter;780921
I would rather say that an application that critically depends on memory-copies implements the copy itself, without going through the Os as there are many other factors only the calling program can know. For example, a "move" moves into and out of the cache. A move16 does not. Is that good or bad? MOVE16 doesn't "pollute" the cache. move "already fills the cache with the target data". Whether that is something you want or do not want cannot be distinguished by CopyMemQuick(). It is something only the calling program can know - and hence, only the calling program can select the optimal strategy. CopyMemQuick() is the "Ford Escort" you may select if it is "fast enough", so it's usually not worth the trouble patching into this call, even more so as it is rarely used.
Remember our OS blitting routine argument? You just stated the reason for writing one's own blit routine: A one size fits all routine isn't always the best solution.
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #1 on: January 03, 2015, 07:03:08 AM »
Quote from: Thomas Richter;780935
For the blitter, however, you get a substantial disadvantage from not using the Os: If you try to implement a graphics primitive, it might simply not work on an rtg system if you don't use the Os. Is it worth not using the Os? Typically not, because you "shoot yourself in the foot".
Whether it's worth it or not depends entirely on one's requirements. The OS blit routine is generic, and therefore unsuitable for fast, non-generic blits, even when the nature of the blits is very simple (you can see this clearly when you look at the function call). The right solution for getting both maximum performance on native screen modes and GFX card compatibility is to simply implement both methods.
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #2 on: January 03, 2015, 02:52:15 PM »
Quote from: Thomas Richter;780956
And from where do you know that you have a gfx card in the system?
1. Ask the user.
2. Icon tool type and have two icons.

Quote from: Thomas Richter;780956
And is it worth implementing both methods?
Depends on the software. How much code are we talking about anyway? Two, maybe three KB extra? Hardly a waste if it means more users can enjoy the software.

Quote from: Thomas Richter;780956
For rendering graphics, it *usually* doesn't matter - not worth spending bytes on this decision.
It matters for the case I'm talking about:

2x 16 pixel wide background tile.
2x 16 pixel wide sprite mask.
2x 16 pixel wide sprite data.
2x 16 pixel wide second sprite mask.
2x 16 pixel wide second sprite data.
2x 16 pixel wide status gfx mask.
2x 16 pixel wide status gfx data.

That's seven sources. With a handwritten routine you can do this:
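Code:
    move.l  (a0)+,d0    ; background tile
    and.l   (a1)+,d0    ; first sprite mask
    or.l    (a1)+,d0    ; first sprite data
    and.l   (a2)+,d0    ; second sprite mask
    or.l    (a2)+,d0    ; second sprite data
    and.l   (a3)+,d0    ; status gfx mask
    or.l    (a3)+,d0    ; status gfx data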
Do that twice, transpose, write to chipmem. After that you can unroll to use the pipeline on 20+ and get the transposes almost for free. I don't see how that's going to be anywhere near as fast with the OS blit function, so in this case it's crystal clear that it matters, because it lowers the CPU requirements.
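For readers who don't speak 68k, here is a rough C sketch of the same masking idea. It is not the actual routine: the `Layer` struct and the function names are made up for illustration, it keeps mask and data in separate arrays instead of interleaving them behind one pointer, and it leaves out the transpose and the chipmem write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One masked layer (hypothetical type): AND with the mask, then OR in the data. */
typedef struct {
    const uint32_t *mask;  /* 1 bits keep what is underneath, 0 bits cut a hole */
    const uint32_t *data;  /* pixels that go into the hole */
} Layer;

/* Combine one background word with all layers in a single local variable,
   which is the C counterpart of doing everything in d0. */
static uint32_t compose_word(uint32_t background, const Layer *layers,
                             size_t layer_count, size_t index)
{
    uint32_t acc = background;
    for (size_t i = 0; i < layer_count; i++) {
        acc &= layers[i].mask[index];
        acc |= layers[i].data[index];
    }
    return acc;
}

/* Compose a row of 32-bit words; each destination word is written once. */
void compose_row(uint32_t *dest, const uint32_t *background,
                 const Layer *layers, size_t layer_count, size_t words)
{
    assert(dest != NULL && background != NULL);
    for (size_t w = 0; w < words; w++) {
        dest[w] = compose_word(background[w], layers, layer_count, w);
    }
}
```

With three layers (two sprites plus the status gfx) this reads seven sources and writes each destination word exactly once, which is the point of the handwritten version.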

Quote from: Thomas Richter;780956
Yes, as usual, it depends on the situation, but I currently cannot come up with a situation where I would need to blit something and not use the Os
I have another example. I wrote a simple real-time memory viewer. It opens a single bit plane 640x512 screen and blits 8x8 chars to the screen with my own code (which contains some optimized transposes from kalms' c2p routines). Very fast, and I highly doubt the OS can match that speed. It's important that such a program is fast because you're also running the program you're working on.
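To show why a dedicated routine can be so simple here, a minimal C sketch, assuming byte-aligned 8x8 glyphs on a single 640x512 bit plane (80 bytes per row). The names are hypothetical and this is the plain byte-per-row version, not the transposing code mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    SCREEN_WIDTH  = 640,
    SCREEN_HEIGHT = 512,
    BYTES_PER_ROW = SCREEN_WIDTH / 8,  /* one bit plane, 1 bit per pixel */
    GLYPH_SIZE    = 8                  /* 8x8 glyph, one byte per glyph row */
};

/* Draw one 8x8 glyph at character cell (col, row). Because the glyph is
   exactly one byte wide and cells are byte aligned, no shifting or masking
   is needed: each glyph row is a single byte store. */
void draw_glyph_8x8(uint8_t *plane, const uint8_t glyph[GLYPH_SIZE],
                    unsigned col, unsigned row)
{
    assert(col < (unsigned)(SCREEN_WIDTH / GLYPH_SIZE));
    assert(row < (unsigned)(SCREEN_HEIGHT / GLYPH_SIZE));

    uint8_t *dest = plane + (size_t)row * GLYPH_SIZE * BYTES_PER_ROW + col;
    for (unsigned y = 0; y < (unsigned)GLYPH_SIZE; y++) {
        dest[(size_t)y * BYTES_PER_ROW] = glyph[y];
    }
}
```

A generic blit call has to handle arbitrary positions, sizes and depths; this special case needs eight byte writes per character.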

Quote from: Thomas Richter;780956
For example, we had the problem in P96 to copy memory quickly around (from the board to an off-board buffer, and reverse), but CopyMemQuick() is not even closely sufficient for that as it was necessary to copy "rectangular" memory regions with a different "modulo" factor from A to B. CopyMemQuick() cannot do that.
That's the whole reason. Something doesn't run at sufficient speed, or you know this is going to happen, or simply want to reduce CPU usage as much as possible, so you write your own code. Nothing wrong with that.
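As an illustration of the rectangular copy from the quote, a minimal C sketch. It is not P96's code: `copy_rect` and its parameters are made up, and "modulo" is assumed to mean the number of bytes skipped at the end of each row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a rectangle of rows x row_bytes from src to dest, where each side
   has its own modulo (bytes to skip after every row). A flat copy such as
   CopyMemQuick() has no notion of rows, so it cannot express this. */
void copy_rect(uint8_t *dest, size_t dest_modulo,
               const uint8_t *src, size_t src_modulo,
               size_t row_bytes, size_t rows)
{
    assert(dest != NULL && src != NULL);

    const size_t src_stride  = row_bytes + src_modulo;
    const size_t dest_stride = row_bytes + dest_modulo;

    for (size_t y = 0; y < rows; y++) {
        memcpy(dest + y * dest_stride, src + y * src_stride, row_bytes);
    }
}
```

For example, copying a 2 byte wide column out of a 4 byte wide buffer into a tightly packed one uses a source modulo of 2 and a destination modulo of 0.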

This is Amiga land after all. Lots of not so fast < 68060s out there, and you can do more on those lower end machines if your code is faster.
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #3 on: January 03, 2015, 08:08:59 PM »
Quote from: Thomas Richter;780985
You are already demanding too much from the average user.
I don't see how providing two icons, one for native and one for GFX card, or having something like a screen mode requester is demanding too much. If someone can't be bothered with that, then they have issues.

Quote from: Thomas Richter;780985
Does it make a measurable difference?
Oh, come on! The example I gave makes it crystal clear that it would make a difference.

Quote from: Thomas Richter;780985
How much testing are we talking about?
Not much, because we're talking about some simple blit routines here. It's not rocket science.

Quote from: Thomas Richter;780985
How much time of the overall running time of the program is spent in that copy?
That particular blit happens for half the screen at about eight frames per second (320x256x7 bpls). The other half is similar, but you have only three sources instead of seven. The faster this runs, the better.

Quote from: Thomas Richter;780985
How much development time goes into writing that?
This kind of trivial code is very easy to write in a close to optimal way. Obviously, it's already written.

Quote from: Thomas Richter;780985
Would a user really bother?
They would get the option of running native, or if I'm going to implement it, GFX card. What do they have to bother with? They double click an icon and that's it.

Quote from: Thomas Richter;780985
You are giving me all factors that make the "coder enjoy the development", but probably no argument concerning the overall "quality of experience" of the resulting program.
Part of the quality of the experience comes from making sure people can actually play the game properly on a 25 MHz 68030 with AGA and some fastmem.

Quote from: Thomas Richter;780985
Maybe the Os wouldn't match the speed, but would it matter, actually? If I have a memory viewer (e.g. MonAm2, or COP), then I don't mind whether the screen updates faster than I type (or view). Actually, I would raise a couple of more important issues, as in "does it cooperate well with the rest of the system", "does it know not to touch I/O spaces to avoid interaction with the hardware", "can it print the memory contents to a printer and make a hardcopy".
I wrote this memviewer for my own needs, and wouldn't release it in its current state. The speed is a requirement, because I want to be able to use it with heavier software without slowing things down too much. Not to mention that it's realtime, and the screen is updated once per VBL.

Quote from: Thomas Richter;780985
You have a very single-sighted view on the qualities and requirements of the software, where in real-life a lot of other aspects play a role, too. Whether the output of the program is twice as fast as that of a competing program is in most use-cases not important.
The way I see it, software must be well-written (this includes maintainability), functional, easy to use and fast. The reason I seem focused on speed is the target platform I'm interested in: As close to a 68020 with some fastmem as I can get it without making any concessions (my reason for insisting on ASM has nothing to do with this).
« Last Edit: January 03, 2015, 08:11:28 PM by Thorham »
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #4 on: January 03, 2015, 08:51:30 PM »
Quote from: Thomas Richter;780989
Ah, you're talking about a game. That's yet another business. If the Os doesn't give you the game speed you need, then this is a justification of course. There is *some* support for moving objects in graphics, and that's even supported natively by P96, but admittedly, the Bobs support of gfx is pretty much broken.
The requirements for the game are that I can have 320 animated tiles and 320 animated sprites (20x16 tile positions), all at the same time. All of these are 16x16 pixel aligned (sprites are 24 pixels high, and only one can move freely because it's turn based). This all has to run at around eight fps (for the animations) and leave enough CPU time to handle the AI and the 28 kHz 14 bit stereo audio (stereo music with sound effects).

However, I would write my own text blitting routine if I were to write a text editor, for example. The OS simply uses the blitter, hence the reason for FBlit and FText making a real difference (also, double scan modes).

It's basically about how much effort you think is worthwhile to put into writing optimized custom code for things. It's also a hobby, and while you should of course try to actually finish software, it's also about writing the software the way you want (without making a mess). In a pro environment it's probably quite different.
« Last Edit: January 03, 2015, 08:55:11 PM by Thorham »
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #5 on: January 05, 2015, 11:04:27 AM »
Quote from: Thomas Richter;780998
The Os routine is quite ok
No, it's not, hence the reason FBlit+FText makes a real difference.

Quote from: Thomas Richter;780998
but from 2.0 on graphics is smart enough to place the text into an off-screen buffer first and blit from there.
Which is slow, because you get additional memory accesses. Far better to do everything in registers, write to chipmem and be able to use the CPU pipeline on 68020+.

Quote from: Thomas Richter;780998
optimizations are likely only possible if you aim at specific font sizes only, e.g. 8x8 glyphs as for the topaz.font.
You can write a properly optimized font renderer for any normal text editor font size. You can also take syntax coloring into account and not write all bit planes for each character.
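One way the plane skipping could look, as a hedged C sketch rather than anything from an existing renderer. It assumes the character cell was already cleared to color 0 (for instance by the scroll step), so planes whose bit is 0 in the color index can be left alone. All names and the fixed 4 plane, 8x8 layout are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PLANE_COUNT = 4, GLYPH_HEIGHT = 8 };  /* 16 colors, 8x8 glyphs (assumed) */

/* Draw a byte-aligned glyph in the given color index onto a cell that is
   already cleared. Only planes whose bit is set in the color are written;
   the others keep their zeros. Returns how many planes were touched. */
unsigned draw_glyph_colored(uint8_t *planes[PLANE_COUNT], size_t bytes_per_row,
                            size_t offset, const uint8_t glyph[GLYPH_HEIGHT],
                            unsigned color)
{
    assert(color < (1u << PLANE_COUNT));

    unsigned written = 0;
    for (unsigned p = 0; p < (unsigned)PLANE_COUNT; p++) {
        if ((color & (1u << p)) == 0) {
            continue;  /* this plane stays 0 for this color: skip the writes */
        }
        uint8_t *dest = planes[p] + offset;
        for (unsigned y = 0; y < (unsigned)GLYPH_HEIGHT; y++) {
            dest[(size_t)y * bytes_per_row] = glyph[y];
        }
        written++;
    }
    return written;
}
```

With a palette arranged so that the most common syntax colors have only one bit set, most characters would cost a single plane's worth of chipmem writes instead of four.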
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #6 on: January 05, 2015, 04:45:44 PM »
Quote from: Thomas Richter;781078
How much, and is that due to FBlit?
Don't have any numbers, but text rendering is definitely faster with FBlit+FText on my system. I use 688x564 double scan screen modes, and these patches improve things quite a bit. FrexxEd in 16 colors benefits a lot from this. Goes from annoying to use in 16 colors to working perfectly fine.

Quote from: Thomas Richter;781078
How does that work on graphics cards?
It doesn't, because it patches the OS blitter functions with CPU based functions. You don't need FBlit for graphics cards anyway, so it doesn't really matter.

Quote from: Thomas Richter;781078
Well, there isn't really much chance to avoid memory accesses.
Of course, but instead of reading font data and writing to a buffer which you later have to copy to chipmem, you can do the work in registers, write those to chipmem directly and utilize the pipeline on 68020+.

Quote from: Thomas Richter;781078
You can probably get away with rendering in fast ram for graphics cards in the first place and then copy directly to the screen
For graphics cards you might not have to bother with anything. I wouldn't optimize for graphics cards anyway. Not interested, and probably always faster than native anyway.

Quote from: Thomas Richter;781078
but in one way or another, you need to fiddle all the bits in the right places to begin with, and there isn't much to be optimized *unless* you restrict yourself to some "nice" font sizes.
There's plenty of room for optimizing, and you don't have to restrict yourself to nice font sizes at all. The optimizations come from doing the work in registers and using the pipeline when doing chipmem writes.

Quote from: Thomas Richter;781078
Actually, all this bit-plane handling is pretty much obsolete in the first place (I mean, custom-chip graphics)
Yes, but it's what you have to deal with when writing native Amiga software. Native chipset is also the most important to get fast, because it needs extra speed the most. This is one of the reasons why I'm not concerned with graphics cards: They just don't need optimizing as much as the chipset does.

Quote from: Thomas Richter;781078
but leave this as it is: Rendering only a single bitplane is pretty dangerous for an Os function because it cannot know what else is on the screen.
That's one of the reasons why I would write my own font blitting routine.
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #7 on: January 05, 2015, 06:50:19 PM »
Quote from: Thomas Richter;781086
There are fonts that are wider than 32 pixels and higher than 32 pixels
And why would anyone use those for text editors on native screens? Typical font sizes are much smaller for text editing.

Quote from: Thomas Richter;781086
thus no chance to put that everything into registers.
Sure you can, because you don't have to copy whole characters to registers. You only copy parts of characters.

Quote from: Thomas Richter;781086
The Os also takes care of making the font bold, italic, underline or any combination thereof.
You can do that in your own font renderer, too.

When dealing with native graphics there are quite a few ways to get things to run faster. You have to decide for yourself if it's worth doing or not. In my opinion it is.
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #8 on: January 05, 2015, 10:23:36 PM »
Quote from: Thomas Richter;781096
I would rather say that this depends on the screen resolution and on the eyes of the user. At least, I wouldn't base an optimization on this unless I also have a fallback mode that allows arbitrary fonts.
Just saying that it seems odd that you'd use fonts wider than 32 pixels for text editing, that's all. And no, I don't have hawk eyes (glasses) ;)

Quote from: Thomas Richter;781096
Well, as you wish. I personally would pick an editor on other qualities, though.
You're implying that I only look at one thing. In fact, for existing text editors the only speed requirement I have is that the editor is fast enough. FrexxEd is an example of this. It's not the fastest (not by a long shot), but it's undoubtedly one of the most powerful editors in Amiga 68k land (makes CygnusEd look like Ed).

If I were to write my own editor, then I'd go for speed as well as power and ease of use, and try to write a tidy, clean program that's maintainable (but yeah, in asm). I know how to get speed, so why not put that into the software I might write? This is especially important on low end 68k. Not much CPU time available, so getting good speed without sacrificing features is important.
« Last Edit: January 05, 2015, 10:26:31 PM by Thorham »
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #9 on: January 05, 2015, 11:52:27 PM »
Quote from: Thomas Richter;781101
Well, here is my math. How much time does an average editor spend in rendering text, compared to waiting for my input?
It's about the scrolling (line and page). The scroll speed depends on the combination of text rendering speed, syntax coloring system speed and scroll routine speed. If it's too slow, then the editor becomes uncomfortable to use. Especially in hires double scan modes with 16 colors. That's why speed is important.
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #10 on: January 06, 2015, 09:44:49 AM »
Quote from: Thomas Richter;781127
There are other alternatives, though. You don't need to scroll every line. Buffer scroll commands or output commands, interpret several commands at once and use jump scrolling then. ViNCEd does that to avoid slowing down the output, i.e. while it prints its data, it already buffers new incoming commands, then executes all at once without scrolling through each of them individually.
Yes, of course you don't scroll ten lines individually when you want to scroll ten lines. The software creates ten lines worth of space with one copy operation, and prints ten lines, otherwise you get ten screen copy operations, which is very slow. However, if those lines are printed slowly, then it's still going to be slow (especially bad for page up and page down when you have lots of long lines).
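The "one copy operation for ten lines" part can be sketched in C like this. It is only an illustration with made-up names, treating the display as one flat buffer where every text line occupies `bytes_per_line` bytes; printing the new lines is left to the caller, and that is the part that still has to be fast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scroll the display up by `count` text lines with a single block move,
   then clear the freed lines at the bottom so new text can be printed
   there. Scrolling line by line would repeat the big copy `count` times. */
void scroll_up(uint8_t *screen, size_t bytes_per_line,
               size_t visible_lines, size_t count)
{
    assert(screen != NULL && count <= visible_lines);

    const size_t kept = visible_lines - count;

    /* One overlapping copy moves every surviving line to its new place. */
    memmove(screen, screen + count * bytes_per_line, kept * bytes_per_line);

    /* Make room for `count` fresh lines. */
    memset(screen + kept * bytes_per_line, 0, count * bytes_per_line);
}
```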

I understand that you prefer using the OS for things. It's less work and it's easier to get things to run on graphics cards and what not. On the peecee that's usually fine, but on Amiga hardware you can do better if you write your own optimized code. You just have to put in the extra effort that's required to do it properly, and when you do, you'll end up with software that runs better on lower end machines, and I think that's important.
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #11 on: January 11, 2015, 01:27:22 PM »
What an absolutely horrible editor that Ed thing :(
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #12 on: January 11, 2015, 10:11:04 PM »
Ed is a crappy editor, period. Why defend the damned thing? If it's so good, then how come no one really uses it for anything? The answer is because it sucks.

Just because some software is old doesn't mean that higher standards don't apply to it, and the platform it had to run on back then can do much better than that. Especially if you have an A500 with one megabyte of memory there's absolutely no reason whatsoever to use Ed.

Really, defending Ed? Why?
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #13 on: January 12, 2015, 06:09:39 PM »
Lol at Vim (68k AOS).
« Last Edit: January 12, 2015, 06:19:46 PM by Thorham »
 

Offline Thorham

Re: CopyMem Quick & Small released!
« Reply #14 on: January 12, 2015, 09:28:59 PM »
Quote from: Fats;781552
Don't you dare say anything else bad about (micro)emacs or I'll turn this thread into a classic emacs vs. vi war thread. And if you think the infighting on amiga.org is bad, I assure you, you ain't seen nothing yet...
:D
On 68k AOS that's irrelevant, because the mighty FrexxEd destroys all :D