Author Topic: GLFW (OpenGL toolkit) port for AmigaOS (Read 33401 times)

Karlos · « **Reply #14 on:** May 18, 2003, 09:35:37 PM »

Hi marcus,

The only problem would be if you mixed up a normal thread with one of your GLFW ones and proceeded to use it as if it were one.

My Threadable service class (base for all threaded objects, wraps a task) uses the same strategy as you describe.

On creation of the task I set tc_UserData to point to the actual Threadable class that encapsulates it.

To avoid problems where other threads use the same field for their own purposes, the Threadable class has an identifier member used to mark it as a 'Threadable' class object. This is basically an integer which must match a defined value, 'IS_THREADABLE'.

So, getting the current Threadable object for which the code is executing uses a static function, thus:

Threadable* Threadable::getCurrent()
{
  Task* who = FindTask(NULL);
  if (who)
  {
    // this is the dodgy bit, we cast to Threadable and see if the id matches
    Threadable* thread = (Threadable*)(who->tc_UserData);
    if (thread && thread->identity == IS_THREADABLE)
      return thread;
  }
  return 0; // not a Threadable thread
}

I certainly haven't run into any problems with this approach. You could just add an identifier field to your structure and put in a check like the one above. If you ever do somehow handle a thread which is not one of your GLFW ones, it can be differentiated easily by seeing if the data pointed to has this identifier.

Works for me :-)

Karlos · « **Reply #15 on:** May 18, 2003, 10:21:54 PM »

Incidentally, I should point out that I rarely need to determine the current thread using the method I mentioned since 99% of the time the task is running within the context of the Threadable object itself...

For code called from outside the Threadable object (and not given a reference to it), the method described is essential and works a treat...

marcus256 · « **Reply #16 on:** May 19, 2003, 08:49:06 AM »

That's a good solution (at least as good as it gets, I suppose). Currently the interface is not very dependent on the custom GLFW thread structure, but I am planning to add support for multiple windows in a way that would require every single windowing function (event handling, window management etc) to access the thread private area.

Why? Thread aware window management!

The idea is very similar to OpenGL per thread rendering contexts (one context per thread - transparent from a coder's point of view). For instance, I will have a function called glfwBindWindow(), which binds a window to the currently calling thread. All window operations will apply to that window. Since different threads can work on different windows, I need to store this "window binding" information in an easily accessible thread private area - my GLFW thread structure, pointed to by tc_UserData, is the obvious choice.
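To make the binding idea concrete, here is a minimal portable sketch in standard C++11. It uses `thread_local` as a stand-in for the structure that tc_UserData would point to on AmigaOS. `Window`, `bindWindow`, `currentWindow`, `setTitle` and `demoTwoThreads` are hypothetical names for illustration only, not GLFW functions.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical window type; stands in for whatever GLFW would store.
struct Window {
    std::string title;
};

// One binding per thread. On AmigaOS this slot would live in the
// structure that tc_UserData points to; thread_local is a stand-in.
static thread_local Window* boundWindow = nullptr;

// Analogue of the proposed glfwBindWindow(): affects the caller only.
void bindWindow(Window* w) { boundWindow = w; }

Window* currentWindow() { return boundWindow; }

// A "window operation" that implicitly targets the bound window.
bool setTitle(const std::string& t) {
    Window* w = currentWindow();
    if (!w) return false;        // no window bound to this thread
    w->title = t;
    return true;
}

// Binds a to the calling thread and b to a second thread, then lets
// each thread set the title of "its" window.
bool demoTwoThreads(Window& a, Window& b) {
    assert(&a != &b);            // the demo needs two distinct windows
    bindWindow(&a);
    bool okB = false;
    std::thread worker([&] {
        bindWindow(&b);          // does not disturb the caller's binding
        okB = setTitle("worker");
    });
    worker.join();
    return setTitle("main") && okB;
}
```

After `demoTwoThreads(a, b)` the caller is still bound to `a`, and each window carries the title set by its own thread.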

I can still use the "fool-proof" check that you described, but that means that using non-GLFW threads will limit the use of GLFW functions (e.g. you can't open a window from a non-GLFW thread).

I wish AmigaOS had something similar to POSIX pthread thread-specific data keys or Win32 thread local storage (TLS), which are basically indexed arrays of tc_UserData that you can dynamically allocate, per process. I suppose this does not work well without a proper process class though...
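For readers who have not used the POSIX facility mentioned above, a minimal sketch of thread-specific data keys follows. The pthread calls are the standard ones; `ThreadRecord`, `setCurrentRecord`, `getCurrentRecord` and `otherThreadSeesNothing` are made-up names for illustration.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical per-thread record, loosely modelled on a "GLFW thread structure".
struct ThreadRecord {
    int id;
};

static pthread_key_t  recordKey;                 // one key, private to the library
static pthread_once_t keyOnce = PTHREAD_ONCE_INIT;

static void makeKey() {
    // No destructor: the caller owns the ThreadRecord in this sketch.
    int rc = pthread_key_create(&recordKey, nullptr);
    assert(rc == 0);
    (void)rc;
}

// Store a record for the calling thread; returns true on success.
bool setCurrentRecord(ThreadRecord* rec) {
    pthread_once(&keyOnce, makeKey);
    return pthread_setspecific(recordKey, rec) == 0;
}

// Fetch the calling thread's record, or nullptr if none was set.
ThreadRecord* getCurrentRecord() {
    pthread_once(&keyOnce, makeKey);
    return static_cast<ThreadRecord*>(pthread_getspecific(recordKey));
}

static void* workerMain(void* out) {
    // A fresh thread starts with a null value for every key.
    *static_cast<bool*>(out) = (getCurrentRecord() == nullptr);
    return nullptr;
}

// Returns true if a second thread sees no record of its own.
bool otherThreadSeesNothing() {
    bool sawNull = false;
    pthread_t t;
    if (pthread_create(&t, nullptr, workerMain, &sawNull) != 0) return false;
    pthread_join(t, nullptr);
    return sawNull;
}
```

Because the key is allocated by the library, an application can create further keys of its own without colliding with it, which is exactly what a single tc_UserData pointer cannot offer.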

marcus256 · « **Reply #17 on:** May 19, 2003, 08:57:41 AM »

> Considered a port to AROS?

Is it worth it?

How is OpenGL support under AROS?

How different is AROS from AmigaOS? I need AmigaDOS processes, intuition, graphics and input.device. The rest is quite easy to replace (timer.device/ReadEClock, gameport.device, StormMESA etc).

Can I run AROS on my PC (natively, or under Windows, Linux or WinUAE)?

T_Bone · « **Reply #18 on:** May 19, 2003, 09:50:46 AM »

Quote

marcus256 wrote:
> Considered a port to AROS?

Is it worth it?

How is OpenGL support under AROS?

How different is AROS from AmigaOS? I need AmigaDOS processes, intuition, graphics and input.device. The rest is quite easy to replace (timer.device/ReadEClock, gameport.device, StormMESA etc).

Can I run AROS on my PC (natively, or under Windows, Linux or WinUAE)?

I don't know about OpenGL support in AROS, but it will run natively on a PC, and it will run hosted on Linux. Other than that, it's pretty much source compatible with AmigaOS.

Definitely worth checking out!

screenshot AROS

Karlos · « **Reply #19 on:** May 19, 2003, 01:03:14 PM »

Quote

marcus256 wrote:
That's a good solution (at least as good as it gets, I suppose).

Yeah. It's relatively simple and there's no way you can actually get access to a thread which isn't one of your own - the function just returns null if the thread that called it isn't one of your custom kind.

Quote

Why? Thread aware window management!

The idea is very similar to OpenGL per thread rendering contexts (one context per thread - transparent from a coder's point of view). For instance, I will have a function called glfwBindWindow(), which binds a window to the currently calling thread. All window operations will apply to that window. Since different threads can work on different windows, I need to store this "window binding" information in an easily accessible thread private area - my GLFW thread structure, pointed to by tc_UserData, is the obvious choice.

That should work. According to the RKM, tc_UserData is entirely free for the programmer to use for pointing to task specific data that is meaningful to them. IIRC, exec pays it no attention whatsoever.

My only point is that multithreaded rendering doesn't do much performance-wise, since at the end of the day you only (usually) have one hardware rendering device that needs to be exclusively locked. I guess that's not your point anyway - most uses for multithreaded code are to simplify design rather than to optimise for speed.

However, I did find one use. I made a Threadable Rasterizer class (still in development) that has a double buffered vertex array / command queue. The rendering calls fill one buffer whilst the previous one is being rendered by the internal thread.
In this instance there was an overall performance increase, principally because the rendering code often has to wait for the hardware to complete an operation before it begins a new one. By running it as a separate task at a lower priority than the parent, when it's waiting the parent gets the CPU to continue working on other stuff relatively unimpeded.
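The double-buffered queue idea can be sketched portably as below. This is not the Threadable Rasterizer itself, only an illustration in standard C++11: `QueueRasterizer` and its methods are hypothetical, "rendering" is reduced to summing integers, and task priorities are not modelled.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class QueueRasterizer {
public:
    QueueRasterizer() : worker([this] { run(); }) {}

    ~QueueRasterizer() {
        { std::lock_guard<std::mutex> lock(m); quit = true; }
        cv.notify_all();
        worker.join();
    }

    // Called by the owning thread only: append to the buffer being filled.
    void add(int command) { buffers[fill].push_back(command); }

    // Hand the filled buffer to the worker and start filling the other one.
    void flush() {
        std::unique_lock<std::mutex> lock(m);
        assert(!quit);
        cv.wait(lock, [this] { return !busy; });  // previous batch finished?
        fill = 1 - fill;
        buffers[fill].clear();
        busy = true;
        cv.notify_all();
    }

    // Wait for the worker to go idle and return the sum of all commands.
    long finish() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !busy; });
        return total;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            cv.wait(lock, [this] { return busy || quit; });
            if (!busy) return;                 // quit requested, nothing pending
            const int draw = 1 - fill;         // the buffer just handed over
            lock.unlock();
            long sum = 0;                      // "rendering" is just a sum here
            for (int c : buffers[draw]) sum += c;
            lock.lock();
            total += sum;
            busy = false;
            cv.notify_all();
        }
    }

    std::vector<int> buffers[2];
    int  fill = 0;
    bool busy = false;
    bool quit = false;
    long total = 0;
    std::mutex m;
    std::condition_variable cv;
    std::thread worker;   // declared last so the state above exists first
};
```

The caller only blocks in `flush()` if the previous batch is still being processed, so setup work for the next batch overlaps with the worker's processing of the current one.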

Quote

I can still use the "fool-proof" check that you described, but that means that using non-GLFW threads will limit the use of GLFW functions (e.g. you can't open a window from a non-GLFW thread).

I should point out that I do have a singleton MainThread class (derived from Threadable) that wraps the main thread of execution. That way the main process is seen as Threadable to the rest of the system.
You could probably manage something similar.

Quote

I wish AmigaOS had something similar to POSIX pthread thread-specific data keys or Win32 thread local storage (TLS), which are basically indexed arrays of tc_UserData that you can dynamically allocate, per process. I suppose this does not work well without a proper process class though...

That's the beauty of Threadable. You can extend it however you wish :-)

But why not just create a structure thus...

typedef struct {
  long identity;
  size_t numDataHandles;
  void* dataHandles[1];
} GLFWThreadLocalStore;

...and allocate that dynamically with a function eg :

GLFWThreadLocalStore* CreateLocalStore(size_t numHandles)
{
  GLFWThreadLocalStore *tls = (GLFWThreadLocalStore*)malloc(sizeof(GLFWThreadLocalStore)+(numHandles-1)*sizeof(void*));
  if (tls)
  {
    size_t n;
    tls->identity = IS_GLFWTHREAD;
    tls->numDataHandles = numHandles;
    for (n=0; n<numHandles; n++)
      tls->dataHandles[n] = 0;
    return tls;
  }
  return 0;
}

When you need to resize the TLS, you can basically use a standard library function like realloc() to preserve what's in there..

If you're interested I can send you the source code to my kernel classes that will allow you to see how I overcome some of the problems you describe.
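A minimal sketch of the realloc() approach, assuming the structure from the post above. `ResizeLocalStore`, `storeBytes` and the numeric value chosen for `IS_GLFWTHREAD` are assumptions made for illustration. Indexing past the declared one-element array is the traditional "struct hack"; it relies on the over-allocation and is not strictly sanctioned by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

const long IS_GLFWTHREAD = 0x474C4657;  // assumed tag value ("GLFW" in ASCII)

struct GLFWThreadLocalStore {
    long        identity;
    std::size_t numDataHandles;
    void*       dataHandles[1];   // over-allocated, as in the post above
};

// Bytes needed for a store with n handles (n must be at least 1).
static std::size_t storeBytes(std::size_t n) {
    assert(n >= 1);
    return sizeof(GLFWThreadLocalStore) + (n - 1) * sizeof(void*);
}

// Grow or shrink the store. Passing nullptr creates a new one.
// On failure the old store is left untouched and nullptr is returned.
GLFWThreadLocalStore* ResizeLocalStore(GLFWThreadLocalStore* tls,
                                       std::size_t newCount) {
    if (newCount == 0) return nullptr;
    std::size_t oldCount = tls ? tls->numDataHandles : 0;
    void* p = std::realloc(tls, storeBytes(newCount));
    if (!p) return nullptr;
    GLFWThreadLocalStore* resized = static_cast<GLFWThreadLocalStore*>(p);
    resized->identity = IS_GLFWTHREAD;
    for (std::size_t n = oldCount; n < newCount; ++n)
        resized->dataHandles[n] = nullptr;   // only the new slots need clearing
    resized->numDataHandles = newCount;
    return resized;
}
```

Note that realloc() may move the block, so whatever points at the store (tc_UserData in this discussion) has to be updated with the returned pointer.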

Karlos · « **Reply #20 on:** May 19, 2003, 01:04:31 PM »

....er and I forgot to say make tc_UserData point to your GLFWThreadLocalStore object :-D

Karlos · « **Reply #21 on:** May 19, 2003, 05:16:04 PM »

oops..wrong post

PiR · « **Reply #22 on:** May 19, 2003, 06:15:55 PM »

Greetings Gentlemen!

Sorry for the delay, I should have looked at this thread earlier, but (stupid me) I thought it may be another boring mumbo-jumbo. ;-)

AFAIK using standard 68k libraries places ONLY TWO requirements upon CPU registers:
A7 - stack
A6 - library base
The rest is free to be arranged by whoever implements the library.
Additionally remember that:
A0, A1, D0, D1, FP0, FP1 are so-called trash registers, so the library user must not rely on any data that was previously in them
A2-A5, D2-D7, FP2-FP7 are supposed to be left unchanged by any function, so if the library programmer decides to use any of them, he MUST remember to preserve and restore them.

So

It's up to you where you want your arguments, so why not have them in FPU registers? However, if you decide to use the FPU, I think it would be extremely good practice to check for its existence in the library's Open() function. If no FPU is found, the library should refuse to open.

Of course we're talking about old Amiga standards here.

Good luck
PiR

marcus256 · « **Reply #23 on:** May 20, 2003, 12:24:06 PM »

What I meant with TLS is that it would be nice if AmigaOS had support for TLS natively, so that I could use it to realize some of the GLFW threading things.

tc_UserData works very much as TLS, but the problem is that there is no OS-friendly way of allocating/deallocating it (as you said - exec simply does not care), so if GLFW is to use it, the application (that uses GLFW) can not use it. If AmigaOS provided proper TLS support, GLFW could allocate a TLS "key" or "index" private to GLFW, and the application is free to allocate other keys, meaning that there are no potential conflict situations.

By the way, I solved the condition variable (signalling primitive) support by adding a field to the GLFW thread structure called "waiting_for" (or something similar), so that when a thread is to signal or broadcast a condition to any waiting thread(s), it loops through all the known GLFW threads and checks the waiting_for field, to see if it is waiting for this particular condition. If so, a signal (that is private to the waiting thread, and whose ID is stored in its thread structure) is generated. Of course, critical sections (Forbid/Permit) are used wherever necessary.

I think this is the most viable solution for broadcasting signals to multiple threads in the way that is required for condition variables to work (it should be quite cheap too, since mostly you don't have more than a couple of threads, perhaps 10 at most or so). Have you done anything similar?
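The loop described above can be modelled portably as follows. This is only a sketch of the bookkeeping: `ThreadRecord`, `Condition` and `broadcast` are hypothetical names, and the `pending` field stands in for actually delivering an exec signal to the waiting task.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Condition { int unused; };   // identity only; holds no state

// Hypothetical per-thread record.
struct ThreadRecord {
    const Condition* waitingFor;   // nullptr when not waiting
    std::uint32_t    wakeSignal;   // signal bit private to this thread
    std::uint32_t    pending;      // stands in for delivering the signal
};

// Wake every thread waiting on cond; returns how many were signalled.
// The caller is assumed to hold the lock that protects the thread list.
int broadcast(std::vector<ThreadRecord>& threads, const Condition& cond) {
    int woken = 0;
    for (ThreadRecord& t : threads) {
        if (t.waitingFor == &cond) {
            assert(t.wakeSignal != 0);
            t.pending |= t.wakeSignal;   // real code would signal the task here
            t.waitingFor = nullptr;
            ++woken;
        }
    }
    return woken;
}
```

The cost of each broadcast is proportional to the total number of known threads, which is cheap for the handful of threads mentioned above.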

> I made a Threadable Rasterizer class (still in
> development) that has a double buffered vertex
> array / command queue. The rendering calls fill
> one buffer whilst the previous one is being
> rendered by the internal thread.

In the GLFW distribution I have an example program that works this way. It's a particle system, where the particle physics is carried out in one thread, and the rendering/billboarding is done in another thread.

First I did a straightforward solution without double buffering (meaning potential stalls). It gave roughly 100% of the speed of a single threaded implementation on a single processor system, and 105-150% on dual CPU systems.

Then I added double buffering, actually resulting in a performance drop (about 95% on the single processor system, and slightly degraded performance on the SMP systems compared to single buffering).

I reckon the reason is that:

A) OpenGL hardware already runs asynchronously on most decent implementations (at least under Windows and Linux), so that there is no gain in using separate threads on single processor systems

B) Double buffering means more cache thrashing, effectively degrading CPU performance

I still think multi threading like this is a good thing if it does not cost performance for single processor systems. Future systems are very likely to have multiple CPU cores (either SMP or SMT), meaning that multi threaded programs will gain performance "for free" on those systems. And, as you said, in many situations it can help the design to use multiple threads.

marcus256 · « **Reply #24 on:** May 20, 2003, 12:25:32 PM »

oops - dual post...

Karlos · « **Reply #25 on:** May 20, 2003, 11:18:39 PM »

Hi marcus,

-edit-

Is it a problem that the application running on GLFW can't use the tc_UserData if you use it? I thought the point was to avoid system dependencies. Just use your own GLFW threads within a multithreaded GLFW program, surely. As I see it, your GLFW threads are an interface. If you add your own TLS to it then the users of your framework will just use that instead.
-end edit-

Signalling in my system is relatively straightforward. Since I have threadable objects, as opposed to just separate threads running through some arbitrary code, I just perform a method for that object. If that method changes internal data, the thread will be aware of that automatically (having access to the protected level internals). Methods which require synchronised access can simply use a Lockable object that is bound to the internal thread (you can't lock it until the internal thread is done with it). Lockable is a service class that encapsulates the Semaphore mechanism.

The actual Amiga task, running within the context of the object, can always see the (protected) state information. Due to this, the only real signalling I need is to be able to go to sleep and wait for an event, or a time out. The Threadable service provides a delay timer feature too, using the DelayTimer class (itself an encapsulation of the timer.device).

There is a sleep() method for Threadable objects that actually uses the Amiga Wait()/Signal() system.
So the internal thread can literally go to sleep. When you then kick the object by invoking the wake() method, the appropriate exec level signal is sent to the internal task, which is then woken up and carries on.

So really I don't use a lot of different signalling, just sleeping and waking. All other state info is actually part of the object definition. There is also a shutdown signal defined which basically tells the internal thread to remove itself. The internal thread code can simply call the method that checks for a shutdown call and then do whatever is required to finish and exit.
It does this (cleanly) by a return from the run() method.
The thread which invoked the stop() method (which may be part of the destruction for example) is then forced to wait on the internal thread to finish.

The only other thing is that a call to shutdown() will wake up the thread if it is waiting for something already. This allows threads to respond quickly to getting told to finish up.

It's a robust system and the interface ensures that you'd have to especially set out to break it in order to screw it up.

Quote

Of course, critical sections (Forbid/Permit) are used wherever necessary

Try to avoid this. If you can, use semaphore locking for shared resources, it's much friendlier - especially if you're going all multithreaded...
The only place I use this is inside the start() method that creates the task. With task switching momentarily disabled, I write the tc_UserData to point to the object and that's it.
I don't use it anywhere else and would rather avoid it altogether. If you ever do a WarpOS version you'll see there is no ForbidPPC()/PermitPPC()...

-threaded graphics-

Agreed. In most cases a multithreaded approach to rendering on a single processor system is pointless.

However, the double buffered rasterizer I wrote works reasonably well because on the current hardware, rendering takes time and does force the calling code to wait (we are talking direct Warp3D level stuff here, not OpenGL). Only the simple pre-v4 Warp3D calls are asynchronous. The v4 vertex array calls (which I use for efficiency/flexibility reasons) are not (well there may be some parallelism at the hw level). The cache thrashing issue isn't much of a problem in my code since the buffers aren't very large anyway.

So, in my case whilst drawing isn't physically any faster, the setup stage can continue so time is not wasted. The threaded double buffering adds the asynchronicity you would expect from a 'decent' OpenGL implementation.

I don't have a high level 3D system apart from a simple transformation / shading engine. I have no interest in trying to compete with OpenGL :-)

Anyway, if you use multithreading under Windows, you'll love it on AmigaOS. Task switch times are minuscule :-)

marcus256 · « **Reply #26 on:** May 21, 2003, 12:46:54 PM »

Daarrrgh!!!!

I wrote a leeengthy reply to this mail - but it didn't get posted (login timeout?). Anyway, this will be more brief...

> -edit-
> [snip]
> -end edit-

I agree...

I didn't quite understand your signalling policy. GLFW mimics the POSIX pthread API (which rocks, IMHO), by supporting mutexes (AmigaOS signal semaphores) and condition variables (sleep/wake mechanism).

What's special about condition variables is that any number of threads can be waiting for the same condition, and any number of threads can be signalling that condition (not knowing about which, if any, threads are waiting). Also, the condition variable does not maintain any state (it's like a strobe), so the actual condition has to be managed through mutex-protected shared variables. This means that the condition can be of arbitrary complexity (boolean, counter, combination of conditions etc).

The problem with AmigaOS is that each task has its private set of allocated signals, and the signalling thread must know both which task to signal, and which signal ID that particular task is waiting for. That is why I need a loop to check each and every GLFW thread if it is waiting, and which signal ID it is waiting for (and of course, if the signal ID corresponds to the condition variable that is currently being signalled).
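A short sketch of that pattern, using the standard C++11 equivalents of pthread mutexes and condition variables. `Counter`, `waitUntilAtLeast`, `increment` and `runCounterDemo` are hypothetical names; the shared counter plays the role of the "actual condition" held in mutex-protected state.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct Counter {
    std::mutex              m;
    std::condition_variable changed;   // stateless "strobe"
    int                     value = 0; // the real condition, guarded by m
};

// Block until the counter reaches at least target.
void waitUntilAtLeast(Counter& c, int target) {
    std::unique_lock<std::mutex> lock(c.m);
    while (c.value < target)       // re-check: a wakeup carries no state
        c.changed.wait(lock);
}

// Any thread may call this; it does not know who, if anyone, is waiting.
void increment(Counter& c) {
    {
        std::lock_guard<std::mutex> lock(c.m);
        ++c.value;
    }
    c.changed.notify_all();        // broadcast to every waiter
}

// Two waiters, one signaller; returns the final counter value.
int runCounterDemo(int target) {
    assert(target > 0);
    Counter c;
    std::vector<std::thread> waiters;
    for (int i = 0; i < 2; ++i)
        waiters.emplace_back([&c, target] { waitUntilAtLeast(c, target); });
    for (int i = 0; i < target; ++i)
        increment(c);
    for (std::thread& t : waiters)
        t.join();
    std::lock_guard<std::mutex> lock(c.m);
    return c.value;
}
```

Because the waiters test the counter under the mutex before sleeping, it does not matter whether the increments happen before or after they start waiting.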

I actually don't use Forbid/Permit (I was confused with the joystick code I did recently, where joystick allocation needs Forbid/Permit). I use a global signal semaphore that protects all thread state (e.g. when a thread is added to the GLFW thread list).

Regarding task switch times: AmigaOS may be good, but Windows NT/2k/XP is really good! Windows 98 sucks big time though. I have a benchmark program in the GLFW example program collection which does forced context switching (two threads that signal/wait/signal/wait... in a loop). Here are some results:

AmigaOS (WinUAE, 68020 ~200 MHz): 50,000 switches/s
Windows 2000 (Athlon 700 MHz): 500,000 switches/s
Windows 98 (Athlon 700 MHz): 23,000 switches/s
Linux (Athlon 700 MHz): 160,000 switches/s
SunOS (6 x USPARC2 400 MHz): 120,000 switches/s
OSF/1 (1 x Alpha 21264 500 MHz): 130,000 switches/s
OSF/1 (2 x Alpha 21264 500 MHz): 40,000 switches/s

The signalling involves both mutex locking and condition signalling. The GLFW implementation of course has some kind of overhead to it too. I think that under Mac OS X (pthread) the figure is somewhere in the range 10-20 kswitches/s. IRIX 5.3 also sucked if I remember correctly.
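For anyone wanting to produce comparable figures, a forced context switch test of this kind can be sketched in standard C++11 as below. This is not the GLFW benchmark program, just an illustration of the signal/wait ping-pong; `measureSwitchesPerSecond` is a hypothetical name, and the result depends heavily on the machine and the threading library.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Two threads hand a "turn" token back and forth. Every hand-over
// forces the other thread to run. Returns hand-overs per second.
double measureSwitchesPerSecond(int rounds) {
    assert(rounds > 0);
    std::mutex m;
    std::condition_variable cv;
    int turn = 0;                       // 0: caller's move, 1: partner's move

    std::thread partner([&] {
        for (int i = 0; i < rounds; ++i) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return turn == 1; });
            turn = 0;
            cv.notify_one();
        }
    });

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; ++i) {
        std::unique_lock<std::mutex> lock(m);
        turn = 1;
        cv.notify_one();
        cv.wait(lock, [&] { return turn == 0; });
    }
    auto stop = std::chrono::steady_clock::now();
    partner.join();

    std::chrono::duration<double> elapsed = stop - start;
    return (2.0 * rounds) / elapsed.count();   // two hand-overs per round
}
```

As in the figures above, each hand-over includes mutex locking and condition signalling, so the number measures the whole mechanism rather than the bare task switch.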

Do you have any similar benchmarking figures? (it would be interesting to compare)

Oh, and I have unconditional sleep too. It uses Amiga's Delay() - is that any good? (gives me a minimum of 40 ms sleep time on average - funny, I thought it would be 1000/50 = 20 ms)

I still haven't solved timed conditional waits. I suppose I would have to use timer.device to set up a timeout signal and wait for that too.
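For reference, the behaviour a timed conditional wait should have can be sketched with the standard C++11 primitives. `Flag`, `waitForFlag` and `raiseFlag` are hypothetical names; an AmigaOS implementation would presumably wait on the wake signal and a timer.device signal together, as suggested above, but that part is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

struct Flag {
    std::mutex              m;
    std::condition_variable cv;
    bool                    set = false;   // the condition, guarded by m
};

// Wait until the flag is set or timeoutMs elapses.
// Returns true if the flag was set, false on timeout.
bool waitForFlag(Flag& f, int timeoutMs) {
    assert(timeoutMs >= 0);
    std::unique_lock<std::mutex> lock(f.m);
    return f.cv.wait_for(lock, std::chrono::milliseconds(timeoutMs),
                         [&f] { return f.set; });
}

void raiseFlag(Flag& f) {
    {
        std::lock_guard<std::mutex> lock(f.m);
        f.set = true;
    }
    f.cv.notify_all();
}
```

The caller can tell the two outcomes apart from the return value, which is the part that has to be reproduced when a timeout signal and a wake signal arrive through the same wait.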

PiR · « **Reply #27 on:** May 21, 2003, 01:25:49 PM »

@marcus

I hesitated over whether I should write about this, but I decided I will.

I think you can improve your implementation of condition variables. As I understand it, you currently have your condition variables, and each task has a waitfor field with the address of the condition variable it waits for.
So every time any condition variable is signalled/broadcast you have to check through all the tasks. The more tasks you have, the more you have to look through.

If you like the POSIX way I think you should make the following modification:
'waitfor' should actually be a listnode, while inside condition variable should be a listheader. Linking/Unlinking to the condition variables (done due to waiting for and waking up) should be mutexed of course.
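The suggested layout could look roughly like this. It is a portable sketch with hypothetical names (`ThreadRecord`, `Condition`, `addWaiter`, `signalOne`, `broadcastAll`); it uses std::list where an AmigaOS version would more likely embed an exec list node in the thread structure, and `pending` again stands in for delivering the signal. All functions assume the caller holds the mutex that protects the condition.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

struct ThreadRecord {
    std::uint32_t wakeSignal;   // signal bit private to this thread
    std::uint32_t pending;      // stands in for delivering the signal
};

struct Condition {
    std::list<ThreadRecord*> waiters;   // the "list header"
};

// Link: called by a thread that is about to sleep on c.
void addWaiter(Condition& c, ThreadRecord& t) { c.waiters.push_back(&t); }

// Wake the longest-waiting thread, if any. Returns true if one was woken.
bool signalOne(Condition& c) {
    if (c.waiters.empty()) return false;
    ThreadRecord* t = c.waiters.front();
    c.waiters.pop_front();               // unlink
    assert(t->wakeSignal != 0);
    t->pending |= t->wakeSignal;
    return true;
}

// Wake everyone waiting on c; returns how many were woken.
int broadcastAll(Condition& c) {
    int n = 0;
    while (signalOne(c)) ++n;
    return n;
}
```

With this arrangement the work done per signal or broadcast depends only on the threads waiting on that particular condition, not on the total number of threads.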

This is your code, so if I should keep my nose out of it just ignore this. I've been playing a lot with POSIX threads over the last few months.

Good luck
PiR

Karlos · « **Reply #28 on:** May 21, 2003, 02:39:36 PM »

Hi marcus,

-edit-

Pity we are never online at the same time!

PiR is right - a linked list is more efficient for this, I would say...
-end edit-

About the signalling policy. What I mean is, most of the time, the thread is running inside the context of the threadable object. Like the Thread interface in Java, for example.
So, say I write code for a threadable object that waits for a member of that object to change value. The code would be something like

void MyThreadableObject::waitForValueChange()
{
  int oldvalue = value;
  while (oldvalue == value)
    sleep(); // indefinite until wake() or stop() called
}

Note sleep() is a simplification. The real method is idle(uint32 millisecs, bool ignoreWake, bool abortIdle, SysSignal trigger);

..but that confuses the example slightly. Using a delay time of 0 ms is forever...
This method is called by the internal thread. I may write a public method like this

void MyThreadableObject::setNewValue(int v)
{
  value = v;
  if (isSleeping())
    wake();
}

And that's it. As soon as I call setNewValue() and the value I pass is different from the internal one, the internal task, if already asleep, will be woken up.

As for delays. Don't use Delay() - it's bobbins for accuracy.

Use the timer.device and wait on that. I have a DelayTimer which is derived from MilliClock. It gives millisecond delay accuracy. I could easily make this finer but the MilliClock implementation has to work on lots of platforms, not all of which have the microsecond resolution available to the Amiga.

kamelito · « **Reply #29 from previous page:** May 18, 2016, 12:55:24 PM »

13 years later is GLFW ported to the Amiga in the end?

Kamelito
