
Author Topic: NetSurf OS3.x Issues  (Read 41010 times)


Offline olsen

Re: NetSurf OS3.x Issues
« on: February 28, 2016, 02:07:41 PM »
Quote from: chris;804753
OK, this is more like I would expect - everything running consistently at a slower speed.

I know exactly what is causing the slowdown now.  clib2 uses memory pools with a puddle size and expected allocation size of 4K.  I modified that in the newer build to use normal memory allocations instead.

What is happening is that early memory allocations are fast and efficiently served from 4K chunks.  Then, when bits of memory are de-allocated, they leave holes.  When new memory blocks are allocated - and this is where I'm not sure of the implementation details in the OS - it tries to fill in the gaps in the already-allocated pools?  With a lot of pools it may take some time to search through them and find a gap of the correct size, which is similar to how normal memory allocations work when searching through all of RAM (and thus runs at a similar speed).

Quite simply, we are allocating and de-allocating so much memory that we quickly lose any advantage of memory pools.

To fix it... well, that's tricky.  The correct way would be to pool together elements of the same size to avoid fragmentation, but I can't do that in the core and all libraries without re-writing all the memory allocations (which would definitely not be popular).  Note I already do this in the frontend everywhere it is practical (this was one of my earlier OS3 optimisation attempts!)

It may simply be a case of making the memory pools bigger, and I will try that first.
I suspect that this may not make much of a difference. The memory pools, upon which the malloc()/alloca()/realloc()/free() functions in clib2 are built, were intended to avoid fragmenting main memory. This is accomplished by having every allocation smaller than the preset puddle size draw from a puddle that still has enough room left for it to fit. Fragmentation happens inside that puddle.

The problems begin when the degree of fragmentation inside these puddles becomes so high that the only recourse is to allocate more puddles and allocate memory from that. The number of puddles in use increases over time, and when you try to allocate more memory, the operating system has to first find a puddle that still has room and then try to make the allocation work. Both these operations take more time the more puddles are in play, and the higher the fragmentation within these puddles is. Allocating memory will scale poorly, and what goes for allocations also goes for deallocations.

The other problem is with memory allocations whose length exceeds the puddle size. These allocations will be drawn from main memory rather than from the puddles. This will likely increase main memory fragmentation somewhat, but the same problems that exist with the puddles apply to main memory, too: searching for a chunk to draw the allocation from takes time, and the same goes when deallocating that chunk. There's an additional burden on this procedure because the memory pool has to keep track of that "larger than puddle size" allocation, too.

Because all the memory chunk/puddle, etc. allocations and deallocations use the humble doubly-linked Exec list as their fundamental data structure, the amount of time spent finding the right memory chunk, and putting the fragments back together, scales poorly. Does this sound familiar?
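
To make this more concrete, here is roughly what the pool mechanism looks like from the application side. This is a minimal sketch built on the exec.library V39 pool functions; the clib2 internals are of course more involved than this:

#include <exec/types.h>
#include <exec/memory.h>
#include <proto/exec.h>

void pool_example(void)
{
    APTR pool, small, large;

    /* Puddle size and threshold of 4K, which is what clib2
       uses by default. */
    pool = CreatePool(MEMF_ANY, 4096, 4096);
    if (pool == NULL)
        return;

    /* A small allocation: Exec walks the list of puddles,
       looking for one that still has a large enough gap. */
    small = AllocPooled(pool, 512);

    /* An allocation above the threshold is drawn from main
       memory instead, but still has to be tracked by the pool. */
    large = AllocPooled(pool, 16384);

    /* FreePooled() requires the original size; the pool does
       not remember it for you. */
    if (small != NULL)
        FreePooled(pool, small, 512);

    if (large != NULL)
        FreePooled(pool, large, 16384);

    /* Releases the puddles and everything still in them. */
    DeletePool(pool);
}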

From the clib2 side I'm afraid that the library can only leverage what the operating system provides, and that is not well-suited for applications which have to juggle a large number of allocated memory fragments.

The question is which memory chunk sizes are common for NetSurf, how many chunks are in play, and how the sizes are distributed. If you have not yet implemented it, you might want to add a memory allocation debugging layer and collect statistics from it over time.
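
Such a layer does not have to be fancy. A hypothetical sketch, with made-up wrapper names and bucket boundaries, could look like this:

#include <stdio.h>
#include <stdlib.h>

/* Route all allocations through xmalloc()/xfree() and keep a
   histogram of the request sizes. */
static unsigned long alloc_count[5]; /* <=128, <=256, <=1K, <=4K, larger */
static unsigned long live_allocations;

static int bucket_for(size_t size)
{
    if (size <= 128)  return 0;
    if (size <= 256)  return 1;
    if (size <= 1024) return 2;
    if (size <= 4096) return 3;
    return 4;
}

void * xmalloc(size_t size)
{
    void * p = malloc(size);
    if (p != NULL)
    {
        alloc_count[bucket_for(size)]++;
        live_allocations++;
    }
    return p;
}

void xfree(void * p)
{
    if (p != NULL)
    {
        live_allocations--;
        free(p);
    }
}

/* Call this periodically, e.g. when a page has finished loading. */
void dump_allocation_stats(void)
{
    printf("<=128: %lu  <=256: %lu  <=1K: %lu  <=4K: %lu  "
           "larger: %lu  live: %lu\n",
           alloc_count[0], alloc_count[1], alloc_count[2],
           alloc_count[3], alloc_count[4], live_allocations);
}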

It may be worth investigating how the NetSurf memory allocations could be handled by an application-specific, custom memory allocator that sits on top of what malloc()/alloca()/realloc()/free() can provide and which should offer better scalability.
 

Offline olsen

Re: NetSurf OS3.x Issues
« Reply #1 on: February 29, 2016, 08:38:55 AM »
Quote from: itix;804810
I would just take a shortcut and install TLSFMem: http://dump.platon42.de/files/

Other than that, there is no real solution. Designing a good memory allocator is an art of its own, where one has to consider memory fragmentation, allocation performance and deallocation performance.
All true... but given the limited scalability of the built-in AmigaOS 68k memory management system, it's worth taking a look at the alternatives. What's the worst that could happen?

Quote
Yeah, I read on the previous page that NetSurf is crashing with TLSFMem, but this is very likely due to internal memory trashing somewhere in NetSurf... with the good old memory lists and standard memory pools, buffer under/overflows often go unnoticed, but with TLSF you are likely to crash right away.

Of course, a Wipeout session could reveal this, albeit it is going to be a painfully slow experience with such a complex application.
One could do what Wipeout does directly within the application itself. There exist wrappers for the entire malloc()/alloca()/realloc()/free() family which hook into a tracking system that verifies consistency and also keeps track of where the allocations were made (with file name and line number), etc. You could even build your own system if you wanted to (I did with Wipeout; this isn't so hard, really).
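
For illustration, a stripped-down version of such a wrapper could look like the following. This is a hypothetical sketch, not what Wipeout or clib2 actually do:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD 0xDEADBEEFUL

/* Every allocation is preceded by a header which records where
   it was made, and followed by a guard word which catches
   overruns. */
struct alloc_header
{
    const char *  file;
    int           line;
    size_t        size;
    unsigned long guard;
};

static const unsigned long trailer_guard = GUARD;

void * debug_malloc(size_t size, const char * file, int line)
{
    struct alloc_header * h;

    h = malloc(sizeof(*h) + size + sizeof(trailer_guard));
    if (h == NULL)
        return NULL;

    h->file  = file;
    h->line  = line;
    h->size  = size;
    h->guard = GUARD;

    /* The trailer may be unaligned, hence memcpy() rather than
       a plain store. */
    memcpy((char *)(h + 1) + size, &trailer_guard, sizeof(trailer_guard));

    return h + 1;
}

void debug_free(void * p, const char * file, int line)
{
    struct alloc_header * h;
    unsigned long trailer;

    if (p == NULL)
        return;

    h = (struct alloc_header *)p - 1;
    memcpy(&trailer, (char *)p + h->size, sizeof(trailer));

    if (h->guard != GUARD || trailer != GUARD)
        fprintf(stderr, "%s:%d: block trashed; it was allocated at %s:%d\n",
                file, line, h->file, h->line);

    free(h);
}

/* Route the whole program through the wrappers. */
#define malloc(size) debug_malloc((size), __FILE__, __LINE__)
#define free(p)      debug_free((p), __FILE__, __LINE__)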

Wipeout is several layers removed from the application performing the memory allocations and will provide little information to go on. It's a system-wide debugging aid, whereas here you would be better served by an application-specific solution.

Update: I looked around a bit and found the Boehm garbage collector (http://hboehm.info/gc/) which can be used as a drop-in for the malloc(), etc. family. Might be worth a shot. Debug wrappers for the malloc(), etc. family exist in several flavours, e.g. http://dmalloc.com or http://www.hexco.de/rmdebug/. Some assembly may be required.
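
The collector is documented as a drop-in along these lines (an untested sketch; whether this holds up on 68k hardware is another matter):

#include <gc.h>

/* free() becomes optional: memory which is no longer
   reachable is reclaimed by the collector. */
#define malloc(n)     GC_MALLOC(n)
#define realloc(p, n) GC_REALLOC(p, n)
#define free(p)       GC_FREE(p)

int main(void)
{
    char * p;

    GC_INIT();

    p = malloc(100);
    p = NULL;       /* the block is now garbage and will be collected */

    return 0;
}
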
« Last Edit: February 29, 2016, 01:07:15 PM by olsen »
 

Offline olsen

Re: NetSurf OS3.x Issues
« Reply #2 on: February 29, 2016, 01:21:01 PM »
Quote from: kamelito;804817
@Olsen
Lists, this reminds me of this: https://isocpp.org/blog/2014/06/stroustrup-lists

Kamelito
As always, he has a point :)

The memory management system in AmigaOS 68k is, in my opinion, adequate for the type of machine it was intended for when the system was designed. It's fast and managing free space is efficient, too, for small amounts of memory (say, 256 KBytes to tens of MBytes). Because it is built around doubly-linked lists it's easy to understand how it works, which is a big bonus in this field. These properties seem to be a good match for Stroustrup's view on the matter.

Unfortunately, a simple and robust memory management system is still just that: simple. While it may have held up well for more than 30 years in its domain, the design could not anticipate the needs of a web browser, which is likely one of the most complex types of software ever designed  ;)
 

Offline olsen

Re: NetSurf OS3.x Issues
« Reply #3 on: February 29, 2016, 01:33:20 PM »
Quote from: chris;804828
Hi Olaf, thanks for commenting.  I increased the puddle size to 16K from the default 4K and it seems to have helped in my limited testing (no feedback yet from anybody else).  I figure this reduces the number of puddles in the list which need to be searched through, as well as allowing larger allocations into the pool.
It might help, but you will have to put up with the limitations of the list-based memory management system if you stick with the pools.

As I mentioned before, you might want to cast a bigger net for alternative memory management solutions. No matter how much you tweak the puddle size, there will be side-effects which are just as strange as the ones you hope to get under better control (fragmentation, number of items in each list that need to be checked, etc.). Hence, I would suggest you look into finding a memory management system which can be tuned to the needs of the application.


Quote
I haven't.  I think there are cache statistics in the log which might offer some clues, though.  Beyond the cache data, everything else is structures, which should all be very small.
Well, you can't get a good handle on the memory problem until you have enough data to permit you to make a decision on how to proceed.

For example, you could collect information on the average sizes of allocations made and then group them. Let's say you have one group of 128 bytes or less, one of 256 bytes or less, and so on up to 4 KBytes or less, and then everything else. You could set up different pools from which only these specific allocation sizes would be drawn. For best effect, match the puddle sizes to the most frequently used allocation sizes.
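
In code, that grouping could look something like this. The names and size classes are hypothetical, and cleanup after a failed CreatePool() is omitted:

#include <exec/types.h>
#include <exec/memory.h>
#include <proto/exec.h>

/* One pool per size class, plus a final pool for everything
   larger. The puddle sizes are chosen so that each puddle
   holds a whole number of same-sized allocations. */
#define NUM_CLASSES 4

static const ULONG class_limit[NUM_CLASSES]  = {  128,  256, 1024,  4096 };
static const ULONG class_puddle[NUM_CLASSES] = { 4096, 4096, 8192, 16384 };

static APTR class_pool[NUM_CLASSES + 1];

BOOL init_pools(void)
{
    int i;

    for (i = 0; i <= NUM_CLASSES; i++)
    {
        ULONG puddle = (i < NUM_CLASSES) ? class_puddle[i] : 16384;

        class_pool[i] = CreatePool(MEMF_ANY, puddle, puddle);
        if (class_pool[i] == NULL)
            return FALSE;
    }

    return TRUE;
}

static int class_for_size(ULONG size)
{
    int i;

    for (i = 0; i < NUM_CLASSES; i++)
    {
        if (size <= class_limit[i])
            return i;
    }

    return NUM_CLASSES; /* the "everything else" pool */
}

APTR class_alloc(ULONG size)
{
    return AllocPooled(class_pool[class_for_size(size)], size);
}

void class_free(APTR p, ULONG size)
{
    FreePooled(class_pool[class_for_size(size)], p, size);
}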

Another idea would be to recycle memory allocations once they are freed. Again, you would group allocations by size (128 bytes or less, 256 bytes or less, etc.), and once an allocation is freed, you'd stick it into a list of chunks of the same size available for reuse. If an allocation is made which matches the chunk size of an entry in the list, you'd pick up the first entry and reuse it. That saves effort because you don't have to merge allocations back into bigger chunks upon freeing them, and because you don't have to search for allocations inside the fragmented puddles: you just pick up the first entry in the respective list, remove it from the list and then use it. Mind you, you will have to watch how much memory is tied up in these lists and prune them to avoid running out of memory over time.
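
A minimal sketch of that recycling scheme might look like this. The names are hypothetical, the caller has to remember the allocation size (as with FreePooled()), and the pruning I mentioned is left out:

#include <stdlib.h>

/* Freed chunks are not returned to the system; they are pushed
   onto a per-size-class list and handed out again on the next
   matching allocation. The chunk itself stores the link pointer
   while it sits on the list. */
struct free_chunk
{
    struct free_chunk * next;
};

#define NUM_BUCKETS 4

static const size_t bucket_size[NUM_BUCKETS] = { 128, 256, 1024, 4096 };
static struct free_chunk * free_list[NUM_BUCKETS];

static int bucket_for_size(size_t size)
{
    int i;

    for (i = 0; i < NUM_BUCKETS; i++)
    {
        if (size <= bucket_size[i])
            return i;
    }

    return -1; /* too large to be recycled */
}

void * recycling_alloc(size_t size)
{
    int b = bucket_for_size(size);

    if (b >= 0)
    {
        if (free_list[b] != NULL)
        {
            /* Reuse: unlink the first entry; no searching at all. */
            struct free_chunk * chunk = free_list[b];
            free_list[b] = chunk->next;
            return chunk;
        }

        /* Round up so that the chunk can be reused later. */
        size = bucket_size[b];
    }

    return malloc(size);
}

void recycling_free(void * p, size_t size)
{
    int b = bucket_for_size(size);

    if (p == NULL)
        return;

    if (b >= 0)
    {
        /* Push the chunk onto the list of its size class. */
        struct free_chunk * chunk = p;
        chunk->next = free_list[b];
        free_list[b] = chunk;
    }
    else
    {
        free(p);
    }
}
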
« Last Edit: February 29, 2016, 01:44:35 PM by olsen »
 

Offline olsen

Re: NetSurf OS3.x Issues
« Reply #4 on: February 29, 2016, 01:42:33 PM »
Quote from: wawrzon;804834
thanks for confirmation olaf. this is exactly what i had in mind. however designing an application-specific memory allocator or allocation method is probably not the right way..
Sometimes it's the only option for regaining control over how the application uses memory, when the usage pattern is a poor match for the abilities of the operating system.

An extreme example may be the BSD-Unix-derived TCP/IP stacks for the Amiga, which all retain the source kernel's memory management system, fused with the TCP segment reassembly logic. This code sits atop the Amiga memory management system, where it emulates the 2K memory pages of the Unix system it was originally designed for.

I have used simpler custom memory management code in my own server applications, e.g. by directly recycling freed memory chunks instead of delegating the work to the operating system and the 'C' link library that sits on top of it. It helped to keep long-term fragmentation of memory at bay.

All of these are perfectly legitimate solutions :)
 

Offline olsen

Re: NetSurf OS3.x Issues
« Reply #5 on: March 02, 2016, 07:30:10 PM »
Quote from: chris;804983
Yes... I don't really want to write my own memory management though!
Need a hand? :)