@Trev
Re: legacy MEMF_REVERSE usage, if you observed that your allocations were in the upper half of memory (either physically or with respect to memory lists), wouldn't using MEMF_REVERSE have decreased allocation time by at least a factor of two, barring fragmentation?
The way the MEMF_REVERSE flag is implemented in the First Fit algo actually makes it slower than a normal allocation. It must always scan all free memory blocks to find the last matching one (since they're sorted by address). Also, with TLSF the MEMF_REVERSE flag only adds complexity to the routine without improving performance or reducing fragmentation.
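To illustrate the point about scanning, here is a minimal sketch in plain C. It is not the actual exec code: struct FreeChunk, first_fit() and reverse_fit() are hypothetical names, and the list is assumed to be singly linked and sorted by ascending address. The visited counter only exists to show how many chunks each search touches.

```c
#include <stddef.h>

/* Hypothetical free-list node, a simplified stand-in for a real chunk header.
 * Assumption: the list is singly linked and sorted by ascending address. */
struct FreeChunk {
    struct FreeChunk *next;  /* next free chunk, at a higher address */
    size_t size;             /* usable bytes in this chunk */
};

/* Normal first fit: stop at the first chunk that is large enough. */
struct FreeChunk *first_fit(struct FreeChunk *head, size_t want, size_t *visited)
{
    struct FreeChunk *c;

    *visited = 0;
    for (c = head; c != NULL; c = c->next) {
        ++*visited;
        if (c->size >= want)
            return c;        /* early exit */
    }
    return NULL;
}

/* Reverse-style search: the highest-address match is wanted, but the list
 * can only be walked upwards, so every chunk has to be visited. */
struct FreeChunk *reverse_fit(struct FreeChunk *head, size_t want, size_t *visited)
{
    struct FreeChunk *c;
    struct FreeChunk *last = NULL;

    *visited = 0;
    for (c = head; c != NULL; c = c->next) {
        ++*visited;
        if (c->size >= want)
            last = c;        /* remember the match, keep scanning */
    }
    return last;
}
```

With three free chunks of 64, 16 and 64 bytes and a 32-byte request, the forward search stops after one chunk while the reverse search has to walk all three.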
It also seems to me that using an allocaligned-like function would have increased fragmentation with very little benefit, particularly for alignments <= 16, by unnecessarily increasing the size of the system memory lists.
Allocaligned-type functions are only meant for getting memory aligned to a specific granularity. Often that's needed for DMA operations, for memory aligned to PPC code/stack alignment requirements, or for memory areas aligned to the MMU page size. You indeed should not use them lightly, as they increase fragmentation quite dramatically. Luckily, with TLSF fragmentation is no longer an issue.
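As a rough illustration of where the extra cost can come from, here is one common way to get aligned memory on top of a plain allocator: over-allocate and round the address up. This is only a sketch using standard malloc(), not how the system function is actually implemented. alloc_aligned_sketch() is a hypothetical name, and the alignment is assumed to be a power of two.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, NOT the system's aligned allocator.
 * Returns an address aligned to `align` (assumed to be a power of two)
 * inside a larger malloc() block. *raw receives the pointer that must
 * later be passed to free(). */
void *alloc_aligned_sketch(size_t size, size_t align, void **raw)
{
    uintptr_t addr;

    /* Reject zero and non-power-of-two alignments. */
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;

    /* Guard against overflow in size + align - 1. */
    if (size > SIZE_MAX - (align - 1))
        return NULL;

    /* Ask for up to align - 1 extra bytes so an aligned address fits. */
    *raw = malloc(size + align - 1);
    if (*raw == NULL)
        return NULL;

    /* Round the start address up to the next multiple of align. */
    addr = ((uintptr_t)*raw + align - 1) & ~(uintptr_t)(align - 1);
    return (void *)addr;
}
```

In this scheme each request can carry up to align - 1 bytes of slack, so a 100-byte request with 4096-byte alignment may reserve nearly 4 KB more than it uses.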