The macros are perfectly legal and they are a lot faster.
Please define "a lot". Or rather, take your time and benchmark console with the macro and without it, measure the difference and tell me. I haven't done that, admittedly, but just from your gut, what would you expect? 10% speed improvement? Not seriously. My gut feeling is: Below the statistical error. Way below. Don't waste your time in details.
Is it better coding practice to use slower code for little if any benefit?
Compatibility? Readability? Maintainability? Actually, that's quite worth something. What I do not like about macros is that they require knowledge of the implementation details to work. In other words, they expose the internal structure you are manipulating to the compiler, making it impossible to change that without recompiling the code. Again, admittedly, it's already too late for graphics to fix that, graphics is all over with code that exposes internal interfaces where it should not, but please! can we avoid this problem in future code somehow? There are interface functions, and here we have a perfectly fine "setter" function for an internal property. It's good practice to use that.
Your newest layers.library still uses the Forbid/Permit macros in most places instead of the function calls.
No. Actually, in one single place, and that was replaced by a semaphore in Os 4 anyhow (by Olsen). Again, you're looking at the details and not at the problem. The problem that is solved here (in the function using the Forbid()-locking) is creating a layer-info global stack of cliprects, thus avoiding a wasteful memory allocation, and hence avoiding memory fragmentation whenever possible. That makes a difference, though probably not for layers directly. Ideally, this could be a semaphore, but given that only a single pointer linkage requires protection, this might arguably an overshoot. The real reason why there is no semaphore is that back then I didn't know where to put that semaphore. It should be a layerinfo-relative semaphore (since that's where the cliprects belong to), but there is no room in the layerinfo anymore. It could be a global semaphore, but that means that one layer info can lock out another layer info, same as with forbid-locking. So there is not really a clear advantage of semaphores here either.
Also, some of your layers.library function calls go through the LVO and some use a regular BSR instruction. It looks pretty inconsistent to me.
That depends on the function, and is pretty much a choice the original designers of layers took (mostly). Given that layers is patched over by many places, it's not a good idea to mess with that overly. If you have a patch on top of a layer function, you'd probably better make sure to know when precisely that is called, and the patch - in some cases - should not be called for internal functioning of layers. Again, I outright admit that I do not understand the logic by which layers does or does not call through its external interface, but given that patches and programs may depend on exactly the choices that have been taken by our ancestors (ehem), it should probably stay that way.
A couple of functions aren't going to make much of a difference. I just found it funny that the example wasn't consistent with the article comments. It is also funny that the 68k AmigaOS with all its optimizing for space really wasn't so well optimized from their example either.
No, it's not a particularly good example. Or rather, it wasn't made quite clear by the article why that choice was a good choice and helped making console better. From a software design perspective, it *might* have been a good example, but it would have been an even better example to explain why that function call should be removed in first place.
Yea, but you wouldn't have removed the call because the code is working. I know you too well. Making the safe optimizations to gain what I could while leaving planar support intact sounds pretty good to me. It would be better if the code was in C with the programmer using the macros and the compiler doing the optimizations though.
Maybe I've done too much C++, but I wouldn't pick macros if there is a call interface. Macros are fine for internal interfaces, really. But for something you expose to the global world, they are a particularly bad choice. Again, too late for poor gfx anyhow.
Not likely faster. Not likely at all. Better is debatable.
"Better" requires a metric you measure quality with (oh boy, I'm again talking as if in a JPEG meeting again, same problem all over!). "Running speed" and "code size" are two of them, but not necessarily the most important ones. Something I did not understand back then either, again admittedly. But if you want to maintain code over a longer period of time, and want to update it if you need to, you'd better make sure that you have robust interfaces and a well-documented code, and code that speaks for itself. Assembler doesn't quite fit into this picture, and macros neither.