Yes, real Fast memory runs on a completely separate bus. The trapdoor slot sits on the custom chipset bus, so RAM there is substantially slower regardless of whether it is mapped as Chip or "Fast". You can test this yourself with a RAM benchmarking program. Run it once on a 2-color 320x200 screen and once on a 16-color 640x400 screen, and you'll see the "Fast" RAM slow down just like the Chip RAM, because the DMA for the screen refresh is hogging time on the bus.
If you look at the trapdoor expansion on your A500, you'll see that the board has an extra connector going to the Gary chip -- this hack is there to make sure that the memory mapping used for the expansion shows up properly on the internal bus. (Incidentally this can stop other expansions from working properly!)
If you want true Fastmem in your A500, it needs to be added to the CPU bus (either via a piggyback on the CPU socket, or the expansion slot on the left side). A replacement CPU board with its own local RAM is the best-case scenario.
I don't know why it's having problems with games. It could have to do with how the memory is being mapped conflicting with your IDE expansion's mapping. Also, a lot of very early, badly behaved games expected RAM to be in very specific locations, so this could be your issue. In general, the original 24-bit Amiga memory map (introduced with the A1000, though the specific tweaks on the A500 became the "standard" map that a lot of people wrote for) had very specific portions of memory reserved for the trapdoor, while other parts were reserved for other parts of the system.
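To make the memory map point concrete, here is a rough C sketch that sorts a 24-bit address into the regions I'm talking about. The boundaries are the commonly documented A500 ones as I remember them, so treat them as assumptions and check them against the Hardware Reference Manual. The names (`amiga_region`, `classify_address`) are made up for this post, not from any Amiga header.

```c
#include <stdint.h>

/* Approximate regions of the classic 24-bit A500 address space.
   Boundaries are assumptions based on the commonly documented map. */
typedef enum {
    REGION_CHIP_RAM,      /* 0x000000-0x1FFFFF: Chip RAM address space */
    REGION_CPU_BUS_FAST,  /* 0x200000-0x9FFFFF: autoconfig Fast RAM on the CPU bus */
    REGION_TRAPDOOR_RAM,  /* 0xC00000-0xD7FFFF: trapdoor ("slow") RAM */
    REGION_CUSTOM_CHIPS,  /* 0xDFF000-0xDFFFFF: custom chip registers */
    REGION_KICKSTART_ROM, /* 0xF80000-0xFFFFFF: Kickstart ROM window */
    REGION_OTHER          /* CIAs, clock, autoconfig space, unused areas */
} amiga_region;

/* Hypothetical helper: classify an address by region. */
amiga_region classify_address(uint32_t addr)
{
    /* The 68000 only drives 24 address lines, so higher bits are ignored. */
    addr &= 0xFFFFFFu;

    if (addr <= 0x1FFFFFu)
        return REGION_CHIP_RAM;
    if (addr <= 0x9FFFFFu)
        return REGION_CPU_BUS_FAST;
    if (addr >= 0xC00000u && addr <= 0xD7FFFFu)
        return REGION_TRAPDOOR_RAM;
    if (addr >= 0xDFF000u && addr <= 0xDFFFFFu)
        return REGION_CUSTOM_CHIPS;
    if (addr >= 0xF80000u)
        return REGION_KICKSTART_ROM;
    return REGION_OTHER;
}
```

A game that hard-codes a pointer such as 0xC00000 is betting that trapdoor RAM is present at exactly that spot, which is the kind of assumption that falls over once an expansion rearranges things.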
...
I don't know if they are badly behaved games -- perhaps they didn't foresee people adding hard drives and 8+ MB of fast RAM. I like those games that go directly to the hardware and take over the system to squeeze the maximum potential out of it. Too bad we don't have that sort of innovative stuff nowadays on modern machines. Last I heard, even the PlayStation/Nintendo stuff involves going through APIs.
...
My A500 has a "Sapphire" 14 MHz 68020 card on the CPU socket, and a GVP HD8+ hard drive controller with expansion RAM on the side expansion slot. The CPU card has no provision for 32-bit RAM, so the speed of the system is limited. Even so, WHDLoad stuff runs like a champ on this setup: almost everything is compatible with it and runs at good speed, and there's no lag from custom chipset waitstates since the CPU is run synchronously.
I guess a good test would be to load a heavy copper list, have the processor do calculations in fast RAM, and see whether turning the copper list on/off affects the performance. For true coprocessing w/dual bus, it shouldn't affect the time to do the calculations.
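Roughly what I have in mind, as a portable C sketch rather than real Amiga code. It times a pure CPU/RAM workload with the standard `clock()` call and reports the slowdown between two runs. The function names are made up for this post. On the actual machine you would have to place the buffer in the memory type under test (for example via exec's `AllocMem` with `MEMF_FAST`), switch the copper list on and off between runs yourself, and probably use a finer timer than `clock()`; none of that is shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* The workload: walk the buffer 'passes' times and fold every byte into a
   running checksum. 'volatile' keeps the compiler from skipping the reads. */
uint32_t checksum_passes(const volatile uint8_t *buf, size_t len, unsigned passes)
{
    uint32_t sum = 0;
    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i < len; i++) {
            sum = sum * 31u + buf[i];
        }
    }
    return sum;
}

/* Run the workload once and return the elapsed clock() ticks.
   The checksum is stored through 'result' so the work cannot be optimized away. */
long time_workload(const volatile uint8_t *buf, size_t len, unsigned passes,
                   uint32_t *result)
{
    clock_t start = clock();
    *result = checksum_passes(buf, len, passes);
    return (long)(clock() - start);
}

/* Percentage by which the 'loaded' run was slower than the 'idle' run.
   Returns 0 when the idle time is too small to divide by. */
long slowdown_percent(long idle_ticks, long loaded_ticks)
{
    if (idle_ticks <= 0)
        return 0;
    return (loaded_ticks - idle_ticks) * 100 / idle_ticks;
}
```

Call `time_workload` once with the copper list off and once with it on, then feed both results to `slowdown_percent`. With the buffer in real Fast RAM on the CPU bus I'd expect a number near zero, and a clearly positive one for trapdoor RAM.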
On the plus side, using a real Amiga you get super-silky scrolling from your games, something that is next-to-impossible to achieve in an emulator due to the levels of abstraction in the host OS's video drivers.
I can't really run any of my hardware interfacing stuff under emulation anyway, among other things.