I did understand your point - and I do know it's running on top of lots of abstraction layers, but I don't notice any input latency when using WinUAE.
I would imagine there is less latency between MIDI input & output than there is between joystick & video. With video you can only send a complete frame at a time, so a joystick move can't show up on screen any sooner than the next frame.
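
To put rough numbers on that, here's a quick back-of-envelope sketch. These are my own figures, not measurements from WinUAE: I'm assuming the standard MIDI serial rate of 31,250 baud with 10 bits on the wire per byte, a 3-byte note-on message, and a 50 Hz PAL display. The function names are just made up for illustration. It only compares wire time against frame period and ignores whatever buffering the emulator, drivers, or USB add on top.

```python
# Back-of-envelope comparison: MIDI wire time vs. one video frame.
# All constants are assumptions for illustration, not WinUAE measurements.
MIDI_BAUD = 31_250        # standard MIDI serial rate, bits per second
BITS_PER_MIDI_BYTE = 10   # 1 start bit + 8 data bits + 1 stop bit
NOTE_ON_BYTES = 3         # status byte + note number + velocity
PAL_REFRESH_HZ = 50       # nominal PAL refresh rate


def midi_message_ms(num_bytes: int = NOTE_ON_BYTES) -> float:
    """Time in milliseconds to send num_bytes over a MIDI serial link."""
    return num_bytes * BITS_PER_MIDI_BYTE / MIDI_BAUD * 1000


def frame_ms(refresh_hz: float = PAL_REFRESH_HZ) -> float:
    """Duration in milliseconds of one complete video frame."""
    return 1000 / refresh_hz


if __name__ == "__main__":
    print(f"3-byte MIDI message: {midi_message_ms():.2f} ms")
    print(f"One frame at 50 Hz:  {frame_ms():.2f} ms")
```

So under those assumptions a note-on takes just under a millisecond to go down the wire, while a single frame lasts 20 ms, which is why I'd expect the video side to be the bigger source of delay.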