Unless you are running some truly dreadful code, the old hardware context switch shouldn't be noticeable at all - it takes half a millisecond or so per switch (hence about a millisecond for the round trip).
It certainly doesn't have any impact on my system (at least as far as I can test), and that's the 040 version of the card. The 060 version is apparently considerably faster.
I have written some test code that thrashes the WarpOS kernel by doing a massive number of switches in a tight loop. Now that did bugger things up a tad, but any code like that is indicative of an extremely poor/vindictive coder ;-)