They are doing some sort of hardware interface anyway, even when they target APIs, but instead of all manufacturers of said product using the same hardware interface, they are using the same APIs, which is suboptimal.
You haven't addressed Protracker 1.0 vs OctaMed V4 vs Deluxe Music interactions.
How would they know every CUDA processor variant? The G84M A2 and A3 steppings have different voltage parameters. Drivers can detect the stepping and adapt accordingly. Different steppings exist because of manufacturing issues. The GeForce 9650M GT and GeForce 9500M GS have different memory timings.
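To make that concrete, here is a rough sketch (Python, purely for illustration) of the kind of per-stepping lookup a driver performs. Everything in it is my own assumption: the names `PowerParams`, `PARAMS_BY_STEPPING` and `select_params` are hypothetical, and the voltage and clock numbers are placeholders, not real G84M data. The point is only that the table lives in the driver and that an unknown stepping is refused rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerParams:
    core_mv: int        # core voltage in millivolts (placeholder value)
    mem_clock_mhz: int  # memory clock in MHz (placeholder value)


# Hypothetical table keyed by (chip, stepping).
# The numbers are illustrative placeholders, NOT real hardware data.
PARAMS_BY_STEPPING = {
    ("G84M", "A2"): PowerParams(core_mv=1150, mem_clock_mhz=700),
    ("G84M", "A3"): PowerParams(core_mv=1100, mem_clock_mhz=700),
}


class UnknownStepping(Exception):
    """Raised when no verified parameters exist for a chip/stepping pair."""


def select_params(chip: str, stepping: str) -> PowerParams:
    """Return the parameters for a detected stepping, or refuse outright."""
    try:
        return PARAMS_BY_STEPPING[(chip, stepping)]
    except KeyError:
        # Never fall back to another stepping's values: wrong settings
        # are exactly what damages or corrupts the hardware.
        raise UnknownStepping(f"no parameters for {chip} {stepping}") from None
```

A userland program that talks to the hardware directly would have to ship and maintain this table itself, for every variant and every later hardware fix.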
Is the userland programmer going to re-implement the power management for my GPU?
Setting the wrong P-states can destroy the GPU, e.g. if you didn't install the latest NVIDIA driver and BIOS patches, you might trigger the NVIDIA G84/G86 "black screen of death".
Setting the wrong memory timings can corrupt the display, e.g. a GeForce 9650M GT running a modded desktop driver corrupts the display, while the same driver is fine on a GeForce 9500M GS.
Subsequent hardware fixes introduce inconsistencies.
Standards usually have long verification times, e.g. x64 development vs CUDA development.