A Tesla GPGPU is more than the mere GPU you're so keen to dismiss it as. Furthermore, it's something that can actually be purchased today.
Whilst I too like the concept of CAPI, apart from a few storage and network FPGA-accelerated cards, there are no massively threaded compute-offload cards for CAPI. So if one were building a supercluster, then perhaps a CAPI-capable interconnect would provide some edge. The cost per activated core on the POWER9 CPU might break the budget, though.
So for something that sits on top of, or beneath, a desk and can perform at or above one trillion floating-point operations per second (1 TFLOPS), the most cost-effective and practical system is one with an x86-64 multi-core/multi-threaded CPU plus multiple Nvidia Tesla GPGPUs.
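The teraflop claim is easy to sanity-check with peak-throughput arithmetic: theoretical peak = cores × clock × FLOPs per core per cycle. A minimal sketch, using illustrative core counts and clock speeds (assumptions, not any specific card's spec sheet):

```python
# Rough peak-throughput arithmetic for the "one trillion FLOPS" target.
# All hardware figures below are hypothetical, chosen only to illustrate the scale.

def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock x FLOPs issued per core per cycle."""
    return cores * clock_hz * flops_per_cycle

# Hypothetical GPGPU: 2560 cores at 1.0 GHz, 2 FLOPs/cycle via fused multiply-add.
gpu = peak_flops(cores=2560, clock_hz=1.0e9, flops_per_cycle=2)

# Hypothetical 8-core x86-64 CPU at 3.0 GHz, 16 FP32 FLOPs/cycle (256-bit FMA units).
cpu = peak_flops(cores=8, clock_hz=3.0e9, flops_per_cycle=16)

print(f"GPU peak: {gpu / 1e12:.2f} TFLOPS")  # 5.12 TFLOPS -- clears 1 TFLOPS alone
print(f"CPU peak: {cpu / 1e12:.2f} TFLOPS")  # 0.38 TFLOPS -- falls well short
```

Even with these rough numbers, a single GPGPU comfortably exceeds the 1 TFLOPS mark, while the multi-core CPU alone does not, which is why the GPU-accelerated desk-side box is the practical route.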