The machine has 4GB of RAM and 896MB of video RAM, which simply can't all be addressed within a 32-bit OS's 4GB address space (unless the OS supports PAE). Plenty of the applications (read: games) I run in Windows are 32-bit, though the drivers and codecs are 64-bit.
Generally, the benefit is that 64-bit optimised code runs faster on the CPU than legacy x86 code does (there are a few rare exceptions, even in some of my own code), since 64-bit code can use 16 64-bit general purpose registers for integer work and can assume at least SSE2 for floating point/vector ops.
Furthermore, 32-bit applications running in a 64-bit environment can allocate more memory than they could in a 32-bit one, since on 32-bit only around 2GB was usable in total (1GB of address space reserved for OS/hardware, another 1GB used to map in the video memory).
As an example of the benefits of this: I'm working on a mapping application that runs on Linux. This app frequently has to work on datasets far larger than 2GB, often over 4GB.
For the legacy 32-bit version, which I'm phasing out, the code has to seek to a location and do one or more reads to get at specific data.
For the 64-bit version, it just uses mmap() to map the entire data file into memory at once and reads from wherever it wants, leaving the OS to optimise the disk IO accordingly (e.g. how many or few bytes are worth reading). On 32-bit Linux, around 2GB is the practical limit for this, regardless of PAE, since PAE only raises the *total* amount of RAM the system can support, not the per-process addressable space.
As it turns out, Linux does a pretty good job of that, and in any case not having to do explicit seek()s and read()s saves a massive amount of overhead by reducing the number of context switches.