Your best bet is to render individual frames to disk, as you thought. There are a few reasons for this.
First and foremost, there's no additional hardware required.

If you have a system failure in the middle of rendering (a power failure, etc.), you can simply pick up and continue rendering from the frame where the failure occurred. If a failure occurs while outputting directly to a video format, you have to start the output process over from the beginning, which is no fun if you're at the 99% mark after 10 hours of rendering.
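To make the "pick up where you left off" idea concrete, here's a rough sketch in Python that works out which frames still need rendering after a crash. The frame_0001.png naming scheme and the missing_frames helper are my own assumptions for illustration; your renderer will have its own naming and its own way of being told which frames to render.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: frame_0001.png, frame_0002.png, ...
FRAME_PATTERN = re.compile(r"frame_(\d+)\.png")


def missing_frames(output_dir, first, last):
    """Return the frame numbers in [first, last] with no usable file on disk."""
    done = set()
    for path in Path(output_dir).iterdir():
        match = FRAME_PATTERN.fullmatch(path.name)
        # Treat zero-byte files as unfinished: a crash can leave one behind.
        if match and path.stat().st_size > 0:
            done.add(int(match.group(1)))
    return [n for n in range(first, last + 1) if n not in done]
```

Feed the returned list back to the renderer and only those frames get redone. A partially written frame that isn't zero bytes would slip through this check, so it's worth re-rendering the last frame or two before the crash anyway.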

By outputting to individual frames, you have multiple re-targeting options. You can use a batch-conversion utility (such as IrfanView) to convert your source images to your desired format. For example, I'll often bring frame animations into Maya for effects work or as a backplate for other animation; for that I'd convert the source images to .iff, TIF, or TGA format.
Another benefit of batch processing individual frames is the ability to apply multiple tweaks, such as color correction, chromakeying/rotoscoping, sharpening/desharpening, and, maybe most importantly, pixel aspect ratio correction/compression.
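As a quick illustration of the pixel aspect ratio arithmetic, here's a small Python sketch that computes the square-pixel size of a frame. The square_pixel_size helper is hypothetical, and the 10/11 ratio for a 720x480 frame is just an assumed example value; use whatever ratio your target format actually specifies.

```python
from fractions import Fraction


def square_pixel_size(width, height, pixel_aspect):
    """Return (width, height) after stretching to square pixels, keeping height."""
    # Fraction avoids floating-point drift when the ratio is something like 10/11.
    par = Fraction(pixel_aspect)
    return (round(width * par), height)


# Assumed example: a 720x480 frame whose pixels are 10/11 as wide as they are tall.
print(square_pixel_size(720, 480, Fraction(10, 11)))  # (655, 480)
```

The result is the size you'd hand to your batch tool's resize step; the actual resampling is still done by the image utility, not by this snippet.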
After outputting a frame archive, any number of utilities can be used to assemble the frames into an animated video, in any of several formats. As mentioned, VirtualDub is perfect for the job.
Just a couple of reasons there, but I hope they show that frame-based output is a safer, easier, and much more robust pipeline than outputting directly to video formats.