IIRC Doom doesn't write to the screen sequentially but in columns, so pixels can't be grouped easily when writing to chip RAM.
Indeed, Doom's graphics engine is an advanced raycaster, and it gets its speed because drawing a wall is merely a matter of vertically scaling a pixel-wide slice of a texture, and you can do a 1-dimensional scale very easily in integer maths.
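To illustrate the 1-dimensional integer scale, here is a minimal sketch of a column drawer using 16.16 fixed point. It is not Doom's actual code; the function name `draw_column` and its parameters are made up for this example, and it assumes an 8-bit chunky destination buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stretch a 1-pixel-wide texture slice of tex_height
 * texels onto dst_height screen pixels, top to bottom.
 * dst points at the top pixel of the column; pitch is the distance in
 * bytes between one screen row and the next. */
void draw_column(uint8_t *dst, int pitch, int dst_height,
                 const uint8_t *tex, int tex_height)
{
    assert(dst_height > 0 && tex_height > 0 && tex_height < 32768);

    /* 16.16 fixed point: how many texels to advance per screen pixel. */
    uint32_t step = ((uint32_t)tex_height << 16) / (uint32_t)dst_height;
    uint32_t pos = 0;

    for (int y = 0; y < dst_height; y++) {
        *dst = tex[pos >> 16];  /* integer part selects the texel */
        dst += pitch;           /* move down one screen row */
        pos += step;            /* one add per pixel, no multiply or divide */
    }
}
```

The inner loop is just a shift, a load, a store and two adds. The `dst += pitch` stride is also why the writes land in columns rather than in contiguous runs.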
In this case it's simply easier to render to a chunky buffer in fast RAM, and then copy/translate it to display memory using an optimised C2P (chunky-to-planar) routine. But even that's difficult, as you are turning 32 chunky bytes into 8 planar 32-bit words. You inevitably need to store some working data in memory while you are processing things, so a large, fast CPU cache comes in very handy.
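For reference, the conversion itself can be sketched naively as below. This is only meant to show what "32 chunky bytes into 8 planar 32-bit words" means; it is far too slow for real use, and optimised C2P routines use merge/swap passes over registers instead. The name `c2p_32` is made up for this example, and it assumes the leftmost pixel goes in the most significant bit of each plane word.

```c
#include <stdint.h>

/* Hypothetical naive chunky-to-planar conversion for one 32-pixel span.
 * chunky[x] holds the 8-bit colour index of pixel x (x = 0 is leftmost).
 * planes[p] receives bit p of all 32 pixels, leftmost pixel in the MSB. */
void c2p_32(const uint8_t chunky[32], uint32_t planes[8])
{
    for (int p = 0; p < 8; p++) {
        uint32_t word = 0;
        for (int x = 0; x < 32; x++) {
            /* Shift earlier pixels left and append bit p of this pixel. */
            word = (word << 1) | ((uint32_t)(chunky[x] >> p) & 1u);
        }
        planes[p] = word;
    }
}
```

Each output word depends on all 32 input bytes, which is why you end up juggling more intermediate values than there are registers.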
Quake probably does the same, so writing a row of 4 contiguous pixels to gfx RAM may require redesigning the rasterizer, but that could speed up writes notably. Perhaps rendering in 8x8, 16x16 or 32x32 groups would speed up rendering.
Quake is not a raycaster; it is a more conventional 3D renderer.