I think it's possible to cascade DCMs, and the XC3S400 and XC3S500E both have 4 DCMs.
These clocks can be derived without a DCM (given the 28.63636 MHz master clock):
28.63636 MHz / 8 = 3.579545 MHz (colour clock)
28.63636 MHz, undivided (Agnus)
28.63636 MHz / 4 = 7.15909 MHz (m68k CPU)
28.63636 MHz / 40 = 0.715909 MHz (E clk)
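As a quick sanity check on the arithmetic, here is a rough Python sketch (not HDL, just a model). The names `MASTER_MHZ`, `DIVISORS` and `divided_clock` are made up for illustration. It assumes plain even integer dividers off the 28.63636 MHz master; note that 7.15909 MHz works out to a divide by 4.

```python
# Master clock figure taken from the list above, in MHz.
MASTER_MHZ = 28.63636

# Integer divisors for each derived clock (illustrative names).
DIVISORS = {
    "colour": 8,   # 3.579545 MHz colour clock
    "agnus": 1,    # master clock, undivided
    "cpu": 4,      # 7.15909 MHz m68k clock
    "e_clk": 40,   # 0.715909 MHz E clock (CPU clock / 10)
}

def derived_mhz(divisor):
    """Frequency in MHz obtained by dividing the master clock by an integer."""
    return MASTER_MHZ / divisor

def divided_clock(n_master_cycles, divisor):
    """Model a counter-based divider for an even divisor.

    Returns the output level (0 or 1) for each master clock cycle:
    high for the first half of each output period, low for the second.
    """
    half = divisor // 2
    return [1 if (i % divisor) < half else 0 for i in range(n_master_cycles)]

if __name__ == "__main__":
    for name, div in DIVISORS.items():
        print(f"{name:7s} /{div:<3d} = {derived_mhz(div):.6f} MHz")
```

In an FPGA the same thing would be a small counter per clock (or clock enables off the master), which is why no DCM is needed for these ratios.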
The key issue is timing, so the colour coding schemes which another person mentioned are quite irrelevant, as the output is direct RGB.
I don't actually think the colour frequency crystal is needed at all. One could simply hook up to the MCU crystal and thus eliminate one crystal plus its associated passives.
I think a pure HDL solution is doable if the slight timing deviation is tolerable.