Author Topic: Duke Nukem 3D new version for 68060 (Read 11916 times)

cgutjahr · « **Reply #14 on:** May 09, 2016, 04:33:14 PM »

Quote from: Cosmos;808121

The Nova version is AGA only, that's why I updated Dante's RTG version

I don't understand - aren't we talking C code here, or did Dante optimize the whole thing by porting parts of it to ASM? Libnix certainly is C.

Quote

Ok, I'll put it on Aminet !

That requires honoring Duke Nukem's license (GPL). I'm not sure how to do that, if you just patch other people's binaries. But at the very least, the license needs to be mentioned and you need to include an offer to provide interested parties with a way to easily duplicate what you did. I doubt that "I replaced all mul64/div64 instances with 68060 friendly opcodes" will suffice...

Edit: Your forum seems to have a minor SPAM problem: three postings from you, followed by eleven pages of spam postings...

Cosmos Amiga · « **Reply #15 on:** May 09, 2016, 04:44:44 PM »

Quote from: cgutjahr;808140

I don't understand - aren't we talking C code here, or did Dante optimize the whole thing by porting parts of it to ASM? Libnix certainly is C.

That requires honoring Duke Nukem's license (GPL). I'm not sure how to do that, if you just patch other people's binaries. But at the very least, the license needs to be mentioned and you need to include an offer to provide interested parties with a way to easily duplicate what you did. I doubt that "I replaced all mul64/div64 instances with 68060 friendly opcodes" will suffice...

Edit: Your forum seems to have a minor SPAM problem: three postings from you, followed by eleven pages of spam postings...

Ok, no upload to Aminet...

There are some asm parts in Dante's source...

I'll look into the SPAM problem...

grond · « **Reply #16 on:** May 09, 2016, 04:54:58 PM »

Quote from: Cosmos;808133

Explanation: the mul64 & div64 instructions are NOT in the 68060, so the 68060.library emulates them, and very slowly...

The 64-bit MUL and DIV are available in hardware in the Apollo core, so this optimisation is probably going to make the code run slower on the Vampire. Same problem as with most optimisations: you can't optimise for all target hardware at the same time...
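
A side note for readers who want to see what such a replacement has to compute: the 68060 no longer implements the 32x32 -> 64-bit multiply (nor the 64/32-bit divide) in hardware, so a replacement must build the 64-bit result from operations the CPU still executes natively. The following is only a minimal C sketch of that idea, using four 16x16 partial products. The names `u64_pair` and `mulu64_sketch` are made up for illustration, and this is not the code from Dante's or Cosmos' sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a 64-bit value as two 32-bit halves. */
typedef struct { uint32_t hi, lo; } u64_pair;

/* Sketch: unsigned 32x32 -> 64 multiply using only products that fit
   in 32 bits (16x16 -> 32), i.e. without a 64-bit multiply. */
static u64_pair mulu64_sketch(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t p0 = a_lo * b_lo;  /* contributes to bits  0..31 */
    uint32_t p1 = a_lo * b_hi;  /* contributes to bits 16..47 */
    uint32_t p2 = a_hi * b_lo;  /* contributes to bits 16..47 */
    uint32_t p3 = a_hi * b_hi;  /* contributes to bits 32..63 */

    /* Sum everything that lands in bits 16..31, keeping the carries. */
    uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);
    u64_pair r;

    assert(mid <= 0x2FFFDu);    /* three 16-bit terms cannot overflow */

    r.lo = (p0 & 0xFFFFu) | (mid << 16);
    r.hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
    return r;
}
```

Whether a routine like this (or an FPU-based variant) actually beats the trap-based emulation in a given program has to be measured, not assumed.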

utri007 · « **Reply #17 on:** May 09, 2016, 06:08:37 PM »

Quote from: Cosmos;808141

Ok, no upload to Aminet...

There are some asm parts in Dante's source...

I'll look into the SPAM problem...

http://aminet.net/package/game/shoot/duke3d_amiga_v0.3

Cosmos, just upload it to Aminet. You can easily "honor the Duke Nukem 3D GPL license" like Dante did. I don't want to register on a French-language forum just to download one lha file. That is in nobody's interest.

cgutjahr · « **Reply #18 on:** May 09, 2016, 08:10:29 PM »

Quote from: utri007;808145

You can easily "honor the Duke Nukem 3D GPL license" like Dante did.

Dante uploaded the sources along with his binary, which is how you properly honor the license.

I'm just not sure how one would do that if you don't modify the sources but simply patch the binary (assuming that's what Cosmos did). If anybody's got an idea, I'm all ears. Like I said, uploading the file with a hint that he replaced this and that wouldn't cut it, IMNSHO.

Cosmos Amiga · « **Reply #19 on:** May 10, 2016, 05:43:42 AM »

Quote from: cgutjahr;808154

Dante uploaded the sources along with his binary, which is how you properly honor the license.

I'm just not sure how one would do that if you don't modify the sources but simply patch the binary (assuming that's what Cosmos did). If anybody's got an idea, I'm all ears. Like I said, uploading the file with a hint that he replaced this and that wouldn't cut it, IMNSHO.

Dante's sources were compiled again with gcc 2.95.3-4 and with updated asm parts...

guest11527 · « **Reply #20 on:** May 10, 2016, 09:54:56 AM »

Quote from: jltursan;808118

Unbelievable! And all thanks to replacing the 64-bit mul/div instructions?

Why simple if it can be done the complicated way... Actually, for those that do not know: There is a small tool on the Aminet ("MuRedox") that can perform such patches in real-time on running binaries. Any manual patching is quite superfluous. Yes, it *also* takes care of the 64-bit multiplication and division instructions.

ExiE_ · « **Reply #21 on:** May 10, 2016, 09:59:22 AM »

Quote from: cgutjahr;808154

Dante uploaded the sources along with his binary, which is how you properly honor the license.

I'm just not sure how one would do that if you don't modify the sources but simply patch the binary (assuming that's what Cosmos did). If anybody's got an idea, I'm all ears. Like I said, uploading the file with a hint that he replaced this and that wouldn't cut it, IMNSHO.

OMG. Anyway, Cosmos could release just a patch (the differences between the two binaries, based on Dante's 0.3 version from Aminet), for spatch for example, and we would have nothing to talk about.

wawrzon · « **Reply #22 on:** May 10, 2016, 10:23:02 AM »

as far as aminet is concerned, it looks like cosmos can simply include the updated sources, along with the asm inlines, in his release, no problem. and cg is not nitpicking here on anyone's work, he simply needs to ensure license conformity of material published on the repository he is in charge of maintaining.

then again, the solution proposed by thor is another viable option that people might not initially think of, rather than simply giving up on slow binaries.

Cosmos Amiga · « **Reply #23 on:** May 10, 2016, 10:54:43 AM »

Quote from: Thomas Richter;808172

Why simple if it can be done the complicated way...

Because mul&div64 routines are used a lot :
1/ with a direct replacement, it will be faster than using MuRedox, each cycle is important for this case
2/ The current integer replacement for mul&div64 is a bit slow, an fpu version seems even faster
3/ Another secret goal for the moment, maybe for the v0.5

Again, it's not a patch/hack, but a complete compilation with gcc 2.95.3-4

guest11527 · « **Reply #24 on:** May 10, 2016, 12:14:17 PM »

Quote from: Cosmos;808177

Because mul&div64 routines are used a lot :
1/ with a direct replacement, it will be faster than using MuRedox, each cycle is important for this case

Have you measured this or are you guessing? The reason why the emulation is so slow is not because the actual replacement implementation is slow, but rather because the fpsp/isp has to go through a complete decoding cycle of the instruction, i.e. it has to figure out where the data is coming from, fetch it from there, and figure out where it is going to, all in supervisor mode with interrupts blocked. This overhead goes away.

In reality, "mulu" is part of a bigger program and typically not as critical as you may believe, unless it is really called in a tight loop, more or less as the only instruction within such a loop. So for example, if you compute a Mandelbrot fractal in integer math, the difference between a 32-bit multiply natively executed by the 68060 and a 64-bit multiply patched over by MuRedox is measurable - more or less because it is the only instruction in the loop.

In real world, this rarely happens and the multiplication is just a minor part of a bigger algorithm.
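
To make the "tight loop" case concrete, here is a rough C sketch of a fixed-point Mandelbrot iteration in which the widening multiply dominates the loop body. Everything in it is an assumption for illustration: the Q8.24 format, the names `fx_mul` and `mandel_iter`, and the restriction to |c| <= 2 so that intermediate values stay in range. It is not the code of any particular fractal program.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 24  /* assumption: Q8.24 fixed point, range [-128, 128) */

/* Fixed-point multiply: the 32x32 -> 64-bit product is the operation
   that a 68020+ compiler would typically map to a 64-bit MULS.L.
   Assumes >> on a negative int64_t is an arithmetic shift (true for gcc). */
static int32_t fx_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> FRAC_BITS);
}

/* Iterates z = z*z + c and returns the iteration at which |z|^2
   exceeds 4, or max_iter if it never does. Requires |c| <= 2. */
static int mandel_iter(int32_t cr, int32_t ci, int max_iter)
{
    int32_t zr = 0, zi = 0;
    int i;

    assert(max_iter > 0);
    for (i = 0; i < max_iter; i++) {
        int32_t zr2 = fx_mul(zr, zr);
        int32_t zi2 = fx_mul(zi, zi);

        if (zr2 + zi2 > ((int32_t)4 << FRAC_BITS))
            break;                        /* escaped */

        zi = 2 * fx_mul(zr, zi) + ci;     /* imaginary part of z*z + c */
        zr = zr2 - zi2 + cr;              /* real part of z*z + c */
    }
    return i;
}
```

With three multiplies and a handful of additions per iteration, the cost of each multiply shows up almost directly in the total run time, which is why this kind of loop is a worst case rather than a typical one.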

cgutjahr · « **Reply #25 on:** May 10, 2016, 01:21:15 PM »

Quote from: Cosmos;808167

Dante's sources were compiled again with gcc 2.95.3-4 and with updated asm parts...

Oh? Then I simply misunderstood what you did. In that case, there's no problem of course. Just upload the modified sources along with your binary (in a separate archive, preferably) and make sure you mention the license.

guest11527 · « **Reply #26 on:** May 10, 2016, 01:45:17 PM »

So here are some hard facts. FYI, this is measured with DMandel, a fractal generator, quite deeply zoomed in. This runs a couple of multiplications and additions (and nothing else) in a very tight loop, probably around 10000 iterations per pixel, for a 1024x768 XGA screen.

68000 integer (fixed point) math requires 1:33 for the full picture.

68020 integer (fixed point) math: This is based on 64 bit integer operations (add, addx, but also 64x64 multiplication) - 7:20.

68882 FPU: This is using FPU instructions in a tight loop, based on floating point. Note that this does not require explicit scaling (as in the fixed point case) and hence the loop is tighter - 1:08. Surprisingly, the FPU is here faster than the CPU, at least on the 68060. For the 030, this is not the case.

FPU, but via the mathieeedoubbas.library. Essentially, it is using the same instructions as in the 68882 case, but has to go through the ieeedoubbas library interface, hence some "register ping-pong". The loop is hence not very streamlined. - 3:35.

68020 math as above, 64 bit multiplications, but patched with MuRedox: 3:55. So a bit less than a factor of two as speedup, or approximately doubling the overhead for going through an instruction decoding phase.

Note again, this is for calling instructions in a tight loop (probably 20 instructions long), nothing you would typically find in most other applications.

I forgot to say: This is a 68060 @ 50 MHz.
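
For anyone who wants to check the "bit less than a factor of two" figure, the timings above can be converted to seconds and compared. This is just a small C sketch of that arithmetic; the helper names `to_seconds` and `speedup` are invented here.

```c
#include <assert.h>

/* Convert a min:sec timing as quoted in the post to seconds. */
static int to_seconds(int minutes, int seconds)
{
    assert(minutes >= 0 && seconds >= 0 && seconds < 60);
    return minutes * 60 + seconds;
}

/* How many times faster the second timing is than the first. */
static double speedup(int slow_seconds, int fast_seconds)
{
    assert(fast_seconds > 0);
    return (double)slow_seconds / (double)fast_seconds;
}
```

For example, `speedup(to_seconds(7, 20), to_seconds(3, 55))` compares the emulated 64-bit multiply (440 s) with the MuRedox-patched run (235 s) and gives roughly 1.87.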

nicholas · « **Reply #27 on:** May 10, 2016, 02:58:43 PM »

Quote from: cgutjahr;808188

Oh? Then I simply misunderstood what you did. In that case, there's no problem of course. Just upload the modified sources along with your binary (in a separate archive, preferably) and make sure you mention the license.

Or an offer to provide the source to anyone who possesses the binary should do to comply with the GPL.

Cosmos Amiga · « **Reply #28 on:** May 10, 2016, 03:09:21 PM »

Quote from: Thomas Richter;808182

Have you measured this or are you guessing?

No precise measure, but it's used A LOT for sure !

Quote from: Thomas Richter;808182

The reason why the emulation is so slow is not because the actual replacement implementation is slow, but rather because the fpsp/isp has to go through a complete decoding cycle of the instruction, i.e. it has to figure out where the data is coming from, fetch it from there, and figure out where it is going to, all in supervisor mode with interrupts blocked

Of course, I know.

Inlining is always much better than your MuRedox in this case !

My inlined routines skip some of your code: no time lost...

Anyway, it was very easy to do, maybe one hour of work...

Linde · « **Reply #29 from previous page:** May 10, 2016, 11:12:08 PM »

Just a few quick notes here since I see a lot of misconceptions about the GPL license.

- You don't honor the license simply by distributing the source. Binary distributions as well as the source distributions have to include the original license, unchanged, and be released under the terms of that license.
- If you distribute binaries on a network server, you have to distribute the source on a network server as well. That is, if you are offering binaries for download, you have to offer the source for download as well, not simply offer to send the source to someone requesting it.
- Source patches or binary patches are not permissible ways to distribute the source. You honor the license by distributing the full source or simple means to retrieve it (e.g. via a version control system).
