Author Topic: Zorro III memory card... now with Ethernet (Read 13736 times)

olsen · « **on:** November 17, 2013, 11:35:22 AM »

Quote from: tnt23;752890

Anybody willing to help in developing device driver for the DM9000, you are more than welcome, too.

Now that sounds tempting - if only I hadn't my hands full already

How do you feel about open sourcing the resulting driver? As things stand, we do not have enough SANA-II Ethernet driver examples which can be reviewed and adapted by anybody interested in the matter. The networking driver source code, as part of the original SANA-II kit, explains how one might create a SLIP driver, but that is so obsolete it's not even funny any more.

olsen · « **Reply #1 on:** November 20, 2013, 08:50:27 AM »

Quote from: tnt23;752975

I would not mind sharing the driver at all, only there's a bit of a chicken and egg problem. To come up with a decent driver I'd take a look at a sample one written by someone else (and preferably not in assembler). It is really great that the 3c589.device sources are available, although the code seems rather complex to me. I think I'm going to hack it quickly to get a ping or DHCP reply before posting.

I had a quick look at the 3c589.device source code, and it looks good to me (I'd probably rewrite the basic device I/O code to be more paranoid, though, and the cleanup procedures in case of errors should be reworked). It even has support for the SANA-IIR2 packet filter feature which some drivers choose to omit.

As far as I can tell from my past experience, 3c589.device is a good SANA-II Ethernet device driver design. It may appear to be complex, but this is in fact how you would implement this kind of driver. It properly separates the basic device I/O, the individual hardware units, and the low level hardware access. This is really how it's supposed to be done.

More code documentation would always be nice, but not everybody wants his code to be a didactic example

olsen · « **Reply #2 on:** November 26, 2013, 08:20:02 AM »

Quote from: tnt23;753448

TX code is working now, too. Packets sent from Amiga are seen on the remote side. (Will have to set up a way to grab screenshots instead of using my mobile phone's camera.)

Luxury. When I was a lad, we used to make screenshots by using the tiniest stubs of chewn and grubby crayons. But we were happy then.

Quote

For that reason I've bought the RoadShow TCP/IP stack last night

I guess it's too late to talk you out of buying Roadshow, but the demo version should be able to handle test setups like these just fine

olsen · « **Reply #3 on:** November 30, 2013, 09:15:00 AM »

Quote

...

This would probably make sense if the AllocVec () call used MEMF_CLEAR in addition to MEMF_PUBLIC, but this is not the case. Even if the allocated memory has been zeroed previously, why not provide NULL as the default value? The very same source uses NULL just a couple of lines further down:

Code: [Select]
opener->filter_hook = (APTR)GetTagData (S2_PacketFilter, NULL, tag_list);
opener->dma_tx_function = (APTR)GetTagData (S2_DMACopyFromBuff32, NULL, tag_list);

Or perhaps there is some reason of (not) doing so?

As far as I can tell (the callbacks are initialized exactly once), this is risky, and there is no benefit in initializing the callbacks in this manner.

I would not call it a bug, since every client of the SANA-II driver is likely to provide the proper callbacks. But if it does not, for some reason, then the device will crash.

The 3c589.device should be more paranoid, and verify that each parameter provided by the client is sound.

I already put a snapshot of the whole 3c589.device/pccard.library source code into my SVN repository, for rework, but there's been too little time to rework it so far :-(

olsen · « **Reply #4 on:** December 01, 2013, 07:20:47 PM »

Quote from: tnt23;753638

Well, this indeed does not seem like a bug, at least no one complained so far. I think I understand what this code should do: since the S2_CopyToBuff is obligatory, the RX hook will be set to S2_CopyToBuff first. If the caller provides S2_CopyToBuff16 then the hook will be assigned that new tag value; otherwise, it will stick to S2_CopyToBuff, and so on.

Yes, that seems to be the intention. However, if S2_CopyToBuff were missing, and S2_CopyToBuff16 were missing, too, then the code will end up using an uninitialized pointer, which should be caught before it happens. The same goes for the S2_CopyFromBuff tags.
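To illustrate what I mean by catching it, here is a minimal sketch in portable C. It is not the 3c589.device code: every demo_ name, the callback shape and the tag values are placeholders made up for this example, and demo_get_tag () only imitates what GetTagData () does. The idea is to start from NULL, let the 16 bit variants override the generic callbacks only if they are offered, and refuse to open if a mandatory callback is still missing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder tag IDs standing in for S2_CopyToBuff etc.; the values are
 * made up for this sketch and are NOT the real SANA-II constants. */
enum {
    DEMO_TAG_DONE = 0,
    DEMO_COPY_TO_BUFF,
    DEMO_COPY_FROM_BUFF,
    DEMO_COPY_TO_BUFF16,
    DEMO_COPY_FROM_BUFF16
};

/* Hypothetical callback shape: copy n bytes, return nonzero on success. */
typedef int (*demo_copy_fn)(void *to, const void *from, size_t n);

struct demo_tag_item {
    unsigned long tag;
    demo_copy_fn  function;
};

struct demo_opener {
    demo_copy_fn rx_function;
    demo_copy_fn tx_function;
};

/* Example client callback, only here so the sketch can be exercised. */
static int demo_byte_copy(void *to, const void *from, size_t n)
{
    memcpy(to, from, n);
    return 1;
}

/* Simplified lookup in the spirit of GetTagData(): yields default_value
 * if the tag does not appear in the list. */
static demo_copy_fn demo_get_tag(unsigned long tag, demo_copy_fn default_value,
                                 const struct demo_tag_item *list)
{
    for (; list != NULL && list->tag != DEMO_TAG_DONE; list++) {
        if (list->tag == tag)
            return list->function;
    }
    return default_value;
}

/* Returns 0 on success, -1 if a mandatory callback is missing.
 * The opener is only modified on success. */
static int demo_setup_callbacks(struct demo_opener *opener,
                                const struct demo_tag_item *list)
{
    demo_copy_fn rx, tx;

    assert(opener != NULL);

    /* Start from NULL, not from whatever happens to be in memory. */
    rx = demo_get_tag(DEMO_COPY_TO_BUFF, NULL, list);
    tx = demo_get_tag(DEMO_COPY_FROM_BUFF, NULL, list);

    /* A 16 bit variant replaces the generic callback only if offered. */
    rx = demo_get_tag(DEMO_COPY_TO_BUFF16, rx, list);
    tx = demo_get_tag(DEMO_COPY_FROM_BUFF16, tx, list);

    if (rx == NULL || tx == NULL)
        return -1;

    opener->rx_function = rx;
    opener->tx_function = tx;
    return 0;
}
```

In a real driver the failure case would presumably translate into a failed OpenDevice () call rather than a crash later on.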

Quote

That way, as it seems to me, the request will be serviced using the fastest hook caller provides.

The purpose of S2_CopyToBuff16 is not to speed up copying. It is the counterpart to the S2_CopyFromBuff16 tag, which is a workaround for a hardware bug. As far as I know this bug only exists in one type of Amiga Ethernet card, which is the original "Ariadne".

There is a bug in how byte-sized Zorro II accesses to the card are handled. These are treated like word-sized accesses, which means that garbage data will go out or come in the high order byte. This isn't much of a problem for reading (if you read a byte from the receive buffer, you'll probably write it back as a byte, too), but if you write bytes to the Ariadne buffer, this will trash half the buffer contents.

The solution is to copy only in word-sized portions to the buffer, or in long-sized portions if possible. For this purpose the ariadne.device allocates a side-buffer, which all writes will go through. First the data will be copied into the side-buffer, then the side-buffer will be copied quickly to the transmit buffer on the card (in long-sized portions). Problem solved, but at the expense of speed.
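The side-buffer approach could be sketched roughly like this, in portable C. This is not the ariadne.device code; the demo_ names and the buffer size are assumptions for the example, and the card's transmit buffer is simply a pointer handed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_PACKET 1520  /* hypothetical side-buffer size, a multiple of 4 */

/* Side-buffer in regular memory; declared as uint32_t so that it is
 * long-aligned and can be read in 32 bit units. */
static uint32_t demo_side_buffer[DEMO_MAX_PACKET / 4];

/* Copies 'length' bytes of packet data to the card's transmit buffer using
 * only 32 bit writes. Returns the number of 32 bit writes performed, or 0
 * if the packet is empty or does not fit into the side-buffer. */
static size_t demo_send_via_side_buffer(volatile uint32_t *card_buffer,
                                        const void *packet, size_t length)
{
    size_t longs = (length + 3) / 4;  /* round up to whole longs */
    size_t i;

    assert(card_buffer != NULL && packet != NULL);

    if (length == 0 || length > sizeof(demo_side_buffer))
        return 0;

    /* Step 1: byte-granular copying is harmless in regular memory.
     * The last long is zeroed first so that the padding is defined. */
    demo_side_buffer[longs - 1] = 0;
    memcpy(demo_side_buffer, packet, length);

    /* Step 2: the card only ever sees long-sized writes. */
    for (i = 0; i < longs; i++)
        card_buffer[i] = demo_side_buffer[i];

    return longs;
}
```

The extra copy in step 1 is exactly the speed penalty mentioned above.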

The S2_CopyFromBuff16 method solves the problem by requiring that the client copies only in word-sized portions (or long-sized portions). As far as I know, no ariadne.device with the S2_CopyToBuff16 method enabled was ever shipped. The ariadne.device supports a different method, which is functionally identical to S2_CopyToBuff16. The tag ID for this method is (S2_Dummy + 1968). I suppose the ariadne.device author (Stefan Sticht, if I remember correctly) may have been born in 1968

Put another way, no driver is really required to support the S2_CopyFromBuff16 method unless the driver really, really needs it.

Quote

Anyway, the bug on my side was so silly it even isn't worth mentioning. Time to dig DHCP (and try Sgrab):

Hm... does the DHCP negotiation succeed, eventually? If not, have you tried tcpdump yet?

olsen · « **Reply #5 on:** December 02, 2013, 07:52:30 AM »

Quote from: tnt23;753674

That's fascinating. One would think the 16/32 buffer management routines had been introduced into SANA-II with performance in mind, not as workarounds for certain bugs.

Since the DM9000 in my design is wired in 16 bits, and I tend to use word accesses wherever possible, using x16 routines would be preferable in my case.

I would not recommend it. The 16/32 bit copy functions require that the data being copied is aligned to a particular address boundary, and that in itself is a restriction. That restriction may be necessary (if your hardware chokes on unaligned accesses, which would be rather unfortunate), but it does not produce speed gains. On the contrary: the 68030 would benefit from word-sized access restrictions, but since the Zorro II space is marked as non-cacheable there would be no advantage after all. And on a Zorro III board that question wouldn't even come up.

Sticking with S2_CopyFromBuff/S2_CopyToBuff has no downsides. Any client (e.g. TCP/IP stack) should use optimized copying code which would automatically use long-sized accesses.

So, in a nutshell: your driver should use S2_CopyFromBuff/S2_CopyToBuff and ignore everything else, unless your hardware has very specific requirements for which the 16/32 bit aligned copying functions would solve a really big problem.

The same goes for the S2_DMACopyFromBuff32/S2_DMACopyToBuff32 functions: unless your hardware supports this functionality perfectly (that is, it actually supports DMA to/from arbitrary 32 bit aligned addresses) don't bother implementing it. The benefits of these functions are very, very small if you don't support DMA. You might be able to skip one copying step inside the TCP/IP stack, but the gains are small. One case (perhaps the only case) in which the gains are not so small is the PPPoE driver which I cooked up, and which is practically useless today

Quote

No, the DHCP gives up after a minute timeout. I suspect there are at least two reasons for that, first that the TX queueing is not done properly, and then there is a good load of KPrintF () calls all over the code - running at 9600 by default. If the serial debug routines are blocking then this would also impact timings. I will change the speed to 115200 and also will fix the TX queueing.

Haven't tried tcpdump yet, but definitely will

tcpdump is worth a shot if you suspect that traffic has gone missing which should have been processed by the TCP/IP stack. Readability of the output tends to be a rather mixed bag, though, so this might be a good idea only if all other options have been exhausted (or if you create binary capture files and view them in "Wireshark").

olsen · « **Reply #6 on:** December 10, 2013, 12:22:42 PM »

Quote from: tnt23;753805

Here's what I've been looking into: (http://wiki.amigaos.net/index.php/Revision_3)

Code: [Select]
These are optional callbacks presented to the device with the same calling interface as for S2_CopyToBuff or S2_CopyFromBuff, respectively. The difference to the original callbacks is the required and guaranteed transfer size and alignment for accessing the device's buffer for a single piece of a data of either 16 or 32 bits, a data word. The copy function called may only use 16/32 bit aligned read/write commands of 16/32 bits at once to transfer the data words, respectively. If the buffer data length is not a multiple of the required data word transfer size, the last data word transfer may contain garbage padding in either transfer direction.

I don't know if this has been clarified yet.

The purpose of 16 or 32 bit variants of the S2_CopyToBuff and S2_CopyFromBuff callbacks is to restrict all copying to operations which transfer data in amounts of a specific granularity. In the 16 bit variant, only 16 or 32 bit transfer operations will be used. In the 32 bit variant, only 32 bit transfer operations will be used. By contrast, the S2_CopyToBuff and S2_CopyFromBuff methods will use 8, 16 or 32 bit transfer operations, as necessary.
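As a rough illustration of what "restricted granularity" means, here is a sketch of the strictest possible 16 bit copy in portable C. It is not Roadshow's code, the demo_copy16 name is made up, and it only uses word-sized transfers (a real 16 bit variant would also be allowed to use 32 bit transfers). Note how an odd length is rounded up, so the last word carries one byte of padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies length_in_bytes bytes using 16 bit transfers only. Both buffers
 * must be word-aligned and large enough to hold a whole number of words.
 * Returns the number of 16 bit transfers performed. */
static size_t demo_copy16(volatile uint16_t *to, const uint16_t *from,
                          size_t length_in_bytes)
{
    size_t words = (length_in_bytes + 1) / 2;  /* odd length: round up */
    size_t i;

    assert(to != NULL && from != NULL);

    for (i = 0; i < words; i++)
        to[i] = from[i];  /* one word-sized write per iteration */

    return words;
}
```

Copying 5 bytes this way performs 3 word transfers, and the sixth byte written is whatever happened to follow the data in the source buffer.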

The S2_CopyFromBuff/S2_CopyFromBuff16/S2_CopyFromBuff32 callbacks transfer data to a contiguous buffer. If your hardware has no such contiguous buffer to transfer data to, you will have to copy the data to a contiguous side-buffer, which is then given to S2_CopyFromBuff/S2_CopyFromBuff16/S2_CopyFromBuff32 to process.

It works exactly the same with the S2_CopyToBuff/S2_CopyToBuff16/S2_CopyToBuff32 callbacks, except that the data is transferred in the opposite direction.

You may be able to avoid using a contiguous side-buffer if the TCP/IP stack supports the S2_DMACopyToBuff32 and S2_DMACopyFromBuff32 callbacks. With these callback functions, you may receive a pointer to a contiguous buffer which is at least as large as you requested. You may then access this buffer and directly copy to/from it. Note that you may get a NULL pointer if no such buffer is available, in which case you would need to fall back to calling S2_CopyToBuff or S2_CopyFromBuff instead, respectively.
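The fallback logic for the transmit direction might look something like the following sketch in portable C. The callback shapes, the demo_ names and the demo_packet structure are assumptions made up for this example and do not match the real SANA-II prototypes; the memcpy () in the direct case merely stands in for the hardware reading the client's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical client-side packet; 'contiguous' says whether the client
 * can hand out a direct pointer to the data. */
struct demo_packet {
    const uint8_t *data;
    size_t         length;
    int            contiguous;
};

/* Hypothetical callback shapes modeled on the description above. */
typedef const void *(*demo_dma_from_fn)(void *cookie);
typedef int (*demo_copy_from_fn)(void *to, void *cookie, size_t n);

struct demo_client {
    demo_dma_from_fn  dma_from;   /* optional, may be NULL */
    demo_copy_from_fn copy_from;  /* mandatory */
};

enum demo_path { DEMO_PATH_FAILED, DEMO_PATH_DIRECT, DEMO_PATH_COPY };

/* Example client callbacks, only here so the sketch can be exercised. */
static const void *demo_client_dma_from(void *cookie)
{
    const struct demo_packet *p = cookie;
    return p->contiguous ? p->data : NULL;
}

static int demo_client_copy_from(void *to, void *cookie, size_t n)
{
    const struct demo_packet *p = cookie;
    if (n > p->length)
        return 0;
    memcpy(to, p->data, n);
    return 1;
}

/* Driver side: try the direct pointer first, fall back to the copy
 * callback if there is no DMA callback or it returns NULL. */
static enum demo_path demo_fetch_tx_data(const struct demo_client *client,
                                         void *cookie, uint8_t *hw_buffer,
                                         size_t length)
{
    assert(client != NULL && hw_buffer != NULL);

    if (client->dma_from != NULL) {
        const void *direct = client->dma_from(cookie);
        if (direct != NULL) {
            /* Stands in for the hardware fetching the data directly. */
            memcpy(hw_buffer, direct, length);
            return DEMO_PATH_DIRECT;
        }
    }

    if (client->copy_from == NULL ||
        !client->copy_from(hw_buffer, cookie, length))
        return DEMO_PATH_FAILED;

    return DEMO_PATH_COPY;
}
```

The receive direction would follow the same pattern with the roles of source and destination swapped.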

olsen · « **Reply #7 on:** December 19, 2013, 08:52:57 AM »

Quote from: tnt23;754231

Frankly speaking, I don't understand why, for the 16-bit case, there would be any 32-bit transfer at all. Say, if we need to transfer two 16-bit words with respect to both size AND alignment, then it should look like two "move.w (src)+, (dst)+" instructions should it not? The addressing will be done in words, and that's nice. In my perception this is not equal to one "move.l (src)+, (dst)+" instruction as the latter breaks both the size (transferring 32 bits at once) and alignment constraints (crossing the 16 bit boundary).

There are two reasons.

The first is historic: up until very recently (and with the exception of the DKB WildFire, which I believe was capable of 32 bit wide memory access) all Amiga Ethernet hardware was either accessible only through the Zorro II bus, or did not permit 32 bit wide memory access. On the Zorro II bus, a 32 bit wide access will be broken up into two consecutive 16 bit accesses. How this worked out with hardware which could not support 32 bit wide accesses was up to the glue logic on the board.

The second is performance: the ratio of instructions executed vs. the amount of data copied is terrible for "move.w (a0)+,(a1)+", less terrible for "move.l (a0)+,(a1)+" and becomes better if you can leverage "movem.l (a0)+,d1-d7/a2-a6 ; movem.l d1-d7/a2-a6,(a1)+" style copying (better still if you can unroll the copying loop in which movem.l is used).

I stopped counting execution cycles more than 15 years ago, but I believe that performance of even an unrolled "move.w (a0)+,(a1)+" loop will be quite poor.
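To put rough numbers on that ratio, here is a small back-of-the-envelope helper in C. The function name is made up, and it only counts the data-moving instructions; it ignores loop overhead, bus speed and actual cycle timings, so it says nothing about real-world throughput.

```c
#include <assert.h>
#include <stddef.h>

/* Number of data-moving instructions needed to copy 'length' bytes when
 * each step moves bytes_per_step bytes using instructions_per_step
 * instructions. The last, partial step is counted as a full one. */
static size_t demo_move_instructions(size_t length, size_t bytes_per_step,
                                     size_t instructions_per_step)
{
    assert(bytes_per_step > 0);
    return ((length + bytes_per_step - 1) / bytes_per_step)
           * instructions_per_step;
}
```

For a 1500 byte payload, demo_move_instructions (1500, 2, 1) gives 750 for move.w and demo_move_instructions (1500, 4, 1) gives 375 for move.l. The movem.l pair moves 12 registers, i.e. 48 bytes, with two instructions, so demo_move_instructions (1500, 48, 2) gives 64.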

Roadshow contains a restricted version of the original, optimized copying function, with the restriction being that only 16 and 32 bit copying operations are used. The goal was to provide for better performance than the S2_CopyFromBuff/S2_CopyToBuff callbacks could, and this was done specifically for the "Ariadne".

There is a slow "move.w (a0)+,(a1)+" variant available in Roadshow already. It is enabled by default, but all the example interface configuration files disable it. To switch back to the slow variant, either remove the "copymode=fast" parameter from the respective interface file, or replace it with "copymode=slow".

Quote

That's exactly what I am trying to figure out. It is possible to implement the said contiguous buffer on my card, with the restriction that it should only be accessed in 16-bits using even addresses only. If the S2_CopyFromBuff16/S2_CopyToBuff16 hooks would follow that "move.w (src)+, (dst)+" restriction, everything should work smoothly - and that would eliminate the need for any side buffering, saving memory and improving performance.

Now, if the S2_CopyFromBuff16/S2_CopyToBuff16 hooks at some point won't follow the granularity convention and decide to switch to transferring 32 bits at once, that would break the whole idea, I think.

Could be, but then your code needs to be able to handle the regular S2_CopyFromBuff/S2_CopyToBuff callbacks, which are likely going to be much worse in terms of performance. You will always have to be able to provide for a side-buffer, in case S2_CopyFromBuff/S2_CopyToBuff callbacks are invoked and the client offers no alternative callbacks.

Quote

I understand the DMA callbacks idea better now. In fact, I am trying to perform exactly like that, checking if the DMA hook is available, then asking for the pointer etc. It even seems to work, although it is slow as hell. Lot to check on my side.

So, back to our 16-bit stuff. Do you think it would be feasible to implement that 'strict' behaviour S2_CopyFromBuff16/S2_CopyToBuff16 in Roadshow?

See above: it's already supported

Quote

UPDATE. I'm afraid I have been terribly wrong: the hardware buffer on my side could only be arranged for long-aligned 16-bit access

If you can make it appear on a 32 bit aligned start address, then testing it with Roadshow's built-in slow 16 bit copy callback might just work out.
