Author Topic: [Solved] A500 green screen, bad CIA, DiagROM address errors  (Read 1983 times)

screwtop (topic starter)

[Solved] A500 green screen, bad CIA, DiagROM address errors
« on: December 09, 2024, 03:36:29 AM »
I recently picked up a non-working Amiga 500 as a fixer-upper: Rev. 6A board, PAL, 8372A Agnus, Panasonic DRAM chips (four of them). It displays a green screen when trying to boot Kickstart 1.3, suggesting a memory issue.

The board looks to be in reasonable condition, though it had black spots (mould? coffee spill?) when I got it, and there was some tarnishing of some of the chipset pins and sockets (Agnus even had spots of green). I've cleaned those up with a fibreglass pencil and Deoxit. There were also a few spots where the solder mask had bubbled, which I've scraped and coated with some acrylic nail polish. DiagROM is able to run, and mouse, keyboard, serial port, sound and video all seem OK.

Memory tests are a different story. It'll get about 56 blocks in (so around 56 kiB?) and the machine will lock up, reporting address (and sometimes illegal instruction) errors via the serial port, sometimes accompanied by interesting stripes and glitches on the display. I've probed the address lines on the DRAM circuits with my scope, and the signals all look reasonable and consistent to my untrained eye. Scoping the input/output pairs on U34 and U35 shows the output matching the input, so I think those are OK too.

DiagROM did indicate a problem with the CIA-B timers. The problem moved when I swapped the CIA chips around, and went away when I replaced the suspect one with one from another Amiga. Here's what DiagROM reported initially:

Testing Timer A, on CIA-A (ODD) :  2000128ms                          OK
Testing Timer B, on CIA-A (ODD) :  2000128ms                          OK
Testing CIA-A TOD (Tick/VSync)  :  98 Ticks                           OK
Testing Timer A, on CIA-B (EVEN):  14080ms - CIA Timing too slow!     FAILED
Testing Timer B, on CIA-B (EVEN):  2000128ms - CIA Timing too fast!   FAILED
Testing CIA-B TOD (HSync)       :  17 Ticks - Too slow ticksignal     FAILED   


I've also tried swapping Gary, Agnus, and the CPU using parts from a working A500 (a Rev. 5). The Gary swap didn't seem to change anything. With the other Agnus (a PAL 8371), it was quite flaky, hanging a couple of times during DiagROM's initial multi-coloured screens, and when it did boot the display had a slight flicker with colours being off on some columns of pixels (reddish vs yellow text).

With the other 68000 installed, the machine went into an address error loop as soon as DiagROM prompts for RMB for serial access. It seems particularly strange to me that swapping the CPU would make any difference, and makes me suspect something wrong electrically with the board or sockets; I wonder how best to pin those down. I haven't been able to find much information on likely causes of the address errors, other than that the CPU will throw them if it tries to access a word at an odd (rather than even) memory address.

If it's bad DRAM, how would I go about investigating that further? Or would it be reasonable just to install some sockets and some RAM chips and trial-and-error it?

Thanks!
« Last Edit: December 30, 2024, 02:43:40 AM by screwtop »
 

screwtop (topic starter)

Re: A500 green screen, bad CIA, DiagROM address errors
« Reply #1 on: December 09, 2024, 11:26:04 PM »
To clarify, it doesn't "lock up" as such when DiagROM reports address or invalid instruction errors - you can generally click and continue, although the errors often recur, and sometimes it does get into a bad state where the serial output is corrupted and it has to be rebooted. I'm also getting a suspicion that "badness" can persist after a warm boot.

I managed to get some information about which address bits are implicated in the address errors, by using DiagROM's manual memory range test. After incrementally increasing the upper limit for the test range, I see specific address bits being flagged as potentially erroneous, starting around $E207.

ADDR Errors: --------  --------  ---EEE--  --------

After some further testing, a couple more problem bits have been raised:

ADDR Errors: --------  --------  ---EEE-E  E-------

I believe that would be address bits 7, 8, 10, 11, 12 (starting from 0). Presumably this is with respect to the CPU's address bus, not the DRA bus between Agnus and Chip RAM (since the latter only uses 9 bits with a column/row scheme)? If they were data errors I think I'd be able to deduce which chips were involved, but I don't know what to do with these address errors. Also, what's with the pin numbering on the 68000 ranging from A1..23 rather than A0..23?
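For anyone wanting to double-check that reading of the mask, here is a minimal Python sketch. It assumes the line is 32 flag characters in four groups, with the leftmost character being bit 31 and the rightmost bit 0 (my interpretation, not something I've seen documented). The function name `flagged_bits` is made up for illustration.

```python
def flagged_bits(mask_line: str) -> list[int]:
    """Return the bit numbers (0 = least significant) marked 'E' in the mask."""
    # Remove the spaces separating the four 8-character groups.
    flags = "".join(mask_line.split())
    if len(flags) != 32:
        raise ValueError(f"expected 32 flag characters, got {len(flags)}")
    # Assumption: leftmost character is bit 31, rightmost is bit 0.
    return sorted(31 - i for i, ch in enumerate(flags) if ch == "E")


print(flagged_bits("--------  --------  ---EEE--  --------"))  # [10, 11, 12]
print(flagged_bits("--------  --------  ---EEE-E  E-------"))  # [7, 8, 10, 11, 12]
```

Under that assumption the two masks above decode to bits 10-12 and bits 7, 8, 10-12 respectively.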

I also realise I'm a bit confused about the function of holding certain mouse buttons down when DiagROM starts up, and that might be a confounding variable with my testing. Is this documented officially somewhere?

If it helps, here's a sample DiagROM AddressError screen:

Quote
                             AddressError Detected

Debugdata (Dump of CPU Registers D0-D7/A0-A7):
$000000FF $00000000 $00000000 $00000000 $00000000 $00000000 $0000FFFF $00000305
$00F940D1 $00CBFFFF $00C80000 $00F89B74 $00000000 $00F8245A $0006E47E $0006E1FA
                                                                               
SR: $B2B5 ADR: $00C90039 Content: FFFFFEFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
                                                                               
IRQ Level 1 Points to: $00F8D4C6 Content: 33FC0444 00DFF180 4E7348E7 FFFE0680
IRQ Level 2 Points to: $00F8D4C6 Content: 33FC0444 00DFF180 4E7348E7 FFFE0680
IRQ Level 3 Points to: $00F8D4C6 Content: 33FC0444 00DFF180 4E7348E7 FFFE0680
IRQ Level 4 Points to: $00F8D4C6 Content: 33FC0444 00DFF180 4E7348E7 FFFE0680
IRQ Level 5 Points to: $00F8D4C6 Content: 33FC0444 00DFF180 4E7348E7 FFFE0680
IRQ Level 6 Points to: $00F8D4C6 Content: 33FC0444 00DFF180 4E7348E7 FFFE0680
IRQ Level 7 Points to: $00F8D4C6 Content: 33FC0444 00DFF180 4E7348E7 FFFE0680
                                                                             
Is $1114 readable at addr $0 (ROM still at $0): NO
Is $1114 readable at addr $f80000 (Real ROM addr): YES
                                                     
CPU: 68000  FPU: NONE  MMU: NOT CHECKED
Poweronflags: 00000000000000000000000000000000
                                             
                        Press any key/mouse to continue

Thanks!
« Last Edit: December 11, 2024, 10:32:20 AM by screwtop »
 

screwtop (topic starter)

Re: A500 green screen, bad CIA, DiagROM address errors
« Reply #2 on: December 11, 2024, 11:59:36 AM »
Some more thoughts and findings...

If I limit DiagROM's memory test range to $0-$E283, I see no errors even after dozens of passes. If I extend the range to $0-$E28F, there are address errors within a few passes. The range $0-$E28B seems marginal, sometimes passing quite a few times before showing errors. If I go much beyond that (say $0-$E3FF), DiagROM fairly promptly crashes (outputting unexpected text or binary on the serial port).

When address errors are detected, address bits 10-12 are always indicated (though DiagROM acknowledges that these are only estimates). I also often see bits 5, 7 and 8, and occasionally 2, 3, 4 and 6. I don't believe I've seen any other bits involved.

Once errors happen, things seem to stay bad: if I go back and test a previously "good" memory range, it will now show errors. Something seems to get stuck somehow, and a cold boot restores things.

I wondered about the mapping between the CPU address lines and the DRAM addressing controlled by Agnus, thinking that that might help identify underlying hardware problems. The Agnus specification document is quite helpful here:

Quote
The device generates RAM address from two sources, the processor or from the device performing DMA cycles, selected by a multiplexer. This multiplexer allows the processor to access RAM when AS* and RAMEN* are both low. At this time, the device also multiplexes the processor address (A1-A18) onto the MA bus. The device places A1 to A8 & A17 on the MA0 to MA9 outputs, respectively, during the row address time and places A9 to A16 & A18 on the MA0 to MA9, respectively, during the column address time. The A19 line is used by the IC to determine which RAS line is to be asserted. If A19 is low, RAS0* is enabled, and if high, RAS1* is enabled. The device also senses the LDS* and UDS* inputs to determine which CAS to drop. If LDS* is low, the IC will drop CALS* and if UDS* is low, CASU* is dropped.
https://retro-commodore.eu/files/downloads/amigamanuals-xiik.net/Hardware/Specifications%20Agnus%20-%20Manual-ENG%20.pdf

Note that MA is named DRA on other schematics, and presumably the references to MA9 in that excerpt are mistakes (there are 9 lines in total, but numbered MA0..MA8). The version of Agnus described uses DRA0..DRA8 for 9 bits of row/column addressing, for (2^9)^2 = 262,144 total words (512 kiB).

So, during row address times, the mapping is:
   A1   DRA0
   A2   DRA1
   A3   DRA2
   A4   DRA3
   A5   DRA4
   A6   DRA5
   A7   DRA6
   A8   DRA7
   A17   DRA8

...and during column address times:
   A9   DRA0
   A10   DRA1
   A11   DRA2
   A12   DRA3
   A13   DRA4
   A14   DRA5
   A15   DRA6
   A16   DRA7
   A18   DRA8

Combining this with the pattern of bad address bits doesn't reveal any magical pattern to me, but it does at least show that the most commonly (apparently) flaky address bits are associated with DRA1-3 during column accesses.
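The two tables can be written down as a small Python lookup, which makes it easy to ask "which DRA line carries CPU address bit N, and in which phase?". This is only a sketch of the mapping as described in the quoted excerpt (with MA9 read as MA8); the names `ROW`, `COLUMN`, `dra_for` and `split_address` are my own.

```python
# CPU address bit -> DRA line, following the quoted Agnus specification excerpt.
ROW = {**{a: a - 1 for a in range(1, 9)}, 17: 8}      # A1..A8  -> DRA0..7, A17 -> DRA8
COLUMN = {**{a: a - 9 for a in range(9, 17)}, 18: 8}  # A9..A16 -> DRA0..7, A18 -> DRA8


def dra_for(bit):
    """Return (phase, DRA line) for a CPU address bit, or None if it is not multiplexed."""
    if bit in ROW:
        return ("row", ROW[bit])
    if bit in COLUMN:
        return ("column", COLUMN[bit])
    return None  # e.g. A0 (no such pin on the 68000) or A19 (selects RAS0/RAS1)


def split_address(addr):
    """Return the 9-bit (row, column) values that would appear on DRA0..8 for addr."""
    row = sum(((addr >> a) & 1) << d for a, d in ROW.items())
    col = sum(((addr >> a) & 1) << d for a, d in COLUMN.items())
    return row, col


for bit in (10, 11, 12):
    print(f"A{bit} ->", dra_for(bit))
print(split_address(0xE207))
```

For bits 10-12 this gives column DRA1, DRA2 and DRA3, matching the observation above.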

I also found it interesting that A8-A13 are associated with the CIA chips, and one of these tested bad. A8-11 connect directly to both CIAs, and A12 and A13 are involved in the chip select logic.

Helpful? I really don't know.  :)
 

screwtop (topic starter)

[Solved] Re: A500 green screen, bad CIA, DiagROM address errors
« Reply #3 on: December 29, 2024, 11:35:04 PM »
It's fixed! It was a bad Panasonic MN41C4256-08 DRAM chip, and I was able to transplant some compatible chips from an A501 (Motorola MCM514256AP80 parts).

What was really helpful was realising that with a trapdoor RAM expansion installed, you can run DiagROM from Fast RAM. Hold the left mouse button at power-on and release it before DiagROM checks for stuck buttons (and note that you have to use the serial port in this mode - no video output). In this mode I didn't see any of the AddressError/IllegalInstruction exceptions, and, oddly, the Chip RAM actually tested OK (though it was definitely bad as I confirmed later). It was also good to confirm that the expansion RAM was likely OK, as I planned to use that to replace the onboard RAM.

DiagROM's memory editor also gave valuable insights into what was going wrong: since DiagROM writes 32-bit address values corresponding to the memory locations when it starts up, you can spot discrepancies due to addressing errors. I saw that addresses of the form $nnnnE100-$nnnnFFFF had bad values in bits 8-11 (numbering from 0 at the right). They were often zeroes but also other values that implicated all bits in that nybble. That is, there was a misbehaving region of memory in the final 7936 bytes of every 64 kiB block.
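The comparison I was doing by eye in the memory editor amounts to XORing each location's address with the value read back from it. A rough Python sketch of that check follows. The sample address/value pair is invented for illustration (not copied from my machine), and the function names are hypothetical.

```python
def bad_bits(address, value):
    """Bit numbers (0 = rightmost) where the value read back differs from the address."""
    diff = (address ^ value) & 0xFFFFFFFF
    return [b for b in range(32) if (diff >> b) & 1]


def in_suspect_region(address):
    """True if the address falls in the $nnnnE100-$nnnnFFFF part of its 64 kiB block."""
    return (address & 0xFFFF) >= 0xE100


# Made-up example: the location should hold its own address,
# but bits 8-11 have been read back as zeroes.
addr, read_back = 0x0001EF00, 0x0001E000
print(bad_bits(addr, read_back))   # [8, 9, 10, 11]
print(in_suspect_region(addr))     # True
print(0x10000 - 0xE100)            # 7936 bytes per 64 kiB block
```

The last line is just the arithmetic behind the 7936-byte figure: $10000 - $E100 = $1F00.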

Desoldering all those pins was stressful and time-consuming, even with a desoldering tool and a hot air station. I wished I had a set of DIP soldering tweezers like I saw recently on the Mend It Mark channel on YouTube! But once I'd put in some nice machined-pin sockets, it was easy to swap chips and identify the culprit, and then confirm that the donor chips were all good and the system would boot Kickstart.

I posted some more details over at Lemon Amiga.
« Last Edit: December 29, 2024, 11:43:22 PM by screwtop »