My question is, how is the receiving end supposed to determine where the file ends and garbage starts?
I always used Term rather than NComm, and I still use it often for embedded software development from the Amiga. But whatever floats your boat.

I'm not at all familiar with the Xmodem protocol, but here are a few ideas.
Most protocols that use fixed- or variable-size packets but carry variable-length data have some kind of length field in the packet header. That may well be the case in your situation: the data you're sending can contain any byte value from 0x00 to 0xFF, so you can't rely on an end-of-file marker character.
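A toy comparison makes the point. The 16-byte packet size and the 0x1A pad byte below are arbitrary choices for illustration, not taken from Xmodem or any other real protocol. Stripping trailing pad bytes loses data that happens to end in the pad value, while a length field recovers it exactly:

```python
# Hypothetical framing demo: PACKET_SIZE and PAD_BYTE are illustrative choices.
PACKET_SIZE = 16
PAD_BYTE = 0x1A


def pad_only(data: bytes) -> bytes:
    """Pad data up to PACKET_SIZE with PAD_BYTE, with no length information."""
    return data + bytes([PAD_BYTE]) * (PACKET_SIZE - len(data))


def strip_padding(packet: bytes) -> bytes:
    """Guess where the data ends by stripping trailing pad bytes."""
    return packet.rstrip(bytes([PAD_BYTE]))


def pad_with_length(data: bytes) -> bytes:
    """Prefix a 2-byte big-endian length, then pad up to PACKET_SIZE."""
    if len(data) > PACKET_SIZE - 2:
        raise ValueError("data does not fit in one packet")
    body = len(data).to_bytes(2, "big") + data
    return body + bytes([PAD_BYTE]) * (PACKET_SIZE - len(body))


def read_with_length(packet: bytes) -> bytes:
    """Use the length field to take exactly the data bytes, ignoring padding."""
    length = int.from_bytes(packet[:2], "big")
    return packet[2:2 + length]


data = bytes([0x01, 0xFF, 0x1A])  # binary data that happens to end in the pad byte
print(strip_padding(pad_only(data)))            # b'\x01\xff'  (last data byte lost)
print(read_with_length(pad_with_length(data)))  # b'\x01\xff\x1a'
```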
As an example, I just wrote an application to communicate with Agfa serial digital cameras. Here's a simplified look at its protocol:
= 1 byte
= 2 bytes
= 1 to 2048 bytes
= 2 bytes
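The field names in that listing didn't survive, so the parser below is only a sketch under loudly stated assumptions: the 1-byte field is treated as a packet type, the first 2-byte field as a big-endian data length, and the trailing 2 bytes as a checksum. Those meanings and the byte order are guesses for illustration, not the Agfa specification. It shows how a length field lets the receiver pull packets out of a byte stream without any end marker:

```python
from dataclasses import dataclass
from typing import Tuple

MAX_DATA = 2048  # from the listing: 1 to 2048 data bytes


@dataclass
class Packet:
    kind: int      # ASSUMED meaning of the 1-byte field (a packet type)
    data: bytes    # the 1 to 2048 data bytes
    checksum: int  # ASSUMED meaning of the trailing 2-byte field; not verified here


def parse_packet(buf: bytes) -> Tuple[Packet, bytes]:
    """Parse one packet from the front of buf and return (packet, leftover bytes).

    Assumed layout: 1-byte field, 2-byte big-endian length, data, 2-byte checksum.
    """
    if len(buf) < 3:
        raise ValueError("incomplete header")
    length = int.from_bytes(buf[1:3], "big")
    if not 1 <= length <= MAX_DATA:
        raise ValueError(f"implausible data length {length}")
    end = 3 + length + 2
    if len(buf) < end:
        raise ValueError("incomplete packet")
    packet = Packet(
        kind=buf[0],
        data=buf[3:3 + length],
        checksum=int.from_bytes(buf[3 + length:end], "big"),
    )
    return packet, buf[end:]


# Two made-up packets back to back in one receive buffer.
stream = (bytes([0x01, 0x00, 0x03]) + b"ABC" + bytes([0x00, 0xC6])
          + bytes([0x02, 0x00, 0x01, 0xFF, 0x00, 0xFF]))
while stream:
    pkt, stream = parse_packet(stream)
    print(pkt)
```

The length check doubles as a sanity check: if a sniffed byte pair decodes to something outside 1 to 2048, it probably isn't the length field.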
So if you look at the first few bytes of each Xmodem packet, you may find something similar. Look for 2 bytes whose value stays the same in every packet except the very last one, which will probably hold fewer than 1024 bytes.
For example, the first two bytes might give the length of the data that follows, so with 1024 (0x400) file bytes in the packet you might see the first two bytes as 0x00, 0x04, or as 0x04, 0x00, depending on the byte order (endianness).
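If you have a capture already split into individual packets, a small script can do this hunting for you. This is a hypothetical helper, not part of any tool: it reports offsets where a byte pair is identical in every packet but the last and different in the last one, then decodes a pair both ways. The sample capture is made up, with a little-endian length field at offset 1:

```python
from typing import List, Tuple


def candidate_length_offsets(packets: List[bytes]) -> List[int]:
    """Return offsets of byte pairs that are identical in every packet but the
    last, and different in the last one (as a length field would be)."""
    if len(packets) < 3:
        raise ValueError("need at least three captured packets")
    full, last = packets[:-1], packets[-1]
    shortest = min(len(p) for p in packets)
    offsets = []
    for i in range(shortest - 1):
        pair = full[0][i:i + 2]
        if all(p[i:i + 2] == pair for p in full) and last[i:i + 2] != pair:
            offsets.append(i)
    return offsets


def decode_both_ways(pair: bytes) -> Tuple[int, int]:
    """Interpret two bytes as (big-endian, little-endian) unsigned 16-bit values."""
    return int.from_bytes(pair, "big"), int.from_bytes(pair, "little")


# Made-up capture: 1 start byte, 2-byte little-endian length, then data.
capture = [
    bytes([0x01, 0x04, 0x00, 10, 20, 30, 40]),
    bytes([0x01, 0x04, 0x00, 11, 21, 31, 41]),
    bytes([0x01, 0x02, 0x00, 12, 22]),
]
print(candidate_length_offsets(capture))      # [0, 1]
print(decode_both_ways(bytes([0x00, 0x04])))  # (4, 1024)
```

Adjacent offsets (0 and 1 here) usually point at the same field; the pair that decodes to the packet's actual data length in one of the two byte orders is the likely candidate.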
The last two bytes might be a 2-byte checksum: perhaps the 16-bit (INT16) sum of the 1024 data bytes in the packet?
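To test that guess against a real packet, you can add up the data bytes yourself, keep the low 16 bits, and compare against the trailing two bytes in both byte orders. A minimal sketch, assuming a plain wrap-around sum (the function names are made up):

```python
from typing import Optional


def sum16(data: bytes) -> int:
    """Add up every byte and keep the low 16 bits (wraps around at 65536)."""
    return sum(data) & 0xFFFF


def checksum_byte_order(data: bytes, trailer: bytes) -> Optional[str]:
    """Return "big" or "little" if the 2-byte trailer equals sum16(data) read
    in that byte order, or None if the 16-bit-sum guess does not fit."""
    expected = sum16(data)
    if int.from_bytes(trailer, "big") == expected:
        return "big"
    if int.from_bytes(trailer, "little") == expected:
        return "little"
    return None


data = bytes([0xFF]) * 1024                            # 1024 bytes of 0xFF
print(hex(sum16(data)))                                # 0xfc00 (261120 mod 65536)
print(checksum_byte_order(data, bytes([0x00, 0xFC])))  # little
```

If this returns None for every packet, the trailer is probably computed some other way, and the simple sum can be ruled out.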
To look at the serial data in hex form (not ASCII), it's incredibly useful to use Term 4.8 and put it into hex mode.
Settings menu > Terminal > Emulation tab > Select Hex
No idea if NComm has this feature or not.
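If your terminal has no hex mode, you could instead save the raw capture to a file and dump it afterwards. A minimal Python sketch (the script name and capture file name are hypothetical) that prints offset, hex and printable-ASCII columns, much like a terminal hex view:

```python
import sys


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex and printable-ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python hexdump.py capture.bin  (capture.bin is a hypothetical log file)
    with open(sys.argv[1], "rb") as f:
        print(hex_dump(f.read()))
```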
To 'sniff' the RS232 traffic during a working transfer, connect only two lines of the Amiga serial port: GND to the sender's ground, and RXD to the TXD line of whatever is sending the Xmodem data.
On the 25 pin port, pin 3 = RXD, 7 = GND
On 9-way ports, pin 2 = RXD, 5 = GND
Should be easy enough to work out.