Author Topic: Handling wchar.h, utf-8 and so on (Read 8504 times)

nyteschayde · « **on:** September 11, 2019, 07:01:37 AM »

So I have a few programming environments set up, including gcc 2.95.3, sas/c 6.58, Amiga E and more. I'm trying to figure out a way to port programs that rely on wide character support or UTF character support. What is usually done when a program to be ported relies on these things? Any thoughts, experience? Point a girl in the right direction?

Rotzloeffel · « **Reply #1 on:** September 11, 2019, 12:33:06 PM »

I am not a programmer, but normally codesets.library should be your friend:

http://aminet.net/package/util/libs/codesets-6.21

Joloo · « **Reply #2 on:** September 14, 2019, 08:30:54 AM »

Quote

So I have a few programming environments set up, including gcc 2.95.3, sas/c 6.58, Amiga E and more. I'm trying to figure out a way to port programs that rely on wide character support or UTF character support.

Normally, wide char support is only available with newer gcc versions (gcc 3.x and later), and then only in conjunction with clib2; I don't know whether clib2 supports gcc 2.95.x.
By UTF you mean...? UTF-7, UTF-8, UTF-16 and/or UTF-32 (for LE/BE machines), or all of them?

Quote

What is usually done when program to be ported relies on these things? Any thoughts, experience?

Depends on...
If the encoding scheme is only used for disk operations, you can drop the entire Unicode support and use the Standard C Library function equivalents for this instead.

If the internal representation of characters relies on a multi-byte encoding scheme, you have to support it natively, be it UTF-16 or UTF-8.
Using codesets.library, as Rotzloeffel suggested, doesn't help much here. It is just a converter between the different formats, mainly used to transcode a multi-byte character set to a single-byte ISO encoding. That is because the internal representation of character codes on AmigaOS is bound to single-byte character sets, and the only functions we have available are those offered by e.g. the Standard C Library, which supports only single-byte encodings (strictly speaking, not even these but purely ASCII). What can help you out for UTF-16, in contrast, is clib2. It offers functions that deal with this multi-byte encoding scheme.

I have to confess that when I port software from Windows/Linux to AmigaOS, it is already my own software, where I am using UTF-8 internally. But I am biased, because I've written a lot of UTF-8 stuff and I don't like UTF-16, because it is damn slow unless one limits oneself to Unicode Standard 3. Today we're using Unicode Standard 12.1, so each 16-bit code unit must be checked to see whether it is a surrogate code point, and that is a time-consuming process.

If I have to deal with UTF-16 strings, I first transcode them to UTF-8 and then apply the necessary operations (replacing text and so on). But as I already stated, I am biased.
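
To illustrate the surrogate check and the UTF-16 to UTF-8 step described above, here is a minimal sketch. The function name `utf16_to_utf8` is made up for this example; it is not part of clib2 or codesets.library. It assumes the input is in native byte order and that `unsigned short` holds one 16-bit code unit. It is written in C89 style with older compilers in mind, but it has not been tested on gcc 2.95.3 or SAS/C.

```c
#include <stddef.h>

/* Hypothetical helper (not from clib2 or codesets.library).
 * Converts 'len' UTF-16 code units in native byte order to UTF-8.
 * Writes at most 'cap' bytes to 'dst', including the terminating NUL.
 * Returns the number of bytes written (without the NUL), or (size_t)-1
 * on an unpaired surrogate or if 'dst' is too small. */
size_t utf16_to_utf8(const unsigned short *src, size_t len,
                     unsigned char *dst, size_t cap)
{
    size_t i = 0, o = 0;

    while (i < len) {
        unsigned long cp = src[i++];
        size_t need;

        if (cp >= 0xD800UL && cp <= 0xDBFFUL) {
            /* High surrogate: a low surrogate must follow. */
            unsigned long lo;
            if (i >= len) return (size_t)-1;
            lo = src[i++];
            if (lo < 0xDC00UL || lo > 0xDFFFUL) return (size_t)-1;
            cp = 0x10000UL + ((cp - 0xD800UL) << 10) + (lo - 0xDC00UL);
        } else if (cp >= 0xDC00UL && cp <= 0xDFFFUL) {
            return (size_t)-1; /* low surrogate without a high one */
        }

        /* Number of UTF-8 bytes this code point needs. */
        need = cp < 0x80UL ? 1 : cp < 0x800UL ? 2 : cp < 0x10000UL ? 3 : 4;
        if (o + need >= cap) return (size_t)-1; /* keep room for the NUL */

        if (need == 1) {
            dst[o++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[o++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[o++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }

    if (o >= cap) return (size_t)-1; /* no room for the NUL */
    dst[o] = '\0';
    return o;
}
```

The two range comparisons at the top of the loop are the per-code-unit surrogate test mentioned above; everything outside the surrogate range maps directly to a code point.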

Multi-byte encoding: Range from 0x000000 to 0x10FFFF (21 bits), of which roughly 138,000 are assigned characters or control codes as of Unicode 12.1.
Single-byte encoding: Range from 0x00 to 0xFF (8 bits) corresponding to 256 characters or control codes.
ASCII: Range from 0x00 to 0x7F (7 bits) corresponding to 128 characters or control codes.

If one now transcodes multi-byte sequences into a single-byte encoding, it must be clear that one runs the risk of truncating (invalidating) strings whenever a multi-byte sequence cannot be mapped to a single byte. That happens quite often if you port from a Unicode-aware system (Linux/BSD/Windows) to a non-aware system (AmigaOS 1 to 3).
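
The loss described above can be made visible with a small sketch that transcodes UTF-8 to ISO-8859-1 by hand. The function name `utf8_to_latin1` is hypothetical, and this is not how codesets.library does it internally; it only shows the principle. Everything above 0xFF is replaced by `?` and counted, so the caller can tell that the result is no longer the original string. Overlong UTF-8 forms are not rejected here, which a production decoder should do.

```c
#include <stddef.h>

/* Hypothetical helper: converts NUL-terminated UTF-8 to ISO-8859-1.
 * Code points above 0xFF and malformed sequences become '?'.
 * Output is NUL-terminated and cut off when 'dst' is full.
 * Returns the number of characters that had to be replaced. */
size_t utf8_to_latin1(const unsigned char *src, unsigned char *dst, size_t cap)
{
    size_t o = 0, lost = 0;

    if (cap == 0) return 0;

    while (*src != 0 && o + 1 < cap) {
        unsigned long cp;
        int extra;
        unsigned char c = *src++;

        /* The lead byte tells how many continuation bytes follow. */
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else                         { cp = 0xFFFDUL; extra = 0; } /* stray byte */

        while (extra > 0 && (*src & 0xC0) == 0x80) {
            cp = (cp << 6) | (unsigned long)(*src++ & 0x3F);
            extra--;
        }
        if (extra > 0) cp = 0xFFFDUL; /* sequence ended too early */

        if (cp <= 0xFFUL) {
            dst[o++] = (unsigned char)cp; /* fits into a single byte */
        } else {
            dst[o++] = '?';               /* no single-byte equivalent */
            lost++;
        }
    }

    dst[o] = '\0';
    return lost;
}
```

With this sketch, a string such as "Grüße €" keeps its umlaut and sharp s, because both exist in ISO-8859-1, but the euro sign comes out as `?` and the return value is 1.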

If you are really interested in porting software using the "Universal multiple-octet coded character set Transformation Format" (UTF), you should treat it as a learning exercise that will demand much time. It cannot be achieved without effort unless the strings are used exclusively for disk operations, e.g. as file names passed to "_wfopen() / _wfopen_s()" on Windows.
