scottinNH Posted October 10, 2022 (edited)

This helps me use international characters (I no longer have to look up hex codes). Here is what I have so far; what would you change?

    /*
    Atari 8-bit International characters

    Documents the International Characters found in the standard
    US-Western Europe XL/XE ROM.
    Reference: https://en.wikipedia.org/w/index.php?title=ATASCII&oldid=1035357981#International_Character_Set

    Further international support exists in other Atari ROMs (for example, Arabic).
    Reference: https://web.archive.org/web/20161025170529/http://joyfulcoder.net/atari/atascii/

    "Hello World" language source:
    https://codegolf.stackexchange.com/questions/146544/hello-world-in-multiple-languages

    TRY:
      French:     Bonjour monde!
      German:     Hallo Welt!
      Portuguese: Olá Mundo!
      Spanish:    ¡Hola Mundo!
    */

    /*
    NOTE: UMLAUT and DIAERESIS have the same appearance. This is not Unicode,
    so the exact name does not matter. The names chosen for the #defines are
    intentionally aligned with Wikipedia. You can view those pages directly
    by pasting the literal character onto the end of the base URL:
      https://en.wikipedia.org/wiki/
    Example:
      https://en.wikipedia.org/wiki/ò
    */

    // define (NAME) (decimal value) // (hex value) (literal character) (keystroke)
    #define LOWER_A_ACUTE          0   // x00 á CTRL+,
    #define LOWER_U_GRAVE          1   // x01 ù CTRL+A
    #define UPPER_N_TILDE          2   // x02 Ñ CTRL+B
    #define UPPER_E_ACUTE          3   // x03 É CTRL+C
    #define LOWER_C_CEDILLA        4   // x04 ç CTRL+D
    #define LOWER_O_CIRCUMFLEX     5   // x05 ô CTRL+E
    #define LOWER_O_GRAVE          6   // x06 ò CTRL+F
    #define LOWER_I_GRAVE          7   // x07 ì CTRL+G
    #define POUND_SIGN             8   // x08 £ CTRL+H
    #define LOWER_I_DIAERESIS      9   // x09 ï CTRL+I
    #define LOWER_U_DIAERESIS      10  // x0A ü CTRL+J
    #define LOWER_A_DIAERESIS      11  // x0B ä CTRL+K
    #define UPPER_O_DIAERESIS      12  // x0C Ö CTRL+L
    #define LOWER_U_ACUTE          13  // x0D ú CTRL+M
    #define LOWER_O_ACUTE          14  // x0E ó CTRL+N
    #define LOWER_O_DIAERESIS      15  // x0F ö CTRL+O
    #define UPPER_U_UMLAUT         16  // x10 Ü CTRL+P
    #define LOWER_A_CIRCUMFLEX     17  // x11 â CTRL+Q
    #define LOWER_U_CIRCUMFLEX     18  // x12 û CTRL+R
    #define LOWER_I_CIRCUMFLEX     19  // x13 î CTRL+S
    #define LOWER_E_ACUTE          20  // x14 é CTRL+T
    #define LOWER_E_GRAVE          21  // x15 è CTRL+U
    #define LOWER_N_TILDE          22  // x16 ñ CTRL+V
    #define LOWER_E_CIRCUMFLEX     23  // x17 ê CTRL+W
    #define LOWER_A_OVERRING       24  // x18 å CTRL+X
    #define LOWER_A_GRAVE          25  // x19 à CTRL+Y
    #define UPPER_A_OVERRING       26  // x1A Å CTRL+Z
    #define INVERTED_EXCLAMATION   96  // x60 ¡ CTRL+.
    #define UPPER_A_DIAERESIS      123 // x7B Ä CTRL+:

example caller:

    #include <stdio.h>
    #include "int_char_set.h"

    // TODO: This define logic isn't perfect; it won't fall back or default
    // to any language that is not "defined" during the build.

    int main(void) {
        char pause;

        // POKE 756,204 enables the built-in international character set.
        *(unsigned char*)(756) = 204;

    #ifdef FRENCH
        printf("Bonjour monde!\n");
    #endif
    #ifdef GERMAN
        printf("Hallo Welt!\n");
    #endif
    #ifdef PORTUGUESE
        printf("Ol%c Mundo!\n", LOWER_A_ACUTE); // Olá Mundo!
    #endif
    #ifdef SPANISH
        printf("%cHola Mundo!\n", INVERTED_EXCLAMATION);
    #endif
    #ifdef ENGLISH
        printf("Hello World!\n");
    #endif

        scanf("%c", &pause);
        return 0;
    }

What I do not like about this is that it still relies on `printf()` simply to join an international character with string text... see next post.

Edited October 10, 2022 by scottinNH (clarity, move question to next post)
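The TODO in the caller, about having no fallback when no language is defined at build time, could be handled with an `#if`/`#elif`/`#else` chain. The following is only a sketch under some assumptions: `GREETING`, `greeting()` and `print_greeting()` are hypothetical names, Portuguese is left out because á is code 0 and would end a C string early, and `fputs` stands in for `printf` since no formatting is needed here.

```c
#include <stdio.h>

/* Pick one greeting at build time, for example by defining GERMAN on the
   compiler command line. The final #else makes English the default when
   no language macro is defined. */
#if defined(FRENCH)
#define GREETING "Bonjour monde!\n"
#elif defined(GERMAN)
#define GREETING "Hallo Welt!\n"
#elif defined(SPANISH)
/* "\x60" is decimal 96, the inverted exclamation mark in the
   international set. It is kept in its own literal so the hex escape
   cannot run into the text that follows. */
#define GREETING "\x60" "Hola Mundo!\n"
#else
#define GREETING "Hello World!\n"
#endif

/* Hypothetical helper: returns the greeting selected above. */
const char *greeting(void)
{
    return GREETING;
}

/* Hypothetical helper: prints the greeting without printf formatting.
   Returns a negative value (EOF) if the write fails. */
int print_greeting(void)
{
    return fputs(greeting(), stdout);
}
```

Because exactly one branch of the chain is compiled in, a build with no language macro still prints the English text instead of nothing.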
scottinNH Posted October 10, 2022 (edited)

Here is some synthetic (not working) code, as an example of what I want to do with this:

    #ifdef PORTUGUESE
        printf_special("Ol{LOWER_A_ACUTE} Mundo!\n"); // Olá Mundo!
    #endif

I am seeking a way to directly print international text -- without having to pass the international character as an argument to printf(). This would feel more natural, even if in the above (fake) example I have to escape the special character somehow. At least it would mean you could prepare the text in advance, using sed/awk to replace "á" with a define.

If someone understands my goal, I could be overlooking a simpler approach. Mainly I am trying to avoid the use of `printf()`, as so many have suggested. Cheers.

I was looking into how printf() works because I wanted to steal the escaping stuff, so that maybe `\LOWER_A_ACUTE` could be an escape sequence. Maybe that is the right approach (?), but I got pretty deep into the weeds tracking all the sub-functions of printf().

Edited October 10, 2022 by scottinNH
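One way to read the `printf_special` idea is as a small expander that replaces `{NAME}` tokens with the matching character code before the text is written out. The sketch below is an assumption about what such a helper might look like, not an existing cc65 facility: `intl_expand`, `intl_names` and `INTL_EXPAND_ERROR` are hypothetical names, and the table only carries a few of the defines. It returns a byte count rather than a terminated string, because á is code 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name table; only a few of the defines are listed. */
struct intl_name {
    const char *name;
    unsigned char code;
};

static const struct intl_name intl_names[] = {
    { "LOWER_A_ACUTE",        0x00 },
    { "UPPER_N_TILDE",        0x02 },
    { "LOWER_E_ACUTE",        0x14 },
    { "LOWER_N_TILDE",        0x16 },
    { "INVERTED_EXCLAMATION", 0x60 }
};

#define INTL_EXPAND_ERROR ((size_t)-1)

/* Copy src to dst, replacing each "{NAME}" with its one-byte code.
   Returns the number of bytes written, or INTL_EXPAND_ERROR for an
   unknown name, an unterminated "{", or a dst that is too small. */
size_t intl_expand(const char *src, unsigned char *dst, size_t cap)
{
    size_t n = 0;
    const size_t count = sizeof intl_names / sizeof intl_names[0];

    while (*src != '\0') {
        unsigned char out;

        if (*src == '{') {
            const char *end = strchr(src, '}');
            size_t len, i;

            if (end == NULL) {
                return INTL_EXPAND_ERROR;      /* unterminated "{" */
            }
            len = (size_t)(end - src) - 1;     /* characters between braces */
            for (i = 0; i < count; ++i) {
                if (strlen(intl_names[i].name) == len &&
                    memcmp(intl_names[i].name, src + 1, len) == 0) {
                    break;
                }
            }
            if (i == count) {
                return INTL_EXPAND_ERROR;      /* unknown name */
            }
            out = intl_names[i].code;
            src = end + 1;
        } else {
            out = (unsigned char)*src++;
        }

        if (n == cap) {
            return INTL_EXPAND_ERROR;          /* dst too small */
        }
        dst[n++] = out;
    }
    return n;
}
```

The expanded bytes would then go to a length-based output call such as `fwrite(buf, 1, n, stdout)`, since a string function would stop at the á.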
ivop Posted October 10, 2022 (edited)

11 hours ago, scottinNH said:
    #define UPPER_U_UMLAUT 16 // x10 Ü CTRL+P

This one is inconsistent. Umlaut is German. You used diaeresis for the others.

Edit: sorry, I missed your note about this in the comments :)) This still stands, though. Or perhaps support both spellings.

Edited October 10, 2022 by ivop
ivop Posted October 10, 2022 (edited)

If you define the characters as strings, you can use simple C string concatenation. Like:

    printf("Ol" LOWER_A_ACUTE " Mundo!\n");

Note, however, that some of them clash with ASCII control characters, like \t and \n (tab and newline), or with the end-of-string marker (\0, code 0x00).

Edited October 10, 2022 by ivop
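A minimal sketch of the string-valued variant ivop describes, with the end-of-string clash made visible. The `S_` prefixed macros and the array names are hypothetical; the codes come from the header earlier in the thread.

```c
#include <string.h>

/* Hypothetical string-valued counterparts of the numeric defines. */
#define S_LOWER_A_ACUTE "\x00"  /* á: code 0, which is also C's terminator */
#define S_LOWER_E_ACUTE "\x14"  /* é */
#define S_LOWER_N_TILDE "\x16"  /* ñ */

/* Adjacent literals are joined by the compiler. Each escape is resolved
   before joining, so "\x14" followed by "e" stays two separate bytes. */
const char idee[] = "id" S_LOWER_E_ACUTE "e";
const char ano_nuevo[] = "A" S_LOWER_N_TILDE "o Nuevo";

/* The array holds every byte, but string functions stop at the first
   zero byte, which is exactly where the á should be. */
const char ola_mundo[] = "Ol" S_LOWER_A_ACUTE " Mundo!";

/* Length as printf("%s") or fputs would see it. */
size_t visible_length(const char *s)
{
    return strlen(s);
}
```

Here `sizeof ola_mundo` still counts all eleven bytes, while `visible_length(ola_mundo)` sees only the two before the á, so text containing code 0 needs a length-based output call.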
scottinNH Posted October 10, 2022

5 hours ago, ivop said:
    If you define the characters as strings, you can use simple C string concatenation. Like:
        printf("Ol" LOWER_A_ACUTE " Mundo!\n");
    Note, however, that some of them clash with ASCII control characters, like \t and \n (tab and newline), or with the end-of-string marker (\0, code 0x00).

Yup, another way, thanks.

Personally, I am happy to use printf... but a lot of folks actively seek to avoid printf to conserve memory. So I am trying to adhere to their goal and come up with a small, light framework that others could use, help improve, or maybe clean up and submit to CC65 as a PR.
dmsc Posted October 10, 2022

Hi!

17 hours ago, scottinNH said:
    This helps me use international characters (I don't have to lookup hex anymore). What I do not like about this is it still relies on `printf()` simply to join an international character with string text... see next post:

There is a much better way in CC65 to make character code translations, see this example:

Compiling with:

    cl65 -tatari -o example.xex example.c

Gives this:

The default charmap when compiling for the Atari is here: https://github.com/cc65/cc65/blob/master/include/atari_atascii_charmap.h

It only makes four changes:

    07 -> FD (BELL)
    09 -> 7F (TAB)
    0A -> 9B (EOL)
    0C -> 7D (FF, changed to clear-screen)

Note that for the translation of accented characters to work, you must write your source code in an 8-bit text encoding. Most modern editors use UTF-8, which uses more than one byte for the accented Latin-1 characters.

Have Fun!

example.zip
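The attached example is not reproduced in the thread text. cc65 documents a `#pragma charmap (index, code)` directive, which the linked header uses for its four remaps, and that is presumably the mechanism meant here, though that is an assumption. As a rough host-side illustration of what such a translation amounts to, the sketch below maps a few ISO 8859-1 bytes to the codes from the header earlier in the thread. It runs at run time, unlike the compiler's charmap, and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: map one ISO 8859-1 byte to its code in the Atari
   international set. Only a handful of characters are covered. */
unsigned char latin1_to_atascii(unsigned char c)
{
    switch (c) {
    /* The four remaps dmsc lists for the default Atari charmap. */
    case 0x07: return 0xFD;   /* BELL */
    case 0x09: return 0x7F;   /* TAB */
    case 0x0A: return 0x9B;   /* EOL */
    case 0x0C: return 0x7D;   /* form feed -> clear screen */
    /* Accented characters, using codes from the header in this thread. */
    case 0xA1: return 0x60;   /* ¡ */
    case 0xD1: return 0x02;   /* Ñ */
    case 0xE1: return 0x00;   /* á */
    case 0xE9: return 0x14;   /* é */
    case 0xF1: return 0x16;   /* ñ */
    case 0xFC: return 0x0A;   /* ü */
    default:   return c;      /* everything else passes through unchanged */
    }
}

/* Translate a buffer in place. A length is used instead of a terminator
   because á becomes byte 0. */
void latin1_buf_to_atascii(unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        buf[i] = latin1_to_atascii(buf[i]);
    }
}
```

The table only works on single bytes, which is why the source has to be saved in an 8-bit encoding: in UTF-8 the ñ arrives as two bytes and neither of them matches 0xF1.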
scottinNH Posted October 11, 2022

5 hours ago, dmsc said:
    Hi! There is a much better way in CC65 to make character code translations, see this example:

Thank you so much!

From another thread, I had gotten the wrong impression that you needed to use low-level putchar() (in order to get around some unwanted character translation issue). Therefore I was working to create something easier to use, but we already have it. Awesome!

Now I can scale my effort back: simply create a multi-language example to contribute as documentation. Cheers.
ivop Posted October 11, 2022 (edited)

11 hours ago, scottinNH said:
    From another thread, had gotten the wrong impression you needed to use low-level putchar() (in order to get around some unwanted character translation issue).

Which is what is more or less done here, too. It calls write(), which is raw stdio.

Edited October 11, 2022 by ivop
dmsc Posted October 11, 2022

Hi!

56 minutes ago, ivop said:
    Which is what is more or less done here, too. It calls write(), which is raw stdio.

Yes, so this also generates smaller code.

Basically, the CC65 library does two translations when using stdio functions, to provide compatibility with standard C:

- Standard "FILE*" pointers are mapped to integer Unix-like file descriptors.
- File descriptors are mapped to CIO channels.

Also, CC65 reuses channels, so the standard input, output and error file descriptors (0, 1 and 2) are all mapped to I/O channel #0.

Have Fun!
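A minimal sketch of the descriptor-level call discussed above, assuming a POSIX-like `<unistd.h>` that declares `write` (cc65 is believed to ship one, but check its headers before relying on that). `put_raw` is a hypothetical name. The call takes an explicit length, so a byte with value 0 such as á does not cut the output short.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: send n raw bytes to file descriptor 1 (standard
   output), skipping the FILE* layer and printf formatting entirely.
   Returns the number of bytes written, or a negative value on error. */
long put_raw(const void *buf, size_t n)
{
    return (long)write(1, buf, n);
}
```

A call such as `put_raw("Ol\0 Mundo!\n", 11)` passes all eleven bytes through, including the zero in the middle.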
scottinNH Posted October 20, 2022

On 10/10/2022 at 6:35 PM, dmsc said:
    Note that for the accented characters translation to work, you must write your source code in an 8-bit text code, most modern editors use UTF-8 that uses more than one byte for the latin-1 characters.

Hmm, OK, yup, I just ran into that issue: VS Code works in UTF-8. 🙂

For the thread, FYI: you can tell VS Code to put a file into ISO 8859-1 (click on "UTF-8" at the bottom of the window)... but my string (with tilde n) will "look correct" in VS Code and still print wrong on the Atari. So either it's not working or I am missing a step.

When I open your example.c and compile it with cl65, it displays correctly on atari800... but when viewing the source in VS Code, your string is A�o Nuevo\n

The problem makes sense to me: an encoding conflict. The solutions, however, make less sense (I come from Python and Perl and did not deal with i18n).

Do you (anyone here) know how to get 8-bit text working properly in VS Code (on a per-file basis, or at least per-project)?

What are you using for a text editor? I can use something else for this work. Cheers
baktra Posted October 20, 2022 (edited)

There are two factors:

1. The character encoding of your source file
2. The character encoding that is expected by cc65

If these two do not match, you can expect weird results. The character encoding of your source file is under your control. The character encoding expected by cc65 is platform-specific and can differ between, let us say, macOS and Windows.

To answer your question about VS Code: open the file, click "UTF-8" at the bottom, select "Reopen with Encoding", and then select CP-1252.

Edited October 20, 2022 by baktra
dmsc Posted October 26, 2022

Hi!

On 10/20/2022 at 1:12 AM, scottinNH said:
    Hmm OK YUP I just ran into that issue: VS Code works in UTF-8.. 🙂 .. for the thread, FYI you can tell VS Code to put a file into ISO 8859-1 (click on "UTF-8" at the bottom of the window) ..but my string (with tilde n) will "look correct" in VS Code, but print wrong on the Atari. So it's not working or I miss a step. When I open your example.c and compile it with cl65, it displays correctly on atari800 ...but when viewing source in VS Code your string is A�o Nuevo\n The problem makes sense to me... encoding conflict. Solutions however makes less sense (I come from Python and Perl and did not deal with i18n) Do you (anyone here) know how to 8-bit text working properly in VS Code? (on per-file basis, or at least per-project)

Let's try using VS Code.

First, write the code:

Now, press on "UTF-8" below:

Select "Save with Encoding":

Select "ISO 8859-1", and save:

Now, see if the file has the correct encoding:

Yes, the "ñ" is encoded as F1, as needed. This is on Linux (Raspbian, on my RP 400).

On 10/20/2022 at 1:12 AM, scottinNH said:
    What are you using for a text editor? I can use something else for this work. Cheers

I normally use VIM, and it automatically detects the encoding (UTF-8 or 8859-1), so it "just works" in this case.

Have Fun!
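The check in the last step can also be done programmatically. The sketch below counts two-byte UTF-8 sequences of the kind an editor writes for accented Latin-1 characters, for example C3 B1 for "ñ" instead of the single byte F1. It is a heuristic under stated assumptions, not a full UTF-8 validator: genuine ISO 8859-1 text containing "Ã" followed by a symbol in the A0..BF range would be counted too. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: count two-byte UTF-8 sequences that encode
   U+0080..U+00FF, meaning a lead byte of 0xC2 or 0xC3 followed by a
   continuation byte in 0x80..0xBF. A non-zero result suggests the file
   was saved as UTF-8 rather than as an 8-bit encoding. */
size_t count_utf8_latin1_pairs(const unsigned char *buf, size_t n)
{
    size_t i = 0;
    size_t hits = 0;

    while (i + 1 < n) {
        if ((buf[i] == 0xC2 || buf[i] == 0xC3) &&
            buf[i + 1] >= 0x80 && buf[i + 1] <= 0xBF) {
            ++hits;
            i += 2;     /* skip both bytes of the pair */
        } else {
            ++i;
        }
    }
    return hits;
}
```

Run over the bytes of a source file, a result of zero is consistent with the ISO 8859-1 save shown above, while any hit points at UTF-8 sequences that a byte-wise translation would not match.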