Subj : CharacterSet Translation
To   : All
From : Scott Street
Date : Fri Feb 11 2022 01:18 pm

Hello Everyone!

After much tinkering, I still can't get character set translation to work 100%. The biggest issue is CP866 -> UTF-8: it seems I can't get GoldED+ to actually do the translation.

Here are the relevant bits from my golded.cfg [which I've tried on Linux and macOS]:

-paste-
XLATPATH /fido/etc/golded/
XLATLOCALSET UTF-8
XLATCHARSETALIAS UTF-8 UTF8
XLATCHARSET CP1125 UTF-8 1125_u8.chs
XLATCHARSET CP437 UTF-8 437_u8.chs
XLATCHARSET CP850 UTF-8 850_u8.chs
XLATCHARSET CP865 UTF-8 865_u8.chs
XLATCHARSET CP866 UTF-8 866_u8.chs
XLATCHARSET LATIN-1 UTF-8 iso1_u8.chs
XLATCHARSET KOI8-R UTF-8 koi8_u8.chs
-end-

At first I thought the messages themselves were at fault, so I wrote a PHP library to read JAM files, translate the message body text to UTF-8, and output it to the terminal [the same terminal I use for GoldED+, etc.]. My terminal (Apple's macOS Terminal.app) does display the characters correctly; I just can't get GoldED+ to do the same.

PHP code bits for reference:

-paste-
$xlated = mb_convert_encoding($line, "UTF-8", $msg_encoding);
-end-

$xlated is the body line after mb_convert_encoding() takes the raw bytes ($line) and converts them from $msg_encoding, which is the message's CHRS (or CHARSET) value, mapped earlier to a PHP-native character set name. See https://www.php.net/manual/en/function.mb-convert-encoding.php for more info on the PHP function.

In addition: I'm using the translation files included with GoldED+. The most troublesome display is from users with CP866 character sets.

-file 866_u8.chs-
;
; This file is a charset conversion module in text form.
;
; Source file:
; http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP866.TXT
;
100000 ; ID number (when >65535, all 255 chars will be translated)
0 ; version number
;
2 ; level number
;
CP866
UTF-8
;
\0 \0 ; NULL
\0 \d1 ; START OF HEADING
\0 \d2 ; START OF TEXT
\0 \d3 ; END OF TEXT
\0 \d4 ; END OF TRANSMISSION
\0 \d5 ; ENQUIRY
\0 \d6 ; ACKNOWLEDGE
\0 \d7 ; BELL
\0 \d8 ; BACKSPACE
\0 \d9 ; HORIZONTAL TABULATION
\0 \d10 ; LINE FEED
\0 \d11 ; VERTICAL TABULATION
\0 \d12 ; FORM FEED
\0 \d13 ; CARRIAGE RETURN
\0 \d14 ; SHIFT OUT
\0 \d15 ; SHIFT IN
\0 \d16 ; DATA LINK ESCAPE
\0 \d17 ; DEVICE CONTROL ONE
\0 \d18 ; DEVICE CONTROL TWO
\0 \d19 ; DEVICE CONTROL THREE
\0 \d20 ; DEVICE CONTROL FOUR
\0 \d21 ; NEGATIVE ACKNOWLEDGE
\0 \d22 ; SYNCHRONOUS IDLE
\0 \d23 ; END OF TRANSMISSION BLOCK
\0 \d24 ; CANCEL
\0 \d25 ; END OF MEDIUM
\0 \d26 ; SUBSTITUTE
\0 \d27 ; ESCAPE
\0 \d28 ; FILE SEPARATOR
\0 \d29 ; GROUP SEPARATOR
\0 \d30 ; RECORD SEPARATOR
\0 \d31 ; UNIT SEPARATOR
\0 \d32 ; SPACE
\0 \d33 ; EXCLAMATION MARK
\0 \d34 ; QUOTATION MARK
\0 \d35 ; NUMBER SIGN
\0 \d36 ; DOLLAR SIGN
\0 \d37 ; PERCENT SIGN
\0 \d38 ; AMPERSAND
\0 \d39 ; APOSTROPHE
\0 \d40 ; LEFT PARENTHESIS
\0 \d41 ; RIGHT PARENTHESIS
\0 \d42 ; ASTERISK
\0 \d43 ; PLUS SIGN
\0 \d44 ; COMMA
\0 \d45 ; HYPHEN-MINUS
\0 \d46 ; FULL STOP
\0 \d47 ; SOLIDUS
\0 \d48 ; DIGIT ZERO
\0 \d49 ; DIGIT ONE
\0 \d50 ; DIGIT TWO
\0 \d51 ; DIGIT THREE
\0 \d52 ; DIGIT FOUR
\0 \d53 ; DIGIT FIVE
\0 \d54 ; DIGIT SIX
\0 \d55 ; DIGIT SEVEN
\0 \d56 ; DIGIT EIGHT
\0 \d57 ; DIGIT NINE
\0 \d58 ; COLON
\0 \d59 ; SEMICOLON
\0 \d60 ; LESS-THAN SIGN
\0 \d61 ; EQUALS SIGN
\0 \d62 ; GREATER-THAN SIGN
\0 \d63 ; QUESTION MARK
\0 \d64 ; COMMERCIAL AT
\0 \d65 ; LATIN CAPITAL LETTER A
\0 \d66 ; LATIN CAPITAL LETTER B
\0 \d67 ; LATIN CAPITAL LETTER C
\0 \d68 ; LATIN CAPITAL LETTER D
\0 \d69 ; LATIN CAPITAL LETTER E
\0 \d70 ; LATIN CAPITAL LETTER F
\0 \d71 ; LATIN CAPITAL LETTER G
\0 \d72 ; LATIN CAPITAL LETTER H
\0 \d73 ; LATIN CAPITAL LETTER I
\0 \d74 ; LATIN CAPITAL LETTER J
\0 \d75 ; LATIN CAPITAL LETTER K
\0 \d76 ; LATIN CAPITAL LETTER L
\0 \d77 ; LATIN CAPITAL LETTER M
\0 \d78 ; LATIN CAPITAL LETTER N
\0 \d79 ; LATIN CAPITAL LETTER O
\0 \d80 ; LATIN CAPITAL LETTER P
\0 \d81 ; LATIN CAPITAL LETTER Q
\0 \d82 ; LATIN CAPITAL LETTER R
\0 \d83 ; LATIN CAPITAL LETTER S
\0 \d84 ; LATIN CAPITAL LETTER T
\0 \d85 ; LATIN CAPITAL LETTER U
\0 \d86 ; LATIN CAPITAL LETTER V
\0 \d87 ; LATIN CAPITAL LETTER W
\0 \d88 ; LATIN CAPITAL LETTER X
\0 \d89 ; LATIN CAPITAL LETTER Y
\0 \d90 ; LATIN CAPITAL LETTER Z
\0 \d91 ; LEFT SQUARE BRACKET
\0 \d92 ; REVERSE SOLIDUS
\0 \d93 ; RIGHT SQUARE BRACKET
\0 \d94 ; CIRCUMFLEX ACCENT
\0 \d95 ; LOW LINE
\0 \d96 ; GRAVE ACCENT
\0 \d97 ; LATIN SMALL LETTER A
\0 \d98 ; LATIN SMALL LETTER B
\0 \d99 ; LATIN SMALL LETTER C
\0 \d100 ; LATIN SMALL LETTER D
\0 \d101 ; LATIN SMALL LETTER E
\0 \d102 ; LATIN SMALL LETTER F
\0 \d103 ; LATIN SMALL LETTER G
\0 \d104 ; LATIN SMALL LETTER H
\0 \d105 ; LATIN SMALL LETTER I
\0 \d106 ; LATIN SMALL LETTER J
\0 \d107 ; LATIN SMALL LETTER K
\0 \d108 ; LATIN SMALL LETTER L
\0 \d109 ; LATIN SMALL LETTER M
\0 \d110 ; LATIN SMALL LETTER N
\0 \d111 ; LATIN SMALL LETTER O
\0 \d112 ; LATIN SMALL LETTER P
\0 \d113 ; LATIN SMALL LETTER Q
\0 \d114 ; LATIN SMALL LETTER R
\0 \d115 ; LATIN SMALL LETTER S
\0 \d116 ; LATIN SMALL LETTER T
\0 \d117 ; LATIN SMALL LETTER U
\0 \d118 ; LATIN SMALL LETTER V
\0 \d119 ; LATIN SMALL LETTER W
\0 \d120 ; LATIN SMALL LETTER X
\0 \d121 ; LATIN SMALL LETTER Y
\0 \d122 ; LATIN SMALL LETTER Z
\0 \d123 ; LEFT CURLY BRACKET
\0 \d124 ; VERTICAL LINE
\0 \d125 ; RIGHT CURLY BRACKET
\0 \d126 ; TILDE
\0 \d127 ; DELETE
\d208 \d144 ; CYRILLIC CAPITAL LETTER A
\d208 \d145 ; CYRILLIC CAPITAL LETTER BE
\d208 \d146 ; CYRILLIC CAPITAL LETTER VE
\d208 \d147 ; CYRILLIC CAPITAL LETTER GHE
\d208 \d148 ; CYRILLIC CAPITAL LETTER DE
\d208 \d149 ; CYRILLIC CAPITAL LETTER IE
\d208 \d150 ; CYRILLIC CAPITAL LETTER ZHE
\d208 \d151 ; CYRILLIC CAPITAL LETTER ZE
\d208 \d152 ; CYRILLIC CAPITAL LETTER I
\d208 \d153 ; CYRILLIC CAPITAL LETTER SHORT I
\d208 \d154 ; CYRILLIC CAPITAL LETTER KA
\d208 \d155 ; CYRILLIC CAPITAL LETTER EL
\d208 \d156 ; CYRILLIC CAPITAL LETTER EM
\d208 \d157 ; CYRILLIC CAPITAL LETTER EN
\d208 \d158 ; CYRILLIC CAPITAL LETTER O
\d208 \d159 ; CYRILLIC CAPITAL LETTER PE
\d208 \d160 ; CYRILLIC CAPITAL LETTER ER
\d208 \d161 ; CYRILLIC CAPITAL LETTER ES
\d208 \d162 ; CYRILLIC CAPITAL LETTER TE
\d208 \d163 ; CYRILLIC CAPITAL LETTER U
\d208 \d164 ; CYRILLIC CAPITAL LETTER EF
\d208 \d165 ; CYRILLIC CAPITAL LETTER HA
\d208 \d166 ; CYRILLIC CAPITAL LETTER TSE
\d208 \d167 ; CYRILLIC CAPITAL LETTER CHE
\d208 \d168 ; CYRILLIC CAPITAL LETTER SHA
\d208 \d169 ; CYRILLIC CAPITAL LETTER SHCHA
\d208 \d170 ; CYRILLIC CAPITAL LETTER HARD SIGN
\d208 \d171 ; CYRILLIC CAPITAL LETTER YERU
\d208 \d172 ; CYRILLIC CAPITAL LETTER SOFT SIGN
\d208 \d173 ; CYRILLIC CAPITAL LETTER E
\d208 \d174 ; CYRILLIC CAPITAL LETTER YU
\d208 \d175 ; CYRILLIC CAPITAL LETTER YA
\d208 \d176 ; CYRILLIC SMALL LETTER A
\d208 \d177 ; CYRILLIC SMALL LETTER BE
\d208 \d178 ; CYRILLIC SMALL LETTER VE
\d208 \d179 ; CYRILLIC SMALL LETTER GHE
\d208 \d180 ; CYRILLIC SMALL LETTER DE
\d208 \d181 ; CYRILLIC SMALL LETTER IE
\d208 \d182 ; CYRILLIC SMALL LETTER ZHE
\d208 \d183 ; CYRILLIC SMALL LETTER ZE
\d208 \d184 ; CYRILLIC SMALL LETTER I
\d208 \d185 ; CYRILLIC SMALL LETTER SHORT I
\d208 \d186 ; CYRILLIC SMALL LETTER KA
\d208 \d187 ; CYRILLIC SMALL LETTER EL
\d208 \d188 ; CYRILLIC SMALL LETTER EM
\d208 \d189 ; CYRILLIC SMALL LETTER EN
\d208 \d190 ; CYRILLIC SMALL LETTER O
\d208 \d191 ; CYRILLIC SMALL LETTER PE
\d226 \d150 \d145 ; LIGHT SHADE
\d226 \d150 \d146 ; MEDIUM SHADE
\d226 \d150 \d147 ; DARK SHADE
\d226 \d148 \d130 ; BOX DRAWINGS LIGHT VERTICAL
\d226 \d148 \d164 ; BOX DRAWINGS LIGHT VERTICAL AND LEFT
\d226 \d149 \d161 ; BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
\d226 \d149 \d162 ; BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
\d226 \d149 \d150 ; BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
\d226 \d149 \d149 ; BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
\d226 \d149 \d163 ; BOX DRAWINGS DOUBLE VERTICAL AND LEFT
\d226 \d149 \d145 ; BOX DRAWINGS DOUBLE VERTICAL
\d226 \d149 \d151 ; BOX DRAWINGS DOUBLE DOWN AND LEFT
\d226 \d149 \d157 ; BOX DRAWINGS DOUBLE UP AND LEFT
\d226 \d149 \d144 ; BOX DRAWINGS DOUBLE HORIZONTAL
\d226 \d148 \d148 ; BOX DRAWINGS LIGHT UP AND RIGHT
\d226 \d148 \d180 ; BOX DRAWINGS LIGHT UP AND HORIZONTAL
\d226 \d148 \d172 ; BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
\d226 \d148 \d156 ; BOX DRAWINGS LIGHT VERTICAL AND RIGHT
\d226 \d148 \d128 ; BOX DRAWINGS LIGHT HORIZONTAL
\d226 \d148 \d188 ; BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
\d226 \d149 \d158 ; BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
\d226 \d149 \d159 ; BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
\d226 \d149 \d154 ; BOX DRAWINGS DOUBLE UP AND RIGHT
\d226 \d149 \d148 ; BOX DRAWINGS DOUBLE DOWN AND RIGHT
\d226 \d149 \d169 ; BOX DRAWINGS DOUBLE UP AND HORIZONTAL
\d226 \d149 \d166 ; BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
\d226 \d149 \d160 ; BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
\d226 \d149 \d144 ; BOX DRAWINGS DOUBLE HORIZONTAL
\d226 \d149 \d172 ; BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
\d226 \d149 \d167 ; BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
\d226 \d149 \d168 ; BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
\d226 \d149 \d164 ; BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
\d226 \d149 \d165 ; BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
\d226 \d149 \d153 ; BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
\d226 \d149 \d152 ; BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
\d226 \d149 \d146 ; BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
\d226 \d149 \d147 ; BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
\d226 \d149 \d171 ; BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
\d226 \d149 \d170 ; BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
\d226 \d148 \d152 ; BOX DRAWINGS LIGHT UP AND LEFT
\d226 \d148 \d140 ; BOX DRAWINGS LIGHT DOWN AND RIGHT
\d226 \d150 \d136 ; FULL BLOCK
\d226 \d150 \d132 ; LOWER HALF BLOCK
\d226 \d150 \d140 ; LEFT HALF BLOCK
\d226 \d150 \d144 ; RIGHT HALF BLOCK
\d226 \d150 \d128 ; UPPER HALF BLOCK
\d209 \d128 ; CYRILLIC SMALL LETTER ER
\d209 \d129 ; CYRILLIC SMALL LETTER ES
\d209 \d130 ; CYRILLIC SMALL LETTER TE
\d209 \d131 ; CYRILLIC SMALL LETTER U
\d209 \d132 ; CYRILLIC SMALL LETTER EF
\d209 \d133 ; CYRILLIC SMALL LETTER HA
\d209 \d134 ; CYRILLIC SMALL LETTER TSE
\d209 \d135 ; CYRILLIC SMALL LETTER CHE
\d209 \d136 ; CYRILLIC SMALL LETTER SHA
\d209 \d137 ; CYRILLIC SMALL LETTER SHCHA
\d209 \d138 ; CYRILLIC SMALL LETTER HARD SIGN
\d209 \d139 ; CYRILLIC SMALL LETTER YERU
\d209 \d140 ; CYRILLIC SMALL LETTER SOFT SIGN
\d209 \d141 ; CYRILLIC SMALL LETTER E
\d209 \d142 ; CYRILLIC SMALL LETTER YU
\d209 \d143 ; CYRILLIC SMALL LETTER YA
\d208 \d129 ; CYRILLIC CAPITAL LETTER IO
\d209 \d145 ; CYRILLIC SMALL LETTER IO
\d208 \d132 ; CYRILLIC CAPITAL LETTER UKRAINIAN IE
\d209 \d148 ; CYRILLIC SMALL LETTER UKRAINIAN IE
\d208 \d135 ; CYRILLIC CAPITAL LETTER YI
\d209 \d151 ; CYRILLIC SMALL LETTER YI
\d208 \d142 ; CYRILLIC CAPITAL LETTER SHORT U
\d209 \d158 ; CYRILLIC SMALL LETTER SHORT U
\d194 \d176 ; DEGREE SIGN
\d226 \d136 \d153 ; BULLET OPERATOR
\d194 \d183 ; MIDDLE DOT
\d226 \d136 \d154 ; SQUARE ROOT
\d226 \d132 \d150 ; NUMERO SIGN
\d194 \d164 ; CURRENCY SIGN
\d226 \d150 \d160 ; BLACK SQUARE
\d194 \d160 ; NO-BREAK SPACE
END
-file end-

My primary example message is from FIDONEWS, MsgID "2:5030/1081.117 61f6e5cd". My PHP script correctly converts its CP866 characters into UTF-8, but GoldED+ just makes a mess of it. The tagline of the message translates to "- And you would do art. Poetry, right?" and the origin (loosely) to "I advise you to rub with ant alcohol". The message appears to have been posted from a version of GoldED running on 32-bit Windows, so I have to believe proper character translation can be done!

Sorry for the fairly large post; I just tried to give as much information as possible in one shot. Any help is greatly appreciated!
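
For anyone who wants to repeat the cross-check without PHP, here is a minimal Python sketch of the same idea as the mb_convert_encoding() call: pick a codec from the message's CHRS value, decode the raw bytes, and re-encode as UTF-8. It is only a sketch under assumptions: the CHRS_TO_CODEC table and the decode_body_line() helper are hypothetical names, the codec names are Python's (not GoldED+'s or PHP's), the cp437 fallback is an arbitrary choice, and the sample bytes are made up rather than taken from the FIDONEWS message.

```python
# Hypothetical table: CHRS kludge identifiers -> Python codec names.
# The identifiers mirror the XLATCHARSET lines in golded.cfg; the codec
# names on the right are Python's own.
CHRS_TO_CODEC = {
    "CP437": "cp437",
    "CP850": "cp850",
    "CP865": "cp865",
    "CP866": "cp866",
    "CP1125": "cp1125",
    "LATIN-1": "latin_1",
    "KOI8-R": "koi8_r",
    "UTF-8": "utf_8",
}


def decode_body_line(raw: bytes, chrs: str, fallback: str = "cp437") -> str:
    """Decode one raw body line according to the message's CHRS value."""
    # A kludge value may carry a level number ("CP866 2"); keep only the name.
    parts = chrs.split()
    name = parts[0].upper() if parts else ""
    # Assumption: unknown identifiers fall back to cp437.
    codec = CHRS_TO_CODEC.get(name, fallback)
    # errors="replace" substitutes undecodable bytes instead of raising.
    return raw.decode(codec, errors="replace")


# Made-up sample (not from the FIDONEWS message): a six-letter Cyrillic
# word stored as CP866 bytes.
sample = bytes([0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2])
text = decode_body_line(sample, "CP866 2")
utf8 = text.encode("utf-8")

# Print the UTF-8 bytes in decimal so they can be compared with the
# \dNNN values in 866_u8.chs.
print(" ".join(str(b) for b in utf8))
```

The decimal output can be checked entry by entry against the table above (for example, the first two values should match the CYRILLIC CAPITAL LETTER PE line). Printing `text` itself on a UTF-8 terminal should show the Cyrillic word, which is the same kind of check the PHP script performs.
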
Scott

---
 * Origin: -={ The Digital Post }=- (1:266/420.1)