mbsprint: improve printing output when it has invalid UTF data - sacc - sacc (saccomys): simple gopher client.
---
commit edab539b23594219bbfc83729822da917a18a243
parent c416c8c73d0a33eb8c428b1a9b9eaaffc098ee5b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date:   Tue, 5 Jan 2021 21:21:03 +0100

mbsprint: improve printing output when it has invalid UTF data

Reset the decode state when mbtowc returns -1. The OpenBSD mbtowc(3) man page
says: "If a call to mbtowc() resulted in an undefined internal state, mbtowc()
must be called with s set to NULL to reset the internal state before it can
safely be used again."

Print the UTF replacement character (codepoint 0xfffd) for the invalid
codepoint or incomplete sequence and continue printing the line (instead of
stopping).

Remove the 0 return code as it can't happen because we're already checking the
string length in the loop.

Diffstat:
  M sacc.c | 12 +++++++++---

1 file changed, 9 insertions(+), 3 deletions(-)
---
diff --git a/sacc.c b/sacc.c
@@ -110,12 +110,18 @@ mbsprint(const char *s, size_t len)
 
 	slen = strlen(s);
 	for (i = 0; i < slen; i += rl) {
-		if ((rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4)) <= 0)
-			break;
+		rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
+		if (rl == -1) {
+			mbtowc(NULL, NULL, 0); /* reset state */
+			fputs("\xef\xbf\xbd", stdout); /* replacement character */
+			col++;
+			rl = 1;
+			continue;
+		}
 		if ((w = wcwidth(wc)) == -1)
 			continue;
 		if (col + w > len || (col + w == len && s[i + rl])) {
-			fputs("\xe2\x80\xa6", stdout);
+			fputs("\xe2\x80\xa6", stdout); /* ellipsis */
 			col++;
 			break;
 		}
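
The recovery pattern described in the commit message (reset the mbtowc state on -1, emit U+FFFD, skip one byte, keep going) can be tried in isolation. What follows is a minimal sketch under stated assumptions, not sacc's code: sanitize_mbs() and init_locale() are hypothetical names, the result is copied into a caller-supplied buffer instead of being written to stdout, and the column accounting that mbsprint() does with wcwidth() is left out.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: select the locale from the environment so that
 * mbtowc() decodes in the user's encoding (UTF-8 is assumed below). */
static int
init_locale(void)
{
	return setlocale(LC_CTYPE, "") != NULL;
}

/*
 * Hypothetical helper, not part of sacc: copy s into out, replacing every
 * byte that mbtowc() rejects with U+FFFD encoded as UTF-8. Returns the
 * number of replacements. The output is truncated if out is too small,
 * but it is always NUL-terminated.
 */
static size_t
sanitize_mbs(const char *s, char *out, size_t outsz)
{
	static const char repl[] = "\xef\xbf\xbd"; /* U+FFFD */
	size_t slen, i, o = 0, bad = 0;
	wchar_t wc;
	int rl;

	assert(s != NULL && out != NULL && outsz > 0);

	slen = strlen(s);
	mbtowc(NULL, NULL, 0); /* start from the initial state */
	for (i = 0; i < slen; i += rl) {
		rl = mbtowc(&wc, s + i, slen - i);
		if (rl == -1) {
			mbtowc(NULL, NULL, 0); /* reset state after the error */
			rl = 1; /* skip one byte and try to resynchronize */
			if (o + 3 >= outsz)
				break;
			memcpy(out + o, repl, 3);
			o += 3;
			bad++;
			continue; /* still runs i += rl */
		}
		/* rl > 0 here: i < slen, so s[i] is not the NUL byte */
		if (o + (size_t)rl >= outsz)
			break;
		memcpy(out + o, s + i, (size_t)rl);
		o += (size_t)rl;
	}
	out[o] = '\0';

	return bad;
}
```

A caller would run init_locale() once at startup and then pass untrusted server text through sanitize_mbs() before display. The result depends on the active locale: the replacement string is hard-coded as UTF-8 (as in the diff), so the sketch only makes sense when LC_CTYPE names a UTF-8 locale.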