mbsprint: improve printing output when it has invalid UTF data - sacc - sacc (saccomys): simple gopher client.
 (DIR) commit edab539b23594219bbfc83729822da917a18a243
 (DIR) parent c416c8c73d0a33eb8c428b1a9b9eaaffc098ee5b
 (HTM) Author: Hiltjo Posthuma <hiltjo@codemadness.org>
       Date:   Tue,  5 Jan 2021 21:21:03 +0100
       
       mbsprint: improve printing output when it has invalid UTF data
       
       Reset the decode state when mbtowc returns -1. The OpenBSD mbtowc(3)
       man page says: "If a call to mbtowc() resulted in an undefined internal
       state, mbtowc() must be called with s set to NULL to reset the internal
       state before it can safely be used again."
       
       Print the UTF replacement character (codepoint 0xfffd) for the invalid
       codepoint or incomplete sequence and continue printing the line
       (instead of stopping).
       
       Remove the 0 return code as it can't happen because we're already
       checking the string length in the loop.
       
       Diffstat:
         M sacc.c                              |      12 +++++++++---
       
       1 file changed, 9 insertions(+), 3 deletions(-)
       ---
 (DIR) diff --git a/sacc.c b/sacc.c
       @@ -110,12 +110,18 @@ mbsprint(const char *s, size_t len)
        
                slen = strlen(s);
                for (i = 0; i < slen; i += rl) {
       -                if ((rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4)) <= 0)
       -                        break;
       +                rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
       +                if (rl == -1) {
       +                        mbtowc(NULL, NULL, 0); /* reset state */
       +                        fputs("\xef\xbf\xbd", stdout); /* replacement character */
       +                        col++;
       +                        rl = 1;
       +                        continue;
       +                }
                        if ((w = wcwidth(wc)) == -1)
                                continue;
                        if (col + w > len || (col + w == len && s[i + rl])) {
       -                        fputs("\xe2\x80\xa6", stdout);
       +                        fputs("\xe2\x80\xa6", stdout); /* ellipsis */
                                col++;
                                break;
                        }