fix unicode glitch in DCS strings, patch by Tim Allen - st - Personal fork of st
 (HTM) git clone git://git.drkhsh.at/st.git
       ---
 (DIR) commit 818ec746f4caae453d09368b101c3e841cf39870
 (DIR) parent 9ba7ecf7b15ec2986c6142036706aa353b249ef9
 (HTM) Author: Hiltjo Posthuma <hiltjo@codemadness.org>
       Date:   Wed, 17 Jun 2020 21:35:39 +0200
       
       fix unicode glitch in DCS strings, patch by Tim Allen
       
       Reported on the mailing list:
       
       "
       I discovered recently that if an application running inside st tries to
       send a DCS string, subsequent Unicode characters get messed up. For
       example, consider the following test-case:
       
           printf '\303\277\033P\033\\\303\277'
       
       ...where:
       
         - \303\277 is the UTF-8 encoding of U+00FF LATIN SMALL LETTER Y WITH
           DIAERESIS (ÿ).
         - \033P is ESC P, the token that begins a DCS string.
         - \033\\ is ESC \, a token that ends a DCS string.
         - \303\277 is the same ÿ character again.
       
       If I run the above command in a VTE-based terminal, or xterm, or
       QTerminal, or pterm (PuTTY), I get the output:
       
           ÿÿ
       
       ...which is to say, the empty DCS string is ignored. However, if I run
       that command inside st (as of commit 9ba7ecf), I get:
       
           ÿÃ¿
       
       ...where those last two characters are \303\277 interpreted as ISO8859-1
       characters, instead of UTF-8.
       
       I spent some time tracing through the state machines in st.c, and so far
       as I can tell, this is how it works currently:
       
         - ESC P sets the "ESC_DCS" and "ESC_STR" flags, indicating that
           incoming bytes should be collected into the strescseq buffer, rather
           than being interpreted.
         - ESC \ sets the "ESC_STR_END" flag (when ESC is received), and then
           calls strhandle() (when \ is received) to interpret the collected
           bytes.
         - If the collected bytes begin with 'P' (i.e. if this was a DCS
           string) strhandle() sets the "ESC_DCS" flag again, confusing the
           state machine.
       
       If my understanding is correct, fixing the problem should be as easy as
       removing the line that sets ESC_DCS from strhandle():
       
       diff --git a/st.c b/st.c
       index ef8abd5..b5b805a 100644
       --- a/st.c
       +++ b/st.c
       @@ -1897,7 +1897,6 @@ strhandle(void)
                       xsettitle(strescseq.args[0]);
                       return;
               case 'P': /* DCS -- Device Control String */
       -                term.mode |= ESC_DCS;
               case '_': /* APC -- Application Program Command */
               case '^': /* PM -- Privacy Message */
                       return;
       
       I've tried the above patch and it fixes my problem, but I don't know if
       it introduces any others.
       "
       
       Diffstat:
         M st.c                                |       1 -
       
       1 file changed, 0 insertions(+), 1 deletion(-)
       ---
 (DIR) diff --git a/st.c b/st.c
       @@ -1897,7 +1897,6 @@ strhandle(void)
                        xsettitle(strescseq.args[0]);
                        return;
                case 'P': /* DCS -- Device Control String */
       -                term.mode |= ESC_DCS;
                case '_': /* APC -- Application Program Command */
                case '^': /* PM -- Privacy Message */
                        return;