[HN Gopher] Storing Data in Control Flow
       ___________________________________________________________________
        
       Storing Data in Control Flow
        
       Author : chmaynard
       Score  : 56 points
       Date   : 2023-07-11 18:56 UTC (4 hours ago)
        
 (HTM) web link (research.swtch.com)
 (TXT) w3m dump (research.swtch.com)
        
       | kccqzy wrote:
       | I have second thoughts on the author's assertion that "When it's
       | possible to store state in code instead, that often leads to a
       | clearer program."
       | 
       | I have found that an explicit state machine can be more readable
       | when there are many states. When there are too many states, a
       | straightforward transformation to control flow often cannot
       | eliminate all the "goto"s like the author successfully did. Even
       | if they were eliminated, there are usually too many nested "if"
       | conditions and the like to be easily readable. That's the kind of
       | spaghetti code that first inspired a ban on goto. (And that's not
       | even including cases when you are writing code to a spec that's a
       | state machine, like the TCP state machine.)
       | 
       | What I personally like, sometimes, is to make the state variable
       | an enum and give names to be states. State 0 becomes
       | kBeforeStart, state 1 becomes kInsideQuote, state 2 becomes
       | kAfterBackslash. It suddenly now becomes intuitive. If you've
       | written any kind of emulator or interpreter it's even more
       | intuitive, because the state machine is just the DFA expanded
       | from the regular expression. If I'm using a language with
       | guaranteed tail calls, I can also replace each case in that
       | switch by a separate named function.
       | 
       | It's possible to write clear and readable code in either style.
       | Doing so requires skill and expertise, of which I don't think the
       | author lacks.
        
       | binary132 wrote:
       | This makes me think of tries. You don't actually need to store
       | the value at the leaf if you simply accumulate the path traversed
       | in the traversal function. Of course sometimes it's more
       | efficient to store values.
        
       | samsquire wrote:
       | Thanks for this.
       | 
       | I think high level control over control flow manipulation is
       | really underrated and underutilised in programming languages.
       | 
       | For example, the decision where to jump is extremely powerful but
       | the primitives we have for it are weak. (Exceptions, algebraic
       | effects, if statements)
       | 
       | Inheritance controls control flow but doesn't actually solve the
       | pattern of desired behaviour and code organisation most people
       | want.
       | 
       | One idea I am exploring is the idea that efficient control flow
       | can be compiled from the inferred relationships of a rule engine.
       | 
       | Information systems should be used to create all the
       | relationships between data and associations of desired behaviour
       | to data values or partial data values.
       | 
       | Then all the if statements, switch statements and vtables can be
       | inferred.
       | 
       | In other words the types the code operates on is inferred from
       | desired behaviour based on state.
        
       | knome wrote:
       | There is a game called TIS-100 where you solve problems using a
       | grid of networked processors that can each store one current
       | register and a second value you can swap to. It's an extremely
       | constrained environment as such.
       | 
       | There is one challenge where you receive a series of numbers,
       | have to sort and then output them. The game provides a couple of
       | FIFO units you can read and write from in order to facilitate
       | this.
       | 
       | I, instead, decided to do it without using them. I ended up
       | storing all of the state in branching. By passing values down and
       | then jumping and returning literals from branches, it gave me
       | effectively a comparatively massive amount of space based on
       | where the instruction pointer was sitting in any given node,
       | letting me manipulate the state, storing it as branches taken,
       | and then dumping out literals from the lowest levels of those
       | before resetting the nodes for the next series of values.
       | 
       | It made me cognizant of how state, assumptions and literal values
       | are "stored" by which branch you choose, with no actual
       | variables.
       | 
       | This is also what happens when you run a "check" on a value
       | rather than parsing it into a more specific type related to the
       | check [1]. After a `if( !isa_username( somestring) ){ return ; }`
       | you will often assume that `somename` is a username without
       | actually encoding that into its type.
       | 
       | Because these assumptions are not checked by the compiler, then
       | the compiler can't stop you from inappropriately merging branches
       | or accidentally cutting out checks later.
       | 
       | It's powerful, common, and generally dangerous.
       | 
       | [1] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-
       | va...
        
         | riwsky wrote:
         | Not checked by the compiler, except for cases where they're
         | checked by the compiler:
         | https://www.typescriptlang.org/docs/handbook/2/narrowing.htm...
         | 
         | TS is probably the most famous gradual-type system to do this,
         | but IIRC pycharm is smart enough to figure this out for its
         | python linting, too
        
           | knome wrote:
           | Narrowing involves actual types. I was referring to the
           | compiler not checking mere value validation.
           | 
           | I was referencing the kind of validation where someone checks
           | a bunch of attributes of a string, usually, or perhaps the
           | range of an integer, but never actually does anything
           | excepting in fail cases and then passes it on to other
           | functions that then just expect those invariants to hold. I
           | doubt typescript is remembering that my string doesn't have
           | dashes in it or that it adheres to some regex, for example.
           | 
           | Type narrowing is fantastic for making things like nil
           | punning work inside a complex type system.
           | 
           | I really like typescript's types. From where they started,
           | they did an absolutely incredible job creating a robust and
           | powerful type system that could manage almost any existing
           | javascript code pattern.
           | 
           | Just really amazing work by the authors.
        
       | ramesh31 wrote:
       | Think of all the confusion and hot air that could be saved if we
       | all just switched to Lisp.
        
       | 10000truths wrote:
       | What's really unfortunate is that compilers _already_ perform
       | this  "control flow -> state machine" transformation, but almost
       | none of them expose it to the user for direct manipulation -
       | instead, they tightly couple it to a bunch of unrelated
       | abstractions like event loop runtimes, async executors, managed
       | stacks, etc. I'd kill for a compiler that can properly:
       | 
       | * Parse a coroutine that produces and consumes values
       | 
       | * Perform all the normal function-level optimizations to minimize
       | frame space (stack slot reuse via liveness analysis of local
       | variables, re-computing temporaries across yield points, etc.)
       | 
       | * Expose that coroutine frame as a fixed-size struct that I can
       | explicitly resume and query
       | 
       | Zig is _almost_ there, but suspend /resume cannot return
       | values/take arguments, which requires some unergonomic
       | workarounds. Rust is making some promising progress on unified
       | coroutines (https://github.com/rust-lang/rfcs/pull/2781), but the
       | generator types are opaque so you can't encapsulate them in a
       | Sized struct or allocate an array of them. Not to mention that
       | it's still extra-unstable, and last I checked, there were issues
       | with generator size optimizations (https://github.com/rust-
       | lang/rust/issues/59087). C++20 coroutines are similarly opaque
       | and cannot be allocated as a contiguous array.
        
         | AlphaSite wrote:
         | Are you asking for what is roughly Python generator expressions
         | but in a compiled language?
        
           | 10000truths wrote:
           | Yes, something like Python generators which have yield and
           | send, but typed, and which compiles down to an efficiently
           | sized struct with a resume function. Here's a crude
           | illustration of what I mean, using pseudo-code describing a
           | TCP session flow:                 coroutine tcp_session() {
           | var syn_packet = @consume();
           | if(!check_syn_flag(syn_packet)) {
           | @produce(ERROR_BAD_INPUT);           return;         }
           | var syn_ack_packet = build_tcp_segment(...);
           | @produce(syn_ack_packet);         var ack_packet =
           | @consume();         if(!check_ack_flag(ack_packet)) {
           | @produce(ERROR_BAD_INPUT);         }         // ... proceed
           | with tcp session       }
           | 
           | Such a coroutine would then be usable elsewhere in code like
           | so:                 var tcp_connections = map<(ip_address,
           | u16), tcp_session>::new();       // ...       var
           | incoming_ip_packet = read_ip_packet_from_raw_socket();
           | var result = tcp_connections[get_dst(incoming_ip_packet)].res
           | ume(incoming_ip_packet);       if(result == ERROR_BAD_INPUT)
           | {         // ...       }
           | 
           | And the compiler would be able to determine upfront what the
           | size of the tcp_session struct is, so there'd be no need for
           | implicit boxing/heap allocations.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-07-11 23:00 UTC)