[HN Gopher] Storing Data in Control Flow ___________________________________________________________________ Storing Data in Control Flow Author : chmaynard Score : 56 points Date : 2023-07-11 18:56 UTC (4 hours ago) (HTM) web link (research.swtch.com) (TXT) w3m dump (research.swtch.com) | kccqzy wrote: | I have second thoughts on the author's assertion that "When it's | possible to store state in code instead, that often leads to a | clearer program." | | I have found that an explicit state machine can be more readable | when there are many states. When there are too many states, a | straightforward transformation to control flow often cannot | eliminate all the "goto"s like the author successfully did. Even | if they were eliminated, there are usually too many nested "if" | conditions and the like to be easily readable. That's the kind of | spaghetti code that first inspired a ban on goto. (And that's not | even including cases when you are writing code to a spec that's a | state machine, like the TCP state machine.) | | What I personally like, sometimes, is to make the state variable | an enum and give names to be states. State 0 becomes | kBeforeStart, state 1 becomes kInsideQuote, state 2 becomes | kAfterBackslash. It suddenly now becomes intuitive. If you've | written any kind of emulator or interpreter it's even more | intuitive, because the state machine is just the DFA expanded | from the regular expression. If I'm using a language with | guaranteed tail calls, I can also replace each case in that | switch by a separate named function. | | It's possible to write clear and readable code in either style. | Doing so requires skill and expertise, of which I don't think the | author lacks. | binary132 wrote: | This makes me think of tries. You don't actually need to store | the value at the leaf if you simply accumulate the path traversed | in the traversal function. Of course sometimes it's more | efficient to store values. | samsquire wrote: | Thanks for this. | | I think high level control over control flow manipulation is | really underrated and underutilised in programming languages. | | For example, the decision where to jump is extremely powerful but | the primitives we have for it are weak. (Exceptions, algebraic | effects, if statements) | | Inheritance controls control flow but doesn't actually solve the | pattern of desired behaviour and code organisation most people | want. | | One idea I am exploring is the idea that efficient control flow | can be compiled from the inferred relationships of a rule engine. | | Information systems should be used to create all the | relationships between data and associations of desired behaviour | to data values or partial data values. | | Then all the if statements, switch statements and vtables can be | inferred. | | In other words the types the code operates on is inferred from | desired behaviour based on state. | knome wrote: | There is a game called TIS-100 where you solve problems using a | grid of networked processors that can each store one current | register and a second value you can swap to. It's an extremely | constrained environment as such. | | There is one challenge where you receive a series of numbers, | have to sort and then output them. The game provides a couple of | FIFO units you can read and write from in order to facilitate | this. | | I, instead, decided to do it without using them. I ended up | storing all of the state in branching. By passing values down and | then jumping and returning literals from branches, it gave me | effectively a comparatively massive amount of space based on | where the instruction pointer was sitting in any given node, | letting me manipulate the state, storing it as branches taken, | and then dumping out literals from the lowest levels of those | before resetting the nodes for the next series of values. | | It made me cognizant of how state, assumptions and literal values | are "stored" by which branch you choose, with no actual | variables. | | This is also what happens when you run a "check" on a value | rather than parsing it into a more specific type related to the | check [1]. After a `if( !isa_username( somestring) ){ return ; }` | you will often assume that `somename` is a username without | actually encoding that into its type. | | Because these assumptions are not checked by the compiler, then | the compiler can't stop you from inappropriately merging branches | or accidentally cutting out checks later. | | It's powerful, common, and generally dangerous. | | [1] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t- | va... | riwsky wrote: | Not checked by the compiler, except for cases where they're | checked by the compiler: | https://www.typescriptlang.org/docs/handbook/2/narrowing.htm... | | TS is probably the most famous gradual-type system to do this, | but IIRC pycharm is smart enough to figure this out for its | python linting, too | knome wrote: | Narrowing involves actual types. I was referring to the | compiler not checking mere value validation. | | I was referencing the kind of validation where someone checks | a bunch of attributes of a string, usually, or perhaps the | range of an integer, but never actually does anything | excepting in fail cases and then passes it on to other | functions that then just expect those invariants to hold. I | doubt typescript is remembering that my string doesn't have | dashes in it or that it adheres to some regex, for example. | | Type narrowing is fantastic for making things like nil | punning work inside a complex type system. | | I really like typescript's types. From where they started, | they did an absolutely incredible job creating a robust and | powerful type system that could manage almost any existing | javascript code pattern. | | Just really amazing work by the authors. | ramesh31 wrote: | Think of all the confusion and hot air that could be saved if we | all just switched to Lisp. | 10000truths wrote: | What's really unfortunate is that compilers _already_ perform | this "control flow -> state machine" transformation, but almost | none of them expose it to the user for direct manipulation - | instead, they tightly couple it to a bunch of unrelated | abstractions like event loop runtimes, async executors, managed | stacks, etc. I'd kill for a compiler that can properly: | | * Parse a coroutine that produces and consumes values | | * Perform all the normal function-level optimizations to minimize | frame space (stack slot reuse via liveness analysis of local | variables, re-computing temporaries across yield points, etc.) | | * Expose that coroutine frame as a fixed-size struct that I can | explicitly resume and query | | Zig is _almost_ there, but suspend /resume cannot return | values/take arguments, which requires some unergonomic | workarounds. Rust is making some promising progress on unified | coroutines (https://github.com/rust-lang/rfcs/pull/2781), but the | generator types are opaque so you can't encapsulate them in a | Sized struct or allocate an array of them. Not to mention that | it's still extra-unstable, and last I checked, there were issues | with generator size optimizations (https://github.com/rust- | lang/rust/issues/59087). C++20 coroutines are similarly opaque | and cannot be allocated as a contiguous array. | AlphaSite wrote: | Are you asking for what is roughly Python generator expressions | but in a compiled language? | 10000truths wrote: | Yes, something like Python generators which have yield and | send, but typed, and which compiles down to an efficiently | sized struct with a resume function. Here's a crude | illustration of what I mean, using pseudo-code describing a | TCP session flow: coroutine tcp_session() { | var syn_packet = @consume(); | if(!check_syn_flag(syn_packet)) { | @produce(ERROR_BAD_INPUT); return; } | var syn_ack_packet = build_tcp_segment(...); | @produce(syn_ack_packet); var ack_packet = | @consume(); if(!check_ack_flag(ack_packet)) { | @produce(ERROR_BAD_INPUT); } // ... proceed | with tcp session } | | Such a coroutine would then be usable elsewhere in code like | so: var tcp_connections = map<(ip_address, | u16), tcp_session>::new(); // ... var | incoming_ip_packet = read_ip_packet_from_raw_socket(); | var result = tcp_connections[get_dst(incoming_ip_packet)].res | ume(incoming_ip_packet); if(result == ERROR_BAD_INPUT) | { // ... } | | And the compiler would be able to determine upfront what the | size of the tcp_session struct is, so there'd be no need for | implicit boxing/heap allocations. | [deleted] ___________________________________________________________________ (page generated 2023-07-11 23:00 UTC)