[HN Gopher] Syntax Design (2014) ___________________________________________________________________ Syntax Design (2014) Author : memorable Score : 180 points Date : 2022-10-18 13:50 UTC (9 hours ago) (HTM) web link (cs.lmu.edu) (TXT) w3m dump (cs.lmu.edu) | samsquire wrote: | I began designing a language that handled recursion and iteration | as relations between variables which are topologically sorted to | determine control flow. | | Each function is a topological graph of stream functions so it is | similar to a data flow language or reactive programming language. | The goal is that you should express the critical insight of the | algorithm to work out what to write and the code is not nested so | there is very little tree structure. | | Algebralang is rough notes on how it would appear. | | https://github.com/samsquire/algebralang | | Example programs in the repository are binary search, btrees, a* | algorithm. | | I wrote a multithreaded parallel actor interpreter in Java and it | uses an invented assembly language which doesn't have a bytecode, | it's just text. | | https://github.com/samsquire/multiversion-concurrency-contro... | | I like the ideas behind ani language | https://github.com/waves281/anic | Jtsummers wrote: | If possible, I'd add indentation to your examples to make them | much more readable. 
As it stands, it's like reading one of my | math prof's C code (he was an old FORTRAN when it was shouting | coder and never learned to indent): insert t | node = recursive_deepest_first(items=t.children,item=t, | lastRecursion=l)( if len(t.children) == 0 t.activate() | location = reversed(t.children).find(item=i, item.value >= | node.value) output = insert t node if | len(t.children) > 3 { replace(t, | Node(value=t.children/(point=middle=m) = | m.value,children.sort(item=i,sortKey=i.value)=t) output | = l.t } else deepest t.children.append(node) ) | | Assuming this doesn't invalidate the program it reads more | clearly and only takes one more line: insert t | node = recursive_deepest_first(items=t.children, item=t, | lastRecursion=l)( if len(t.children) == 0 | t.activate() location = | reversed(t.children).find(item=i, item.value >= node.value) | output = insert t node if len(t.children) > 3 { | replace(t, Node(value=t.children/(point=middle=m) = m.value, | children.sort(item=i, sortKey=i.value)=t) output = | l.t } else deepest | t.children.append(node) ) | | That original program was hard to decipher for both lack of | indentation and the odd line breaks, and inconsistent choice of | space or no space after commas. Another question, why do you | use `if ... then...` in some examples and not in others? Is | that a user choice? | samsquire wrote: | Wow thanks for reading my page and looking at the examples. I | appreciate your time. | | There's a few bugs in that code. Sorry for presenting | something that obviously wasn't ready. I didn't use location | when I insert into that position in the btree. And the node | splitting code has an error. | | And thanks for reformatting the code. | | It's a very rough design. The critical insight over many | algorithms is hidden in one character or line. Such as a | strategic -1 or +1 or pattern of recursion that means it | becomes understandable. 
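Editor's sketch of the "strategic -1 or +1" point, using binary search (one of the example programs named earlier in the thread). This is plain Python, not Algebralang, and the name `binary_search` is an assumption made only for illustration.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            # The strategic +1: with `lo = mid` the range can stop shrinking
            # (for example when lo == hi) and the loop never ends.
            lo = mid + 1
        else:
            # The matching -1 on the other side, for the same reason.
            hi = mid - 1
    return -1
```

Nearly all of the function is traversal structure; the part that makes it correct is the two one-character adjustments.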
| | I find when writing code the structure of the code is more | important than the calculation or addition or subtraction. | Which is surprising because computers are calculators. The | structure of traversal, laying out data in memory and | structure of the jumping around instructions in memory is | harder than the core insight of a division, or subtraction or | addition or append or +1 or if statement here or there. | | When I write recursive code I often want to refer to outer | context of an outer recursion. So that's the meaning of the | "deepest" | Jtsummers wrote: | You may be interested in things like Strand and the work on | parallel Prologs which have a similar "let the computer | system sort out the proper execution order". This wouldn't | satisfy your syntax desire (Algol family) but may help | develop your understanding of the problem domain. | | A discussion last year: | https://news.ycombinator.com/item?id=26948351 (wow, 18 | months since that discussion, seemed more recent in my | memory) | adamnemecek wrote: | Seems down | https://web.archive.org/web/20221018135106/https://cs.lmu.ed... | Arch-TK wrote: | > Because C does not have real arrays | | C does have real arrays, they just get implicitly converted to | pointers to their first element in a lot of cases (for a | multitude of reasons in part having to do with simplifying the | language), A[B] is defined as such so it works with normal | pointers and arrays-converted-to-pointers in the same manner. | | Try using an array with sizeof, unary &, or in the form of a | string literal used to initialise an array. In those situations | it suddenly stops behaving like a pointer to its first element | and definitely behaves like something which is unlike anything | else in C (hint: it's an array). | djedr wrote: | Very nice little article! Learned some new terms. 
| | To anybody dabbling as I do in syntax design, who may be looking | for an extremely minimal representation for trees (even more | minimal than S-exprs!) I would like to introduce my little | project called Jevko: | https://djedr.github.io/posts/jevko-2022-02-22.html | | It is pure distilled treeness. Its grammar fits into one line, if | compressed well: Jevko = *("[" Jevko "]" / "`" | ("`" / "[" / "]") / %x0-5a / %x5c / %x5e-5f / %x61-10ffff) | | This took me years of syntax golfing to figure out. I think it's | turned out pretty nice. It's complete, formally defined, with a | few simple parsers written, except it has no users. ;D | | To relate back to the article, an interesting and AFAIK original | feature of this syntax is that newlines or other whitespace are | neither significant nor insignificant nor "possibly significant" | in Jevko. I'd call it whitespace-agnostic. Various whitespace | rules can be laid on top of it, producing for example a Lisp-like | language with native multiword identifiers with spaces, e.g.: | define [sum primes [[a][b]] accumulate [ [+] | [0] filter [ [prime?] enumerate | interval [[a][b]] ] ] ] | | here "sum primes" and "enumerate interval" are two double-word | identifiers. It's the only right_solution to the identifierWars, | I-tell-you! | nathell wrote: | I thought "what a weird name", then silently pronounced it and | my Polish ear heard "drzewko", meaning "little tree". What a | fitting name. :) | djedr wrote: | :) | | For a long time I couldn't find the right name that would | express the generic nature of it. | | An earlier prototype was called TAO, as an acronym for Tree | Annotation Operator (it had an extra feature called | operators), and as a reference to the ancient Chinese | concept, in essence nameless and by design hard to pin down | -- this seemed to fit perfectly. 
| | However there is about 2^42 cowznofski potrzebie things | called TAO (kind of ironic, as the original idea was that the | Tao would be distinct from the countless named things), so it | turned out to be a bad name after all. So I decided to find a | more unique one and here we are. | | The amount of time spent thinking about this and the lengths | I went to are better left untold. | | In other words, naming is hard. | abathur wrote: | This reminds me of Breck Yunits' Tree Notation | (https://treenotation.org/). Both seem to have a ~totalizing | energy. Maybe some common cause. :) | djedr wrote: | Indeed, it's close. Obviously mine and Breck's levels of | appreciation for indentation/brackets are very different. ;) | Although independent, the paths we have taken to arrive at | these are somewhat similar (somewhere early in there are | experiments with visual programming). As are the tools of | thought (minimalism). We were thus taken to similar places. | | Before I was aware of the existence of Tree Notation I put my | syntax online at tree-annotation.org (now defunct), so even | naming converged. I was initially very confused myself. :D | | Ultimately I think that the existence of multiple | incarnations of this idea suggests that there is (perhaps a | very niche) need for a minimal syntax like this. Something | like S-exps, but general-purpose. Trying to satisfy that need | is the common cause. | | The way I imagine it is that it would be supported across | programming languages, like JSON. It could be a universal | format for (tree) structured data. There is this piece of the | Unix philosophy which says that text streams are the | universal interface. That's true on a certain level. On | another level not far below binary streams are the universal | interface. On another level not far above... there was | nothing universal until XML. But that was overkill, so JSON | displaced it. But that's still overkill, so... 
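Editor's sketch: the one-line Jevko grammar quoted earlier in this subthread is small enough that a recursive-descent parser fits in a few dozen lines. The version below is a rough illustration in Python, not one of the official Jevko parsers. The function name `parse_jevko` and the output shape (nested lists, where strings are text runs and lists are bracketed subtrees) are assumptions made for this example and are not Jevko's own data model.

```python
def parse_jevko(src: str) -> list:
    """Parse src per the quoted grammar into nested lists.

    str items are runs of text; list items are bracketed subtrees.
    (This output shape is an assumption of the sketch.)
    """
    escapable = "`[]"  # the only characters allowed after a backtick

    def parse(i: int, depth: int):
        items = []
        text = []

        def flush():
            # Close the current run of plain text, if there is one.
            if text:
                items.append("".join(text))
                text.clear()

        while i < len(src):
            ch = src[i]
            if ch == "`":
                # Escape: a backtick must be followed by `, [ or ].
                if i + 1 >= len(src) or src[i + 1] not in escapable:
                    raise ValueError(f"bad escape at offset {i}")
                text.append(src[i + 1])
                i += 2
            elif ch == "[":
                flush()
                child, i = parse(i + 1, depth + 1)
                items.append(child)
            elif ch == "]":
                if depth == 0:
                    raise ValueError(f"unexpected ']' at offset {i}")
                flush()
                return items, i + 1
            else:
                # Any other character is ordinary text, whitespace included.
                text.append(ch)
                i += 1
        if depth > 0:
            raise ValueError("unclosed '['")
        flush()
        return items, i

    tree, _ = parse(0, 0)
    return tree
```

For instance, `parse_jevko("sum primes [[a][b]]")` gives `["sum primes ", [["a"], ["b"]]]`. Whitespace is kept verbatim inside text runs, which leaves any whitespace rules to a layer on top, in line with the "whitespace-agnostic" description.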
| abathur wrote: | I agree that it feels like multiple projects are converging | on something that is ripe (or close). | | I have done some deep-digging for markup languages and came | across more than one project in this space. (I've added | Jevko to my list; | https://twitter.com/abathur/status/1582492437984837632) | | You may have already seen it as well, but you might also | find https://github.com/teamtreesurf/link interesting. | znkr wrote: | This is very beautiful, nice work. I wonder if I should use it | for something... | djedr wrote: | Thank you. :) | | > I wonder if I should use it for something... | | I'd be honored! | | A couple of ideas: | | How about a simple configuration format? https://gist.github. | com/djedr/681e0199859874b3324eaa84192c42... (I should make a | library out of this) | | Or you can put it in your query strings to make them more | humane: https://github.com/jevko/queryjevko.js | | Or make up a markup DSL: https://github.com/jevko/markup- | experiments#asttohtmltable | | Or serialize game objects in your indie game. Or make it the | interface of your experimental app. Or use it to shave off a | few unnecessary characters off your data: | https://jevko.github.io/compactness.html | | No parser in your favorite language? A basic one should be | only a couple dozen lines! | https://github.com/jevko/parsejevko.js / | https://github.com/jevko/specifications/blob/master/spec- | sta... | thrtythreeforty wrote: | I find the section on "syntactic salt" interesting: | | > The opposite of syntactic sugar, a feature designed to make it | harder to write bad code. Specifically, syntactic salt is a hoop | the programmer must jump through just to prove that he knows | what's going on, rather than to express a program action. | | This is perhaps an uncharitable way to describe it, but the | concept does ring a bell. Rust's unsafe {}, C++'s | reinterpret_cast<>(), etc - all slightly verbose. 
More important | than jumping through hoops, the verbosity helps when _reading_ | code to know that something out of the ordinary is going on. | DonHopkins wrote: | And then there's Perl's "syntactic syrup of ipecac". | | https://en.wikipedia.org/wiki/Syrup_of_ipecac | nyanpasu64 wrote: | What I can't stand about Rust is that the language developers | think they know better than language users developing software. | They stack on syntactic salt to make it more unpleasant to | write the equivalent of correct C++ programs with aliased | mutation or manual freeing, in situations where the idiomatic | Rust ways are _also_ unpleasant to write (Cell, RefCell, raw | pointers), have runtime overhead (RefCell), are busywork to | implement in working programs (restructuring your entire | program around an ownership tree with only ephemeral stack- | allocated cross-linking &mut, which is only sometimes possible | without reducing performance or increasing memory use), or are | easy, tempting, and undefined behavior in Rust but not C++ | (casting *mut to &mut in an unsafe block in a safe function). | athrowaway3z wrote: | "Man tries to hammer in screw. Angry at hammer manufacturers | for not being screwdriver manufacturers" | | ---- | | I'll just edit this comment to clarify: Rust does a lot of | things really well. Non-RC manual memory management for a | tree/graph structure is absolutely not it. | | As for the verbose 'unsafe' pointer/memory manipulations, I | really don't see the issue. I've written my share and I think | it's fine to add a roadblock if you want to shoulder the | ability to add segfaults and other issues into a codebase. | Additionally, it usually helps that you decide to encapsulate | it into the least number of unsafe functions, instead of | 'doing it manually' all over a codebase. 
| nyanpasu64 wrote: | Rust is intended to be a systems language capable of | replacing C++ in its niche, and interfacing with existing | C++ at a fairly coarse-grained level like Firefox's | oxidation (though cxx is trying to enable rich interop | passing richer types than C-ABI ones), so it's trying to be | a better screwdriver more so than a hammer. So difficulty | expressing C++ concepts is arguably a flaw, and difficulty | implementing all software with the same CPU/memory overhead | as C++ (which I'd argue is the case, though some would | disagree) is definitely a flaw. | | It's like creating a new screwdriver bit or handle, trying | to convert the world to it, then attracting a legion of | followers arguing that manufacturing flat-head screwdrivers | should expose you to legal liability for anyone who slips | it out of the socket and injures themselves (ignoring that | flat-head screws existed and will continue to exist). | Chris_Newton wrote: | "Left-handed person tries to put screw in with screwdriver | shaped for right-handed holding. Right-handed people are | surprised when left-handed person decides screwdriver isn't | for them and uses something easier instead." | | We saw this for years with C++ and the new-style casts. The | principle of making casting behaviours more specific and | clarifying the risk of using each of them was fine. In | practice, if someone asks programmers to start writing | verbose, syntactically awkward stuff like | reinterpret_cast<X*>(p) instead of (X*)p and either will | work, obviously in the real world many will choose the | latter. Empirically, the in-your-face syntax turned out to | be a deterrent to adopting the better tools the language | offered and so devalued those tools for everyone. | Rusky wrote: | This is not an accurate characterization of the Rust language | developers. Neither of these features were designed as | "syntactic salt!" 
They are compromises, made on a time budget | to achieve goals which were higher priority for the project, | but the door is still open to improve them. This is a far cry | from "knowing better than language users," which implies that | they could have simply left that syntax out while still | achieving their goals. (Or worse, that their goal was | specifically to annoy people...?) | | For instance, they are not satisfied with the current raw | pointer syntax either, as it interacts poorly with | lvalue/place syntax in ways that make unsafe code harder to | audit. There are regular proposals for how to improve things | like `(*ptr).field` or the use of raw pointers as method | receivers. | | The situation with interior mutability is similar: compile- | time memory safety inherently requires some limitations on | programming style, but I regularly see proposals for how to | improve "field projection" syntax. | | > undefined behavior in Rust but not C++ (casting *mut to | &mut in an unsafe block in a safe function). | | The question of "syntactic salt" aside, this is simply false. | nyanpasu64 wrote: | Casting *mut to &mut in a safe function is unsound, and UB | if the result is used alternatingly with an earlier &mut to | the same object (I've seen this in a library I tried | using). In C++ casting a * to a & is sound, and casting a * | into a __restrict & might be unsound but restrict is so | rarely used that it doesn't matter, whereas safe Rust | nearly requires using &mut for mutating through a pointer. 
| | As for "compile-time memory safety inherently requires some | limitations on programming style", I find compile-time | lifetime safety to be a tradeoff, and often a net negative | in not only performance but ease of programming for low- | level code maintained by the same individual preserving a | "theory" of the code over time (whereas I don't find | compile-time bounds checking or thread safety to be a net | negative to programmer experience nearly as often). And | when I see people on crusades to stop programmers from | writing code in unsafe languages (taking away programmers' | ability to opt out of this tradeoff), I will stop at nothing | to oppose these people. | [deleted] | zppln wrote: | > syntactic salt | | I feel like this describes Rust's lifetime annotations pretty | well too. | epage wrote: | Not just lifetimes but types used to do more complex | lifetimes that are normalized by other languages like | RefCell, Arc, etc. | Karliss wrote: | I disagree about lifetime annotations being syntax salt. At | least with my interpretation of what syntax salt is. | | From syntax perspective lifetime annotations are almost as | short as possible assuming you want to explicitly convey this | information at all ('a is just two symbols and one of them is | identifier). The alternative of not specifying it at all | comes with major tradeoffs of either in memory safety (like | in C) or runtime performance (like most programming languages | with dynamic memory management). In theory there is third | option of compiler fully deducing lifetimes, but that's far | from trivial, has its own costs and realistically would even | further narrow down what programs the compiler considers | valid and increase compilation time. | | There are strong similarities with typing strategies. Just | because there are programming languages with dynamic typing | doesn't mean that explicit static typing is salt. 
Dynamic | typing has performance cost, and static inferred typing has | worse self documentation properties and slightly bigger | compilation time cost since you can't process each function | independently. | | On the other hand reinterpret_cast<Foo*>(expr) doesn't | provide to the compiler any extra information that couldn't be | conveyed with simpler less verbose syntax like R(Foo*)expr or | (expr as Foo*). Same with unsafe{} blocks, the compiler already | knows which operations are unsafe. | tmtvl wrote: | I've heard it called "syntactic vinegar" instead. | | I wonder if there's a term for syntactic useless stuff, like | commas in Clojure quasiquoted lists. | cxr wrote: | > More important [...] the verbosity helps when reading code to | know that something out of the ordinary is going on | | This applies to JS as well with its strict equality check | (triple equals). Bad practices within the NodeJS ecosystem, | however, have led to circumstances where triple equals has been | cargo culted as the "right" thing to do for any equality | comparison. The consequences of this include code that is more | verbose, is no more type safe (and often _doesn't_ do the | right thing for some inputs--whereas with double equals, in | contrast, it would...), and that the appearance of triple | equals is no longer a strong signal that there's something | happening that's worth paying attention to. | adamddev1 wrote: | Can you give some examples of inputs where === _doesn't_ do | the right thing? | mvf4z7 wrote: | This reminds me of React's "dangerouslySetInnerHTML" prop. | | https://reactjs.org/docs/dom-elements.html#dangerouslysetinn... | [deleted] | mncharity wrote: | Another exploration of syntax: http://rigaux.org/language- | study/syntax-across-languages.htm... | xaedes wrote: | Wow, amazing! This really is a comprehensive overview. | Basically a syntax Tafelwerk. | hzhou321 wrote: | The `infix` syntax is missing from the major items. 
Without infix | syntax, all languages are just variations of LISP -- I guess that | was all the article is about. | JohnDeHope wrote: | I enjoy this sort of "one example, multiple different lenses" | style of discussion. It reminds me of this book... Exercises in | Programming Style by Cristina Videira Lopes. | tabtab wrote: | I actually like the VB-style, but VB did it mostly wrong: if you | start the block with X, you should always end it with "End X". | Thus, you'd have While ... End While instead of crap like "Wend" | and "Next" (in For...Next). | | It's more legible to know what block is being ended. C-style | continually frustrates me in that area. The End-X style just never | found a nice way to wrap text for longer statements. | | C-style also has a problem in that there is no way to define | arbitrary blocks: it relies too much on key-words. I'm trying to | remedy this with "Moth" syntax: | | https://www.reddit.com/r/ProgrammingLanguages/comments/ky22d... | | It's LINQ-esque but without the bloated Lambda conventions, and | influenced by XML in that you have a simple syntax pattern that | can "implement" many domains' needs. It started with an attempt | to merge the best of Lisp and C-style. (Whether it succeeded or | not is hotly contested. I welcome other attempts.) | f1shy wrote: | Note that in the for case, you can have many "next" and having | many "end" would be silly. | Jtsummers wrote: | > I actually like the VB-style, but VB did it mostly wrong: if | you start the block with X, you should always end it with "End | X". Thus, you'd have While ... End While instead of crap like | "Wend" and "Next" (For...). | | That's covered in their syntactic salt section, but Ada does, | mostly, what you describe. procedure Proc(...) | is -- vars, types, and subprograms defined here | begin ... end Proc; for I in | Some_Array'Range loop ... end loop; if | condition then ... 
end if; | mojifwisi wrote: | > C-style also has a problem in that there is no way to define | arbitrary blocks | | I might have misunderstood what you mean by "arbitrary blocks", | but you can definitely do this in C: int main() | { { /* arbitrary block */ return 0; | } } | Jtsummers wrote: | They seem to mean a computational block (or a closure/lambda) | that can be passed on to other functions. Try to do this in C | (it's invalid as presented, but this is the concept): | int main() { int* collection = ... int* | filtered = filter(collection, int func(int item) { return | item > 10; }); ... } | | You have to actually define a function at the top level in | order to pass that in and there is no notion of closures so | you can't do the more useful thing that you might have in | even C++ these days: int main() { | int* collection = ... int limit = ... auto | pred = [&limit](auto item) { return item > limit; }; | int* filtered = filter(collection, pred); // assuming | `filter` is defined ... } | | You can come close, but you create a lot of extra bookkeeping | in your C program to pull it off, and the functions are still | only defined at the top-level. | wodow wrote: | I really like the term "Sugary Functional Style" -- sweetening | pure functional programming with faux procedurality. | | Looks like it's a (three word) Google Whack at the time of | writing: | https://www.google.com/search?hl=en&q=%22Sugary%20Functional... | bhauer wrote: | It's awesome to see an article from Dr. Ray Toal on HN this | morning! In my biased opinion, the excellent tenured professors | at LMU's CS program make it a stand out for its size. | kragen wrote: | Another interesting weird syntax I ran across a few years ago is | OGDL: https://ogdl.org/ | | It's sort of an alternative to S-expressions with much less | punctuation, but the data model is slightly different -- in | S-expressions you label the leaves, and in OGDL you label the | nodes. 
In other contexts these node-labeled trees are sometimes | called "rose trees"; they are the basic data model of, for | example, Prolog. Labeling nodes is almost equivalent to labeling | arcs, but OGDL does support multiple references, so not quite. | | The OGDL proposal was intended for data, like XML, not programs. | They started out by trying to simplify YAML, which has arrays and | dicts, and they simplified it by unifying them into a single | structure. | | Here's one of their examples: network | eth0 ip 192.168.0.10 mask 255.255.255.0 | gw 192.168.0.1 hostname crispin | | This is not quite just an edge-labeled digraph because, as in | S-expressions, the order of arcs within a node is significant; | you can have multiple edges with the same label in the same node, | and you can select edges by ordinal rather than, or in addition | to, label. | | This is of course amenable to use as a programming syntax. | Existenceblinks wrote: | Fun read. I would love to read what if it's not text based, is it | going to be different? Visual programming seems to suffer from | composability and it's also bound to be using human language as | well, box with border is hard to comprehend, can get messy | easily. | | I mean, visual but text-like theme. It seems to be in sweet spot. | Only fix some downside/limit of text. | csmeyer wrote: | Shameless plug for my hybrid visual/text pl, Pickcode, which | matches what you're talking about | | Demo programs: https://app.pickcode.io/playground | Vermeulen wrote: | Wow this is awesome, love this style. I did something really | similar with our game's scripting language called MBScript: | https://docs.modboxgame.com/docs/mbscript Same kind of line | setup, visual add button, etc. | | What's Pickcode made for? Web programming? | csmeyer wrote: | Pickcode is meant for K-12 education as an alternative to | block programming. The end goal is to have a WYSIWYG editor | for web apps with behavior defined using the visual | programming language. 
| | MBScript looks great and I'd love to talk about your | learnings from it. My contact is on the pickcode.io landing | page if you want to chat! | Existenceblinks wrote: | Hey, yes! Nice, I'm thinking more serious and ambitious. You | could go mass by adding "module" and ways to compose. The | keyboard navigation is ok-ish (honest opinion) because this | is the hardest part: it is designed for everyday tasks in the | long run and, at minimum, should be as fast as text-based | programming. At least 3-4 devs should be comfortable working on | this codebase (non-realtime, just a normal version-branching | flow) | | I really really want this decades old idea to take off. We | should have grammar files in .. json is fine (have to start | somewhere), and have a spec for editor implementors to spread it | across platforms. Ideally language creators only have to | customize "view" to decide how their lang would look. | Probably configure keymaps if they think their lang can be | developed fast with certain keystrokes (akin to emacs but more | friendly because the medium is not text anymore) | masklinn wrote: | You could check out Self. It's image-based so the objects are | "live", and can be interacted with directly via the UI. | Existenceblinks wrote: | Is it https://www.youtube.com/watch?v=CCx6Nj_Hr1g ? I've seen | quite a few live programming languages. Though not a single | one seems to want to go mass. Like being able to have at least a | 3-4 dev team work on it. ___________________________________________________________________ (page generated 2022-10-18 23:00 UTC)