[HN Gopher] Tree-sitter: an incremental parsing system for progr...
       ___________________________________________________________________
        
       Tree-sitter: an incremental parsing system for programming tools
        
       Author : sbt567
       Score  : 331 points
       Date   : 2021-02-22 15:03 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | alissasobo wrote:
       | You can watch a good Strangeloop presentation on Tree Sitter.
       | https://www.youtube.com/watch?v=Jes3bD6P0To
        
       | wiradikusuma wrote:
       | While we're in this discussion: Say I want to implement "SQL" for
       | my app (if you've used Jira, I want to make my own JQL). Is this
       | the tool for that? I'm looking for something much simpler than
       | ANTLR.
        
       | Grimm1 wrote:
       | I recently used this to put together a unified PL classification
       | model. It's nice because any language treesitter grows to support
       | we'll support pretty effortlessly and treesitter captures more
       | than enough nuance per language to derive high quality
       | classifications.
       | 
       | It's fair to say we can classify a snippet of code based on
       | either single or multiple AST paths produced by treesitter. Right
       | now only doing the programming language but extending it to
       | function classification or description etc isn't out of the
       | question we just don't need it right now.
        
       | drewdennison wrote:
       | We've been using tree-sitter for Semgrep and it's nothing short
       | of incredible. Amazing work by Max and team.
        
       | ahelwer wrote:
       | I half-wrote a tree-sitter grammar for a niche DSL (the PRISM
       | probabilistic model checking language). It was a very nice
       | experience. It's part of another half-written side project to
       | create a language server for PRISM; I still haven't gotten around
       | to making the whole end-to-end pipeline work.
       | 
       | With its syntax tree query frontend I wonder whether tree-sitter
       | would make a good interpreter frontend for some niche languages,
       | or you need something more powerful.
        
       | amelius wrote:
       | Next steps: incrementally resolve symbols and type-check?
        
         | dcreager wrote:
         | We're currently working on a more precise version of the Code
         | Nav that's shipped on github.com, which is very similar in
         | spirit to this!
        
       | ritter2a wrote:
       | I tried to use this to ease the front end work load of students
       | in a compiler project (building a C compiler) for a University
       | course, so that the project could be focused on the more
       | interesting middle and back end parts of the compiler. However,
       | reported bugs in the C grammar that saw no activity at all [1]
       | made this impossible. From this small sample of experiences, I
       | was left with the impression that Tree Sitter is great for things
       | like syntax highlighting, where wrong results are annoying but
       | not dramatic, but not so suitable for tools that need a really
       | correct syntax tree.
       | 
       | --- [1] https://github.com/tree-sitter/tree-sitter-c/issues/51
        
       | dang wrote:
       | If curious, past threads:
       | 
       |  _Tree-sitter: new incremental parsing system for programming
       | tools (2018) [video]_ -
       | https://news.ycombinator.com/item?id=21675113 - Dec 2019 (28
       | comments)
       | 
       |  _Tree-sitter - a new parsing system for programming tools
       | [video]_ - https://news.ycombinator.com/item?id=18213022 - Oct
       | 2018 (25 comments)
       | 
       | Others?
        
         | maxbrunsfeld wrote:
         | One more that I know of:
         | 
         |  _Atom understands your code better than ever before_ -
         | https://news.ycombinator.com/item?id=18349013 - Oct 2018
        
       | Annili wrote:
       | I'm curious to see if Tree-sitter can be used to provide fast and
       | rich code navigation. I was able to implement simple goto
       | definition/references [1], not sure if it can be used for more
       | advanced navigation features in a language-agnostic way.
       | 
       | If you're interested, GitHub is already using it [2] for that
       | purpose and Sourcegraph is experimenting with it [3]
       | 
       | [1] https://github.com/alidn/lsif-os [2]
       | https://github.com/github/semantic [3]
       | https://github.com/sourcegraph/sourcegraph/issues/17378
        
         | maxbrunsfeld wrote:
         | At GitHub, we're in the process of building a more precise code
         | navigation system on top of Tree-sitter, that models language-
         | specific name-resolution rules in detail.
         | 
         | Our currently-available code navigation system also uses Tree-
         | sitter, but it is pretty simple; it just matches up references
         | and definitions by their name.
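A minimal Python sketch of the name-matching approach described in the comment above. The `Symbol` record and the sample data are hypothetical; in a real system the definitions and references would be extracted from Tree-sitter parse trees, which is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical record for a name found in a parse tree."""
    name: str
    path: str
    line: int


def build_index(definitions):
    """Group definitions by name, ignoring scoping entirely."""
    index = defaultdict(list)
    for definition in definitions:
        index[definition.name].append(definition)
    return index


def go_to_definition(index, reference):
    """Return every definition that shares the reference's name."""
    return index.get(reference.name, [])


definitions = [
    Symbol("parse", "parser.py", 10),
    Symbol("parse", "json_util.py", 42),
    Symbol("render", "view.py", 7),
]
index = build_index(definitions)
candidates = go_to_definition(index, Symbol("parse", "main.py", 3))
print([(d.path, d.line) for d in candidates])
```

Both `parse` definitions come back as candidates. That ambiguity is what language-specific name-resolution rules would be needed to remove.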
        
       | himujjal wrote:
       | Wrote tree-sitter-svelte. Was a good experience. I am also
       | writing a programming language of my own similar to TypeScript
       | and I am using tree-sitter for the same. It's a delight to work
       | with it. Removes a lot of the worries.
        
       | ducktective wrote:
       | Is this the same thing neovim uses for syntax highlighting?
       | 
       | Is there a chance for it getting integrated to vim? Last I
       | checked vim used a regex method which was slow and faulty.
        
         | ckolkey wrote:
         | Yup, neovim 0.5+ will be using treesitter for any supported
         | languages, with the current Regex highlighting as a fallback.
        
           | [deleted]
        
           | mkingston wrote:
           | Follow the nvim 0.5 release here:
           | https://github.com/neovim/neovim/milestone/19
        
       | guerrilla wrote:
       | Is the use case for this mainly IDEs or is it intended to replace
       | traditional lexer and parser generators too?
        
         | dcreager wrote:
         | We are also using this to power a lot of the program analysis
         | features on github.com. We use it to generate the symbol list
         | for Code Navigation, as an example, and are starting to look at
         | extracting more semantic information about some languages using
         | tree-sitter parse trees as intermediaries.
        
         | patrec wrote:
         | I have used tree-sitter, but only for a very simple use case.
         | The main shortcoming I am aware of is error messages, see
         | here:
         | 
         | https://github.com/tree-sitter/tree-sitter/issues/255
         | 
         | Tree sitter will basically always generate a parse tree, even
         | for malformed input, in which case it will add ERROR nodes for
         | the bits it doesn't like (it will also inform you that there
         | were problems with the parse by setting a boolean attribute).
         | So you have some information you can use to construct a useful
         | error message yourself, but some parser generators will handle
         | this better (although it has to be said that the difficulty of
         | obtaining good error messages from a parser generator is still
         | one of the main reasons production parsers are mostly
         | written by hand).
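A rough Python sketch of building error messages from ERROR nodes, as the comment above suggests. The `Node` class is a hypothetical stand-in for a parse-tree node, not the API of any real Tree-sitter binding, and the tree is constructed by hand instead of coming from a parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not a real binding API)."""
    type: str
    start: tuple  # (row, column), zero-based
    end: tuple
    children: list = field(default_factory=list)


def find_errors(node):
    """Yield every ERROR node in the tree, depth-first."""
    if node.type == "ERROR":
        yield node
    for child in node.children:
        yield from find_errors(child)


def describe(error, source_lines):
    """Build a one-line message that quotes the offending source line."""
    row, col = error.start
    return f"{row + 1}:{col + 1}: syntax error near {source_lines[row].strip()!r}"


source_lines = ["x = 1", "y = = 2"]
tree = Node("program", (0, 0), (1, 7), [
    Node("assignment", (0, 0), (0, 5)),
    Node("ERROR", (1, 4), (1, 5)),
])
messages = [describe(e, source_lines) for e in find_errors(tree)]
print(messages)
```

This only reports where the parser gave up. Saying what token was expected would need more information than the ERROR node's position.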
        
           | guerrilla wrote:
           | Ah I see, so the reparation isn't avoidable for now? That
           | doesn't seem very appropriate for compilers then.
        
             | patrec wrote:
             | Why would it not be appropriate? The only annoyance I see
             | is that currently you will have to generate a good error
             | message from it yourself, but a first pass at the problem
             | shouldn't be too onerous.
        
               | guerrilla wrote:
               | Ok, I misunderstood. I thought it repaired without error
               | sometimes but I see that you were clear that that isn't
               | the case.
        
       | srcreigh wrote:
       | Tree Sitter is amazing. The parsing is fast enough to run on
       | every keystroke. The parse tree is extremely concise and
       | readable. It resembles an AST more than a parse tree (ie no 11
       | levels of binary op precedence rules in the tree). The parse tree
       | emits specific ERROR nodes, so you can get a semi-functional tree
       | even with broken syntax.
       | 
       | I can't wait for the tools to get built with this. Paredit for
       | TypeScript. Syntax-tree based highlighting (vs regex
       | highlighting). A command to "add an arg to current function"
       | which works across languages. A command to add a CSS class to the
       | nearest JSX node, or to walk up the tree at the className="| ..."
       | position, adding a new className if it doesn't exist.
       | 
       | There's a nicely documented Emacs package for this [1]. The
       | documentation is at [2]. The parse trees work great. There's
       | syntax highlighting support and tree-walking APIs. There's a bit
       | of confusion about TSX vs typescript langs but it's fixable with
       | some config change [3].
       | 
       | [1]: https://github.com/ubolonton/emacs-tree-sitter [2]:
       | https://ubolonton.github.io/emacs-tree-sitter/ [3]:
       | https://github.com/ubolonton/emacs-tree-sitter/issues/66#iss...
        
         | Annili wrote:
         | > "Paredit for TypeScript"
         | 
         | Is there a list of ideas for Structural Editing in C-like
         | languages?
         | 
         | I can think of `extend-selection`, `move to parent block`, `add
         | arg to function`
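One way `extend-selection` could work over a syntax tree, sketched in Python. The `Node` class and the hand-built tree for `foo(a, b + c)` are hypothetical; a real editor would take the node ranges from its parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical syntax node covering source[start:end]."""
    type: str
    start: int
    end: int
    children: list = field(default_factory=list)


def extend_selection(root, start, end):
    """Return the smallest node that strictly contains the selection."""
    best = None
    node = root
    while node is not None:
        if not (node.start <= start and end <= node.end):
            break
        if (node.start, node.end) != (start, end):
            best = node  # deeper nodes are smaller, so keep overwriting
        # Descend into the child that still contains the selection, if any.
        node = next(
            (c for c in node.children if c.start <= start and end <= c.end),
            None,
        )
    return best


# Hand-built tree for the source text "foo(a, b + c)".
tree = Node("call", 0, 13, [
    Node("identifier", 0, 3),
    Node("arguments", 3, 13, [
        Node("identifier", 4, 5),
        Node("binary_expression", 7, 12, [
            Node("identifier", 7, 8),
            Node("identifier", 11, 12),
        ]),
    ]),
])
print(extend_selection(tree, 7, 8).type)   # selection is "b"
print(extend_selection(tree, 7, 12).type)  # selection is "b + c"
```

Repeating the command walks outward one syntactic level at a time, which is the same idea as "move to parent block".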
        
         | dcreager wrote:
         | Worth calling out that the syntax highlighting support is used
         | to highlight several languages in github.com. (Linguist is
         | still used for the long tail of languages, but we plan to
         | migrate more and more over to tree-sitter-based highlighting
         | over time.)
         | 
         | The query language is also what's used to drive the
         | fuzzy/ctags-like Code Navigation feature. Both of those are
         | powered by tree-sitter query files defined in each language's
         | repo, like these for Go: https://github.com/tree-sitter/tree-
         | sitter-go/tree/master/qu...
        
           | eins1234 wrote:
           | Awesome to hear that amazing tech like tree-sitter lives on
           | even though Atom, the product it was built for, is pretty
           | much on life support at this point.
           | 
           | Curious if there's any efforts to bring tree-sitter to
           | VSCode? Exposing tree-sitter to extensions could open up so
           | many possibilities like OP mentioned.
        
         | josteink wrote:
         | Tooting my own horn, Emacs' csharp-mode[1] is undergoing a
         | rewrite to be 100% based on tree-sitter rather than regexps.
         | 
         | The new code runs way faster and is so much nicer to work with.
         | 
         | Once all the kinks are gone, I can't imagine going back.
         | 
         | [1] https://github.com/emacs-csharp/csharp-
         | mode/blob/master/csha...
        
         | robto wrote:
         | I'm so excited for this to become built-in in more places! I
         | think once non-lisp users can experience the Power of
         | Structural Editing they'll say, "Hey, I understand now why you
         | all feel so passionate about your parentheses!"
         | 
         | And I can stop feeling like my fingers have all lost a knuckle
         | when I'm writing Typescript :)
        
         | rgossiaux wrote:
         | Neovim nightly already has some tools available as plugins. I'm
         | using tree-sitter for syntax highlighting, text objects, and
         | folding right now. Pretty satisfied so far.
        
           | mkingston wrote:
           | The official release of built-in treesitter comes with neovim
           | 0.5. Which _looks_ like it'll be out pretty soon. I've been
           | watching a fairly steady march toward release here:
           | https://github.com/neovim/neovim/milestone/19
        
         | tazjin wrote:
         | A friend of mine started working on an experimental Emacs mode
         | to provide structural navigation of code based on tree-sitter:
         | https://cs.tvl.fyi/depot/-/tree/users/Profpatsch/emacs-tree-...
         | 
         | The potential for this is essentially something like Paredit,
         | but for all languages.
        
           | yewenjie wrote:
           | Can someone point some examples of what `paredit` for other
           | languages provide? I do various lisp programming occasionally
           | but have not used `paredit` yet.
        
             | tazjin wrote:
             | Check out this video for a quick demo:
             | http://emacsrocks.com/e14.html
             | 
             | If you know a Lisp I recommend just giving paredit a spin
             | for a few minutes, it's an interesting experience.
        
               | z3t4 wrote:
               | Looks like it's mainly tree/code manipulation. Typing
               | code on the keyboard is probably the least taxing thing
               | when it comes to software development. But I guess it
               | will be nice once it has become a "reflex" rather than a
               | conscious key-combo.
        
               | tazjin wrote:
               | It's not so much about reducing the amount of characters
               | typed, and instead moving the way you think about code
               | from the character level to a more structural level.
               | 
               | Calling it a "reflex" is an interesting phrase! Tools
               | like magit let me encode complicated processes into
               | muscle memory, in a way where retrieval doesn't have to
               | go through remembering and typing a string. Structural
               | editing is similar.
        
               | mumblemumble wrote:
               | I only started using it a few months ago. It's such a
               | natural way to edit code, it only took me about a day for
               | it to become reflexive.
               | 
               | Now it just feels vaguely annoying to work without it.
               | It's fine, it's just one of those ergonomic changes that
               | nags at you a bit. Kind of like the opposite of that
               | feeling of taking off uncomfortable business clothes at
               | the end of the day. Or what I imagine people who are
               | better at vim than me keep talking about.
        
         | lalaithion wrote:
         | Maybe I can finally have this syntax highlighting style:
         | https://youtu.be/b0EF0VTs9Dc?t=900
        
           | srcreigh wrote:
           | There is an emacs package for this (maybe beta). I can't
           | remember the name of it and Google is failing me.
           | 
           | EDIT: finally found it https://github.com/alphapapa/prism.el
        
             | jackcviers3 wrote:
             | Rainbow delimiters mode kind of does this, but doesn't
             | maintain the scope color of referenced variables.
        
           | brundolf wrote:
           | The idea is pretty awesome, but my eyes nearly rolled out of
           | my head from the needless condescension at the beginning.
        
       | maxbrunsfeld wrote:
       | Hey, Tree-sitter author here. Thanks for posting! Let me know if
       | you have questions about the project.
        
         | gravypod wrote:
         | When I played around with tree sitter a bit I noticed there
         | were situations where ast elements didn't exactly contain what
         | I'd expect them to. For example: comments are represented in
         | the AST but unfortunately they don't have the contents of the
         | comment parsed out following the language's conventions.
         | 
         | I was wondering if this is a case I could open an issue about?
         | Is this for the main tree sitter repo or should I open one
         | language-by-language?
         | 
         | I was looking into automating some stuff across all languages
         | with tree-sitter but handling all of the languages' comment
         | syntaxes made it very hard.
        
           | maxbrunsfeld wrote:
           | Most tree-sitter grammars just parse comments as a single
           | token. Can you give an example of what you mean when you say
           | "contents of the comment parsed out"?
           | 
           | Are you talking about conventions like JSDoc, for putting
           | structured data inside of comments? On GitHub, we handle that
           | by parsing JSDoc comments in a separate pass, using a
           | separate parser. We do it this way because JSDoc isn't really
           | part of the JavaScript language, not all projects use JSDoc,
           | and not all applications are interested in parsing the text
           | inside of comments.
        
             | gugagore wrote:
             | My guess is that they meant parsing code that has been
             | "commented out".
        
         | lemming wrote:
         | Is it possible to use tree-sitter to generate parsers in
         | languages other than C? How hard would it be to modify it to
         | create parsers in e.g. Java?
         | 
         |  _Edit:_ sorry, I just saw that you had answered that below.
        
         | anaerobicover wrote:
         | I've done two grammars for my own use in the last few months
         | (well, one isn't quite complete yet) and it's been quite an
         | enjoyable (learning) experience. Thanks for sharing this tool!
        
           | maxbrunsfeld wrote:
           | That's great to hear. Thanks!
        
         | autoditype wrote:
         | Thanks for building this. I had not heard of it before, but it
         | looks great. Are there more tutorials elsewhere on the Internet
         | you would recommend, besides what is in the documentation?
        
           | maxbrunsfeld wrote:
           | Not that I know of, right now :(.
           | 
           | In the near future, we'll create some more GitHub-specific
           | documentation that walks you through how to add advanced
           | language support for any programming language on GitHub, by
           | writing a Tree-sitter grammar, and then by writing the _tree
           | queries_ that are used for syntax highlighting, simple code
           | navigation, and someday soon... _precise code navigation_.
        
         | yig wrote:
         | Are there any plans to support modifying the grammar on the fly
         | or without recompiling?
        
           | maxbrunsfeld wrote:
           | One day, I would love to generalize the web-based playground
           | so that you could edit the grammars. But it's complicated,
           | because we use C as our output language, so you would always
           | need to recompile the C after changing the grammar.
           | 
           | So, I would say that it's not on our near-term roadmap.
        
           | dcreager wrote:
           | I don't think you can do this without recompiling, since the
           | grammars get translated into C code before use. But the
           | built-in command line tools ('tree-sitter parse', etc) all
           | support a mode where they will detect local changes to a
           | checked-out grammar definition, and recompile on the fly if
           | needed. (This happens each time the CLI program is started
           | up; it doesn't happen during a long-running process.)
        
             | sitkack wrote:
             | The obvious answer is to embed TCC or another C compiler
             | and either generate a dynamic library or generate wasm and
             | load it directly into the process.
             | 
             | exec_wasm(generate_wasm(generate_c(grammar)))
             | 
             | Now if you can make that whole fn chain incremental, then a
             | delta_grammar -> delta_c -> delta_wasm ->
             | delta_recomputed_wasm_call stack, this will propagate
             | deltas down to exec_wasm and you could dynamically execute
             | the generated code as the grammar changes.
        
         | akavel wrote:
         | There's been some recent discussion as to whether tree-sitter
         | grammars can be used to parse markdown with some hacks or not
         | (currently it's being done by working around all the tree-
         | sitter machinery, resulting in a lot of problems), with no
         | consensus among plugin authors:
         | 
         | https://github.com/nvim-treesitter/nvim-treesitter/issues/87...
         | 
         | Could you possibly chime into that discussion and help them
         | with any possible insights you might have on that? That would
         | be really awesome! TIA <3
        
         | fiddlerwoaroof wrote:
         | I've been using tree-sitter via FFI from Common Lisp, but what
         | I'd really like would be a way to write my own code generator
         | so that the generated parser could be "native" lisp code.
         | Otherwise, it's an amazing tool: my only other complaint would
         | be the lack of a grammar for objective-c which would be useful
         | for a lisp/objective-c bridge I've been working on.
        
           | maxbrunsfeld wrote:
           | I think that it'd be pretty easy to generate parser code in
           | other languages besides C, but it would be a lot of work to
           | do to port the core library itself[1] to those other
           | languages.
           | 
           | [1] https://github.com/tree-sitter/tree-
           | sitter/tree/master/lib/s...
           | 
           | I agree about the Objective-C grammar! Although it looks like
           | somebody's started work on it:
           | 
           | https://github.com/merico-dev/tree-sitter-objc
        
         | josephg wrote:
         | There's an architecture for compilers that I've been wanting
         | for years where a keystroke change to the sourcecode results in
         | an incremental change to the AST, and then the compiler can
         | consume that AST delta to generate a binary patch to the
         | compiled executable.
         | 
         | Would tree-sitter be able to be used for that? (What I want is
         | to feed tree-sitter a stream of keystroke changes and get out a
         | stream of minimal AST changes as a result).
        
       | chrisseaton wrote:
       | Tree-sitter is unfathomable to me. This is the grammar for Ruby:
       | 
       | https://github.com/tree-sitter/tree-sitter-ruby/blob/master/...
       | 
       | I find it absolutely amazing that a grammar for something as
       | complicated as Ruby can be so concise. Less than a thousand
       | lines. The corresponding Bison grammar is 13k lines. And I think
       | the tree-sitter one is scannerless so also includes the lexer?!
       | How do they do it?
        
         | codesnik wrote:
         | bison should be compared to https://github.com/tree-
         | sitter/tree-sitter-ruby/blob/master/... probably?
        
           | chrisseaton wrote:
           | No the JSON file there is generated (I believe?) from the
           | JavaScript I linked, while the Bison file is hand-written.
           | 
           | With tree-sitter you're hand-writing a 1k file. With Bison
           | you're hand-writing a 13k file.
        
         | dcreager wrote:
         | This is more a function of Ruby than of tree-sitter. The tree-
         | sitter grammars for other languages are hopefully less
         | inscrutable. For Ruby, we basically just ported whitequark's
         | parser [1] over to tree-sitter's grammar DSL and scanner API.
         | 
         | [1] https://github.com/whitequark/parser
        
           | chrisseaton wrote:
           | I didn't mean the tree-sitter grammar was not understandable
           | - it's very understandable - I just can't work out how you
           | managed to find such a concise way to express grammars. Even
           | compared to Whitequark it's 1/3 the size. What's the unique
           | thing you do that makes it so concise?
           | 
           | It also seems somehow to be completely declarative? How have
           | you managed to transform Ruby parsing to be context-free? For
           | example where's the set of what's currently a local variable
           | so you can distinguish from method calls?
        
             | tp3 wrote:
             | The code is obviously much simpler than its syntax - most
             | importantly, its syntactical simplicity makes it way easier
             | to deal with. So when you write the code to parse it you
             | don't have to try to parse it in one fell swoop like you do
             | in Whitequark.
             | 
             | So you can't read anything from a method call! I can make
             | it so, if you're doing a class method (of any kind) you
             | have to invoke the constructor, as described in "What is a
             | method?" There's also a few new techniques like
             | "new_class_method", which requires creating an object (of
             | some kind) for that class... but what about that? It's not
             | "I've just fixed Tree-sitter's problem"; it's that Tree-
             | sitter hasn't yet resolved the problem yet - there are
             | other parsing problems besides Tree-sitter in Ruby itself
             | like those of classes (and classes are not part of Tree-
             | sitter) and things that are known as "type-traits" and so
             | on - so as it's not quite enough it can be done by other
             | things. The reason for using LR grammar is that when it
             | comes to this - what do I want from that grammar?
             | 
             | The point I'm making here is that LR doesn't give a reason
             | for what you're doing. As a programmer you are trying to
             | write code that is portable because - if it works in a
             | domain you don't understand (such as Ruby) - then you don't
             | know what you're doing is wrong. There can be a domain (as
             | in any language) that's a lot more complex than this - but
             | since we've got that, how can I be sure it won't mess up
             | the code I'm writing?
        
             | dcreager wrote:
             | Ahh my mistake! :-)
             | 
             | To be fair, we're cheating a little bit because the Ruby
             | grammar relies so heavily on an external scanner, which is
             | just under 1,000 lines of C++: https://github.com/tree-
             | sitter/tree-sitter-ruby/blob/master/...
        
               | chrisseaton wrote:
               | But for example how do you parse the difference between
               | `x = 14; x` and `y = 14; x`? In the latter case `x` is a
               | method call, and in the former it's a local variable
               | read. I can't see where the parser maintains a set of
               | local variables and where it queries this set. Is it
               | somehow done declaratively? If so that's a huge
               | achievement I don't think that's really been done before
               | in a parser generator.
               | 
               | I really want to try tree-sitter for using in an actual
               | Ruby implementation because it's so beautiful!
        
               | dcreager wrote:
               | [EDITED to make the example actually line up with OP's
               | test]
               | 
               | There's no symbol table in the parser, so at parse time,
               | we don't distinguish those cases:
               | 
               |     $ cat test.rb
               |     module Test
               |       def test1
               |         x = 14; x
               |       end
               | 
               |       def test2
               |         y = 14; x
               |       end
               |     end
               |     $ tree-sitter parse test.rb
               |     (program [0, 0] - [9, 0]
               |       (module [0, 0] - [8, 3]
               |         name: (constant [0, 7] - [0, 11])
               |         (method [1, 2] - [3, 5]
               |           name: (identifier [1, 6] - [1, 11])
               |           (assignment [2, 4] - [2, 10]
               |             left: (identifier [2, 4] - [2, 5])
               |             right: (integer [2, 8] - [2, 10]))
               |           (identifier [2, 12] - [2, 13]))
               |         (method [5, 2] - [7, 5]
               |           name: (identifier [5, 6] - [5, 11])
               |           (assignment [6, 4] - [6, 10]
               |             left: (identifier [6, 4] - [6, 5])
               |             right: (integer [6, 8] - [6, 10]))
               |           (identifier [6, 12] - [6, 13]))))
               | 
               | In both cases the bit after the semicolon just parses as
               | (identifier).
               | 
               | For some use cases (e.g. syntax highlighting, depending
               | on your colorization rules) it doesn't matter, and so we
               | don't want to pay the cost. If it does matter (like in an
               | actual implementation), then you'd have to implement this
               | yourself and drive it by the parse tree you get from
               | tree-sitter.
        
               | chrisseaton wrote:
               | Right you could just have a phase to fix-it-up after
               | parsing. Much better than trying to shoe-horn an
               | imperative action into a nice more-pure parser. Great
               | idea!
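A minimal Python sketch of the kind of fix-up pass discussed above: walk each method body in source order, remember which names have been assigned, and label later bare identifiers accordingly. The `Node` class is a hypothetical simplification of a parse tree, and the scoping rule is deliberately naive (no blocks, conditionals, or parameters), so it is an illustration of the idea, not Ruby's actual rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical simplified parse-tree node; `text` is set on leaves only."""
    type: str
    text: str = ""
    children: list = field(default_factory=list)


def classify_identifiers(method):
    """Label each bare identifier in a method body, in source order."""
    local_names = set()
    labels = []

    def visit(node):
        if node.type == "assignment":
            left, right = node.children
            # This sketch registers the name before visiting the right side.
            local_names.add(left.text)
            visit(right)
        elif node.type == "identifier":
            kind = "local" if node.text in local_names else "method_call"
            labels.append((node.text, kind))
        else:
            for child in node.children:
                visit(child)

    for statement in method.children:
        visit(statement)
    return labels


def method(assigned_name):
    """Build the tree for a body of the form `<assigned_name> = 14; x`."""
    return Node("method", children=[
        Node("assignment", children=[
            Node("identifier", assigned_name),
            Node("integer", "14"),
        ]),
        Node("identifier", "x"),
    ])


print(classify_identifiers(method("x")))
print(classify_identifiers(method("y")))
```

The two calls correspond to `x = 14; x` and `y = 14; x`: the trailing `x` is labelled a local in the first case and a method call in the second, even though both parse as a plain identifier.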
        
         | anaerobicover wrote:
         | No, the Ruby grammar is actually an outlier from what I've
         | seen; it has one of the largest/most complex external scanners:
         | https://github.com/tree-sitter/tree-sitter-ruby/blob/master/...
         | 
         | Precisely because the language is complicated and less amenable
         | to LR parsing.
        
           | ComputerGuru wrote:
           | Not a ruby developer here: that sounds terrifying! Does it
           | make it harder to have a proper mental model of the language
           | (note: not the libraries) or is this mainly because of
           | flexibility (too many ways to skin one cat)?
        
             | anaerobicover wrote:
             | I don't write Ruby regularly either, but I wouldn't say
             | that _syntactic_ complexity is necessarily equivalent to
             | _semantic_ complexity. And the syntax is the only part that's
             | relevant to Tree-sitter: it's not an
             | interpreter/compiler.
             | 
             | Note also that (as I alluded to above) the parsing
             | technique that Tree-sitter uses, "LR parsing", makes some
             | things more difficult to parse than they'd be with another
             | kind of parser. This is a deliberate trade-off, because LR
             | parsing makes certain features of Tree-sitter, like fast
             | re-parsing in response to input changes, much much easier.
        
               | tp3 wrote:
               | So, a syntactic tree is a list of elements, grouped by
               | their ordering, which are to be parsed from their
               | arguments, as they appeared in the input. Or a grammar
               | tree, which is a set of elements. There's many things we
               | can do to make Tree-sitter simpler to read and write.
               | Perhaps, like in Perl, there are syntactic categories of
               | types that make it much easier to find things like nodes
               | in a tree, since they're the ones that come in the input.
               | Or I'd be willing to say that maybe, like in Haskell,
               | certain aspects of the language, are syntactic
               | categories, like the parser. So some things that might
               | not be obvious in code, like what the syntax for a class
               | of names is, might be obvious in theory, too. Or, at
               | least they might be obvious in a particular way. Or some
               | aspects of the compiler are really special, and we can
               | infer those in terms of what the compiler does. Or, of
               | course, we can do all these other things, too. We can
               | rewrite the parser, or the compiler, to try to do more or
               | less anything that the parser does. Or maybe we can make
               | Tree-sitter a lot simpler in general. Which I think is
               | probably what you've been thinking about.
        
             | codesnik wrote:
             | It's mostly to make the language work in a way that is
             | _less_ surprising to the programmer, AFAIR. Probably most of
             | the complexity comes from having to differentiate local
             | variables from methods depending on whether the symbol had
             | an assignment earlier in the scope.
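             | 
             | A minimal sketch of that rule, assuming current MRI
             | behavior; the names `greeting` and `demo` are hypothetical
             | and used only for illustration:

```ruby
# Hypothetical method, defined only so there is something to shadow.
def greeting
  :from_method
end

def demo
  before = greeting        # no assignment seen yet: parsed as a method call
  greeting = :from_local   # from here on, a bare `greeting` is a local variable
  after = greeting         # parsed as a local variable read
  forced = greeting()      # explicit parentheses still force a method call
  [before, after, forced]
end
```

             | The same bare identifier means two different things within
             | one method body, which is the kind of context a Ruby parser
             | has to track.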
        
             | revscat wrote:
             | Flexibility. "Too many" is debatable: most organizations
             | wind up settling on a subset of the idioms that Ruby
             | provides, and some of the more esoteric constructs see
             | infrequent use anywhere.
             | 
             | There has been, however, discussion about the need to clean
             | up some of the lesser-used language features, but obviously
             | doing so carries risks.
        
             | RangerScience wrote:
             | My mental model of Ruby is one of the simplest of any of the
             | languages I've worked with, but it's also the hardest to
             | put into any words. JS actually does beat it out, and then
             | Scala and Python come after.
             | 
             | Everything is kind-of-but-not-really an object, a
             | reference, and a function, all at the same time - which
             | _sounds_ complicated but in my head... turns out to be
             | pretty simple. Everything's just kind of different flavors
             | of the same thing. `attr_accessor` is a good place to see
             | this in action.
             | 
             | The flexibility comes more from the variety of available
             | core language options (procs, blocks, and lambdas) and core
             | libraries (map/each/collect, for example), not from a
             | variety of underlying concepts.
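             | 
             | A rough sketch of what I mean, with a hypothetical `Point`
             | class and made-up variable names. `attr_accessor` is itself
             | just a method call made inside the class body, and blocks,
             | procs, and lambdas are all callable chunks of code:

```ruby
class Point
  # attr_accessor is an ordinary method call on the class, made while the
  # class body runs; it defines the methods `x` and `x=`.
  attr_accessor :x
end

point = Point.new
point.x = 3                 # calls the generated method `x=`
reader = point.method(:x)   # the generated reader captured as an object

doubled_with_block = [1, 2].map { |n| n * 2 }   # block passed inline
double_proc = proc { |n| n * 2 }                # block stored as a Proc
double_lambda = ->(n) { n * 2 }                 # lambda: a stricter Proc
```

             | Both `double_proc` and `double_lambda` are instances of
             | `Proc`, and either can be handed to `map` or `collect` with
             | `&`.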
        
             | e12e wrote:
             | > Not a ruby developer here: that sounds terrifying! Does
             | it make it harder to have a proper mental model of the
             | language
             | 
             | It is a little terrifying in the sense that I'd not want to
             | write language level tools (eg: syntax highlighter).
             | 
             | But if you have scheme on one end and natural language on
             | the other, ruby leans a bit towards natural language - but
             | in a good way. In some ways ruby isn't that different from
             | Smalltalk - but it has a lot (sometimes I think too many,
             | sometimes not) _conveniences_.
             | 
             | Parentheses and brackets are largely optional "where it
             | makes sense". Conditionals support postfix, eg these are
             | equivalent:
             | 
             |     if should_send?()
             |       send_mail({to: 'u@x.com'})
             |     end
             | 
             |     send_mail to: 'u@x.com' if should_send?
        
       | brundolf wrote:
       | Here's what it looks like to call it from Rust:
       | https://github.com/tree-sitter/tree-sitter/tree/master/lib/b...
       | 
       | Seems like this would make it much easier to bootstrap a
       | performant language-server. Very cool; maybe that will be my next
       | project.
        
         | dcreager wrote:
         | We also have several of the language grammars published as
         | crates: https://crates.io/search?q=tree-sitter (And doing the
         | same for other grammars is a fairly painless process.)
         | 
         | So if you're writing a tool for a single language (like a
         | language server), it should be as easy as adding tree-sitter
         | and tree-sitter-blah to your cargo manifest.
        
           | brundolf wrote:
           | Awesome! Though my thinking was that it would have an
           | especially large impact for languages that aren't popular
           | enough to have their own LSP yet; you no longer have to be an
           | expert in writing interactive compilers to set up a
           | respectable LSP for a niche language, or even a home-grown
           | one
        
             | dcreager wrote:
             | Yes! This is a great point. It's similar to what I
             | mentioned over on this thread [1] about how we're working
             | on a more precise version of Code Navigation based on tree-
             | sitter. The tl;dr is that you'd write something like tree-
             | sitter queries [2], just like you do for the current fuzzy
             | Code Nav, but the query DSL would be a bit more
             | sophisticated, allowing you to specify the actual name
             | resolution rules of your language. One of the things we're
             | using to test this is an LSP shim that lets us test our
             | rules in VS Code (or any other LSP-compliant editor).
             | 
             | [1] https://news.ycombinator.com/item?id=26227476
             | 
             | [2] https://tree-sitter.github.io/tree-sitter/using-parsers#patt...
        
       | pcr910303 wrote:
       | To me, the most impressive use of tree-sitter was an iOS text
       | editor that uses it to parse huge JSON files / mixed language
       | files and highlight them in a very robust way. [0][1] I'm hoping
       | tree-sitter becomes more common like LSP and Emacs can get exact
       | highlighting and other tools with it...
       | 
       | [0]: https://twitter.com/simonbs/status/1352697855845273600
       | 
       | [1]: https://twitter.com/simonbs/status/1362492842141171720?s=21
        
         | ducktective wrote:
         | Yeah but I don't think LSP specs contain syntax-highlighting or
         | semantic highlighting.
        
           | [deleted]
        
           | orra wrote:
           | LSP supports semantic highlighting:
           | https://microsoft.github.io/language-server-protocol/specifi...
           | 
           | Though AIUI the basic syntax highlighting is done by the
           | editor (e.g. VSCode uses Textmate grammar support).
        
         | picardythird wrote:
         | FYI there is tree-sitter.el for Emacs.
        
         | ACosmicDust wrote:
         | Emacs does have a package to use tree-sitter [0]. I think
         | emacs-lsp is aware of this highlighting backend and performs
         | pretty well.
         | 
         | (semantic highlighting is pretty slow for C++ with font-lock,
         | with tree-sitter it's a breeze :))
         | 
         | [0]: https://github.com/ubolonton/emacs-tree-sitter
        
       ___________________________________________________________________
       (page generated 2021-02-22 23:00 UTC)