[HN Gopher] Tree-sitter: an incremental parsing system for progr... ___________________________________________________________________ Tree-sitter: an incremental parsing system for programming tools Author : sbt567 Score : 331 points Date : 2021-02-22 15:03 UTC (7 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | alissasobo wrote: | You can watch a good Strangeloop presentation on Tree Sitter. | https://www.youtube.com/watch?v=Jes3bD6P0To | wiradikusuma wrote: | While we're in this discussion: Say I want to implement "SQL" for | my app (if you've used Jira, I want to make my own JQL). Is this | the tool for that? I'm looking for something much simpler than | ANTLR. | Grimm1 wrote: | I recently used this to put together a unified PL classification | model. It's nice because any language treesitter grows to support | we'll support pretty effortlessly and treesitter captures more | than enough nuance per language to derive high quality | classifications. | | It's fair to say we can classify a snippet of code based on | either single or multiple AST paths produced by treesitter. Right | now only doing the programming language but extending it to | function classification or description etc isn't out of the | question we just don't need it right now. | drewdennison wrote: | We've been using tree-sitter for Semgrep and it's nothing short | of incredible. Amazing work by Max and team. | ahelwer wrote: | I half-wrote a tree-sitter grammar for a niche DSL (the PRISM | probabilistic model checking language). It was a very nice | experience. It's part of another half-written side project to | create a language server for PRISM; I still haven't gotten around | to making the whole end-to-end pipeline work. | | With its syntax tree query frontend I wonder whether tree-sitter | would make a good interpreter frontend for some niche languages, | or you need something more powerful. | amelius wrote: | Next steps: incrementally resolve symbols and type-check? 
| dcreager wrote: | We're currently working on a more precise version of the Code | Nav that's shipped on github.com, which is very similar in | spirit to this! | ritter2a wrote: | I tried to use this to ease the front end work load of students | in a compiler project (building a C compiler) for a University | course, so that the project could be focused on the more | interesting middle and back end parts of the compiler. However, | reported bugs in the C grammar that saw no activity at all [1] | made this impossible. From this small sample of experiences, I | was left with the impression that Tree Sitter is great for things | like syntax highlighting, where wrong results are annoying but | not dramatic, but not so suitable for tools that need a really | correct syntax tree. | | --- [1] https://github.com/tree-sitter/tree-sitter-c/issues/51 | dang wrote: | If curious, past threads: | | _Tree-sitter: new incremental parsing system for programming | tools (2018) [video]_ - | https://news.ycombinator.com/item?id=21675113 - Dec 2019 (28 | comments) | | _Tree-sitter - a new parsing system for programming tools | [video]_ - https://news.ycombinator.com/item?id=18213022 - Oct | 2018 (25 comments) | | Others? | maxbrunsfeld wrote: | One more that I know of: | | _Atom understands your code better than ever before_ - | https://news.ycombinator.com/item?id=18349013 - Oct 2018 | Annili wrote: | I'm curious to see if Tree-sitter can be used to provide fast and | rich code navigation. I was able to implement simple goto | definition/references [1], not sure if it can be used for more | advanced navigation features in a language-agnostic way. 
| | If you're interested, GitHub is already using it [2] for that | purpose and Sourcegraph is experimenting with it [3] | | [1] https://github.com/alidn/lsif-os [2] | https://github.com/github/semantic [3] | https://github.com/sourcegraph/sourcegraph/issues/17378 | maxbrunsfeld wrote: | At GitHub, we're in the process of building a more precise code | navigation system on top of Tree-sitter that models language- | specific name-resolution rules in detail. | | Our currently-available code navigation system also uses Tree- | sitter, but it is pretty simple; it just matches up references | and definitions by their name. | himujjal wrote: | Wrote tree-sitter-svelte. Was a good experience. I am also | writing a programming language of my own similar to TypeScript | and I am using tree-sitter for the same. It's a delight to work | with it. Removes a lot of the worries. | ducktective wrote: | Is this the same thing neovim uses for syntax highlighting? | | Is there a chance for it getting integrated into vim? Last I | checked vim used a regex method which was slow and faulty. | ckolkey wrote: | Yup, neovim 0.5+ will be using treesitter for any supported | languages, with the current regex highlighting as a fallback. | [deleted] | mkingston wrote: | Follow the nvim 0.5 release here: | https://github.com/neovim/neovim/milestone/19 | guerrilla wrote: | Is the use case for this mainly IDEs or is it intended to replace | traditional lexer and parser generators too? | dcreager wrote: | We are also using this to power a lot of the program analysis | features on github.com. We use it to generate the symbol list | for Code Navigation, as an example, and are starting to look at | extracting more semantic information about some languages using | tree-sitter parse trees as intermediaries. | patrec wrote: | I have used tree-sitter, but only for a very simple use case. 
| The main shortcoming I am aware of is error messages, see | here: | | https://github.com/tree-sitter/tree-sitter/issues/255 | | Tree sitter will basically always generate a parse tree, even | for malformed input, in which case it will add ERROR nodes for | the bits it doesn't like (it will also inform you that there | were problems with the parse by setting a boolean attribute). | So you have some information you can use to construct a useful | error message yourself, but some parser generators will handle | this better (although it has to be said that the difficulty of | obtaining good error messages from a parser generator is still | one of the main reasons production parsers are mostly | written by hand). | guerrilla wrote: | Ah I see, so the repair isn't avoidable for now? That | doesn't seem very appropriate for compilers then. | patrec wrote: | Why would it not be appropriate? The only annoyance I see | is that currently you will have to generate a good error | message from it yourself, but a first pass at the problem | shouldn't be too onerous. | guerrilla wrote: | Ok, I misunderstood. I thought it repaired without error | sometimes but I see that you were clear that that isn't | the case. | srcreigh wrote: | Tree Sitter is amazing. The parsing is fast enough to run on | every keystroke. The parse tree is extremely concise and | readable. It resembles an AST more than a parse tree (i.e. no 11 | levels of binary op precedence rules in the tree). The parse tree | contains specific ERROR nodes, so you can get a semi-functional tree | even with broken syntax. | | I can't wait for the tools to get built with this. Paredit for | TypeScript. Syntax-tree based highlighting (vs regex | highlighting). A command to "add an arg to current function" | which works across languages. A command to add a CSS class to the | nearest JSX node, or to walk up the tree at the className="| ..." | position, adding a new className if it doesn't exist. 
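A note on the ERROR-node point made by patrec and srcreigh above: building your own error messages from those nodes can be sketched roughly as follows. This is only an illustration under stated assumptions. The `Node` class is a hypothetical stand-in for a syntax-tree node and is not tree-sitter's real API; the only thing taken from the thread is that broken input shows up as nodes of type ERROR with source positions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a syntax-tree node (not tree-sitter's API)."""
    type: str
    start: Tuple[int, int]  # (row, column), zero-based
    end: Tuple[int, int]
    children: List["Node"] = field(default_factory=list)


def error_nodes(node: Node) -> Iterator[Node]:
    """Yield every ERROR node in the tree, in document order."""
    if node.type == "ERROR":
        yield node
    for child in node.children:
        yield from error_nodes(child)


def error_messages(root: Node, filename: str) -> List[str]:
    """Turn ERROR nodes into compiler-style 'file:line:col' messages."""
    messages = []
    for err in error_nodes(root):
        row, col = err.start
        # Editors and compilers usually report one-based positions.
        messages.append(f"{filename}:{row + 1}:{col + 1}: syntax error")
    return messages


# A hand-built tree for a two-line file whose second line failed to parse.
tree = Node("program", (0, 0), (2, 0), [
    Node("assignment", (0, 0), (0, 6)),
    Node("ERROR", (1, 4), (1, 9)),
])
print(error_messages(tree, "test.rb"))
```

A first pass like this only says where parsing went wrong; saying what was expected there would need more information than the ERROR node alone carries in this sketch.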
| | There's a nicely documented Emacs package for this [1]. The | documentation is at [2]. The parse trees work great. There's | syntax highlighting support and tree-walking APIs. There's a bit | of confusion about TSX vs typescript langs but it's fixable with | some config change [3]. | | [1]: https://github.com/ubolonton/emacs-tree-sitter [2]: | https://ubolonton.github.io/emacs-tree-sitter/ [3]: | https://github.com/ubolonton/emacs-tree-sitter/issues/66#iss... | Annili wrote: | > "Paredit for TypeScript" | | Is there a list of ideas for Structural Editing in C-like | languages? | | I can think of `extend-selection, `move to parent block`, `add | arg to function` | dcreager wrote: | Worth calling out that the syntax highlighting support is used | to highlight several languages in github.com. (Linguist is | still used for the long tail of languages, but we plan to | migrate more and more over to tree-sitter-based highlighting | over time.) | | The query language is also what's used to drive the | fuzzy/ctags-like Code Navigation feature. Both of those are | powered by tree-sitter query files defined in each language's | repo, like these for Go: https://github.com/tree-sitter/tree- | sitter-go/tree/master/qu... | eins1234 wrote: | Awesome to hear that amazing tech like tree-sitter lives on | even though Atom, the product it was built for, is pretty | much on life support at this point. | | Curious if there's any efforts to bring tree-sitter to | VSCode? Exposing tree-sitter to extensions could open up so | many possibilities like OP mentioned. | josteink wrote: | Tooting my own horn, Emacs' csharp-mode[1] is undergoing a | rewrite to be 100% based on tree-sitter rather than regexps. | | The new code runs way faster and is so much nicer to work with. | | Once all the kinks are gone, I can't imagine going back. | | [1] https://github.com/emacs-csharp/csharp- | mode/blob/master/csha... | robto wrote: | I'm so excited for this to become built-in in more places! 
I | think once non-lisp users can experience the Power of | Structural Editing they'll say, "Hey, I understand now why you | all feel so passionate about your parentheses!" | | And I can stop feeling like my fingers have all lost a knuckle | when I'm writing Typescript :) | rgossiaux wrote: | Neovim nightly already has some tools available as plugins. I'm | using tree-sitter for syntax highlighting, text objects, and | folding right now. Pretty satisfied so far. | mkingston wrote: | The official release of built-in treesitter comes with neovim | 0.5. Which _looks_ like it'll be out pretty soon. I've been | watching a fairly steady march toward release here: | https://github.com/neovim/neovim/milestone/19 | tazjin wrote: | A friend of mine started working on an experimental Emacs mode | to provide structural navigation of code based on tree-sitter: | https://cs.tvl.fyi/depot/-/tree/users/Profpatsch/emacs-tree-... | | The potential for this is essentially something like Paredit, | but for all languages. | yewenjie wrote: | Can someone point to some examples of what `paredit` for other | languages provides? I do various lisp programming occasionally | but have not used `paredit` yet. | tazjin wrote: | Check out this video for a quick demo: | http://emacsrocks.com/e14.html | | If you know a Lisp I recommend just giving paredit a spin | for a few minutes, it's an interesting experience. | z3t4 wrote: | Looks like it's mainly tree/code manipulation. Typing | code on the keyboard is probably the least taxing thing | when it comes to software development. But I guess it | will be nice once it has become a "reflex" rather than a | conscious key-combo. | tazjin wrote: | It's not so much about reducing the number of characters | typed as about moving the way you think about code | from the character level to a more structural level. | | Calling it a "reflex" is an interesting phrase! 
Tools | like magit let me encode complicated processes into | muscle memory, in a way where retrieval doesn't have to | go through remembering and typing a string. Structural | editing is similar. | mumblemumble wrote: | I only started using it a few months ago. It's such a | natural way to edit code, it only took me about a day for | it to become reflexive. | | Now it just feels vaguely annoying to work without it. | It's fine, it's just one of those ergonomic changes that | nags at you a bit. Kind of like the opposite of that | feeling of taking off uncomfortable business clothes at | the end of the day. Or what I imagine people who are | better at vim than me keep talking about. | lalaithion wrote: | Maybe I can finally have this syntax highlighting style: | https://youtu.be/b0EF0VTs9Dc?t=900 | srcreigh wrote: | There is an emacs package for this (maybe beta). I can't | remember the name of it and Google is failing me. | | EDIT: finally found it https://github.com/alphapapa/prism.el | jackcviers3 wrote: | Rainbow delimiters mode kind of does this, but doesn't | maintain the scope color of referenced variables. | brundolf wrote: | The idea is pretty awesome, but my eyes nearly rolled out of | my head from the needless condescension at the beginning. | maxbrunsfeld wrote: | Hey, Tree-sitter author here. Thanks for posting! Let me know if | you have questions about the project. | gravypod wrote: | When I played around with tree sitter a bit I noticed there | were situations where AST elements didn't exactly contain what | I'd expect them to. For example: comments are represented in | the AST but unfortunately they don't have the contents of the | comment parsed out following the language's conventions. | | I was wondering if this is a case I could open an issue about? | Is this for the main tree sitter repo or should I open one | language-by-language? 
| | I was looking into automating some stuff across all languages | with tree-sitter but handling all of the languages' comment | syntaxes made it very hard. | maxbrunsfeld wrote: | Most tree-sitter grammars just parse comments as a single | token. Can you give an example of what you mean when you say | "contents of the comment parsed out"? | | Are you talking about conventions like JSDoc, for putting | structured data inside of comments? On GitHub, we handle that | by parsing JSDoc comments in a separate pass, using a | separate parser. We do it this way because JSDoc isn't really | part of the JavaScript language, not all projects use JSDoc, | and not all applications are interested in parsing the text | inside of comments. | gugagore wrote: | My guess is that they meant parsing code that has been | "commented out". | lemming wrote: | Is it possible to use tree-sitter to generate parsers in | languages other than C? How hard would it be to modify it to | create parsers in e.g. Java? | | _Edit:_ sorry, I just saw that you had answered that below. | anaerobicover wrote: | I've done two grammars for my own use in the last few months | (well, one isn't quite complete yet) and it's been quite an | enjoyable (learning) experience. Thanks for sharing this tool! | maxbrunsfeld wrote: | That's great to hear. Thanks! | autoditype wrote: | Thanks for building this. I had not heard of it before, but it | looks great. Are there more tutorials elsewhere on the Internet | you would recommend, besides what is in the documentation? | maxbrunsfeld wrote: | Not that I know of, right now :(. | | In the near future, we'll create some more GitHub-specific | documentation that walks you through how to add advanced | language support for any programming language on GitHub, by | writing a Tree-sitter grammar, and then by writing the _tree | queries_ that are used for syntax highlighting, simple code | navigation, and someday soon... _precise code navigation_. 
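On the JSDoc exchange earlier in this subthread (comments parsed as a single token, with their contents handled in a separate pass by a separate parser), here is a minimal sketch of that two-pass shape. Everything in it is an assumption made for illustration: the regular expressions stand in for both the main parser's comment tokens and the second parser, and this is not how GitHub's pipeline is implemented.

```python
import re
from typing import Dict, List

# Pass 1 (stand-in): pull out /** ... */ comments as opaque strings.
# A real setup would take these from the main parser's comment nodes.
DOC_COMMENT = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)

# Pass 2: a tiny, separate "parser" for @tag lines inside one comment.
TAG_LINE = re.compile(r"^\s*\*?\s*@(\w+)\s*(.*)$")


def doc_comments(source: str) -> List[str]:
    """Return the body of each doc comment, without the delimiters."""
    return DOC_COMMENT.findall(source)


def parse_tags(comment_body: str) -> List[Dict[str, str]]:
    """Extract @tag entries from one comment body, line by line."""
    tags = []
    for line in comment_body.splitlines():
        match = TAG_LINE.match(line)
        if match:
            tags.append({"tag": match.group(1), "text": match.group(2).strip()})
    return tags


source = """
/**
 * Adds two numbers.
 * @param a first addend
 * @returns the sum
 */
function add(a, b) { return a + b; }
"""
for body in doc_comments(source):
    print(parse_tags(body))
```

Keeping the second pass separate means tools that do not care about comment contents never pay for it, which matches the reasoning given in the thread.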
| yig wrote: | Are there any plans to support modifying the grammar on the fly | or without recompiling? | maxbrunsfeld wrote: | One day, I would love to generalize the web-based playground | so that you could edit the grammars. But it's complicated, | because we use C as our output language, so you would always | need to recompile the C after changing the grammar. | | So, I would say that it's not on our near-term roadmap. | dcreager wrote: | I don't think you can do this without recompiling, since the | grammars get translated into C code before use. But the | built-in command line tools ('tree-sitter parse', etc) all | support a mode where they will detect local changes to a | checked-out grammar definition, and recompile on the fly if | needed. (This happens each time the CLI program is started | up; it doesn't happen during a long-running process.) | sitkack wrote: | The obvious answer is to embed TCC or another C compiler | and either generate a dynamic library or generate wasm and | load it directly into the process. | | exec_wasm(generate_wasm(generate_c(grammar))) | | Now if you can make that whole fn chain incremental, then a | delta_grammar -> delta_c -> delta_wasm -> | delta_recomputed_wasm_call stack, this will propagate | deltas down to exec_wasm and you could dynamically execute | the generated code as the grammar changes. | akavel wrote: | There's been some recent discussion as to whether tree-sitter | grammars can be used to parse markdown with some hacks or not | (currently it's being done by working around all the tree- | sitter machinery, resulting in a lot of problems), with no | consensus among plugin authors: | | https://github.com/nvim-treesitter/nvim-treesitter/issues/87... | | Could you possibly chime into that discussion and help them | with any possible insights you might have on that? That would | be really awesome! 
TIA <3 | fiddlerwoaroof wrote: | I've been using tree-sitter via FFI from Common Lisp, but what | I'd really like would be a way to write my own code generator | so that the generated parser could be "native" lisp code. | Otherwise, it's an amazing tool: my only other complaint would | be the lack of a grammar for objective-c which would be useful | for a lisp/objective-c bridge I've been working on. | maxbrunsfeld wrote: | I think that it'd be pretty easy to generate parser code in | other languages besides C, but it would be a lot of work to | do to port the core library itself[1] to those other | languages. | | [1] https://github.com/tree-sitter/tree- | sitter/tree/master/lib/s... | | I agree about the Objective-C grammar! Although it looks like | somebody's started work on it: | | https://github.com/merico-dev/tree-sitter-objc | josephg wrote: | There's an architecture for compilers that I've been wanting | for years where a keystroke change to the sourcecode results in | an incremental change to the AST, and then the compiler can | consume that AST delta to generate a binary patch to the | compiled executable. | | Would tree-sitter be able to be used for that? (What I want is | to feed tree-sitter a stream of keystroke changes and get out a | stream of minimal AST changes as a result). | chrisseaton wrote: | Tree-sitter is unfathomable to me. This is the grammar for Ruby: | | https://github.com/tree-sitter/tree-sitter-ruby/blob/master/... | | I find it absolutely amazing that a grammar for something as | complicated as Ruby can be so concise. Less than a thousand | lines. The corresponding Bison grammar is 13k lines. And I think | the tree-sitter one is scannerless so also includes the lexer?! | How do they do it? | codesnik wrote: | bison should be compared to https://github.com/tree- | sitter/tree-sitter-ruby/blob/master/... probably? | chrisseaton wrote: | No the JSON file there is generated (I believe?) 
from the | JavaScript I linked, while the Bison file is hand-written. | | With tree-sitter you're hand-writing a 1k-line file. With Bison | you're hand-writing a 13k-line file. | dcreager wrote: | This is more a function of Ruby than of tree-sitter. The tree- | sitter grammars for other languages are hopefully less | inscrutable. For Ruby, we basically just ported whitequark's | parser [1] over to tree-sitter's grammar DSL and scanner API. | | [1] https://github.com/whitequark/parser | chrisseaton wrote: | I didn't mean the tree-sitter grammar was not understandable | - it's very understandable - I just can't work out how you | managed to find such a concise way to express grammars. Even | compared to Whitequark it's 1/3 the size. What's the unique | thing you do that makes it so concise? | | It also seems somehow to be completely declarative? How have | you managed to transform Ruby parsing to be context-free? For | example where's the set of what's currently a local variable | so you can distinguish from method calls? | tp3 wrote: | The code is obviously much simpler than its syntax - most | importantly, its syntactical simplicity makes it way easier | to deal with. So when you write the code to parse it you | don't have to try to parse it in one fell swoop like you do | in Whitequark. | | So you can't read anything from a method call! I can make | it so, if you're doing a class method (of any kind) you | have to invoke the constructor, as described in "What is a | method?" There's also a few new techniques like | "new_class_method", which requires creating an object (of | some kind) for that class... but what about that? 
It's not | "I've just fixed Tree-sitter's problem"; it's that Tree- | sitter hasn't resolved the problem yet - there are | other parsing problems besides Tree-sitter in Ruby itself | like those of classes (and classes are not part of Tree- | sitter) and things that are known as "type-traits" and so | on - so as it's not quite enough it can be done by other | things. The reason for using LR grammar is that when it | comes to this - what do I want from that grammar? | | The point I'm making here is that LR doesn't give a reason | for what you're doing. As a programmer you are trying to | write code that is portable because - if it works in a | domain you don't understand (such as Ruby) - then you don't | know what you're doing is wrong. There can be a domain (as | in any language) that's a lot more complex than this - but | since we've got that, how can I be sure it won't mess up | the code I'm writing? | dcreager wrote: | Ahh my mistake! :-) | | To be fair, we're cheating a little bit because the Ruby | grammar relies so heavily on an external scanner, which is | just under 1,000 lines of C++: https://github.com/tree- | sitter/tree-sitter-ruby/blob/master/... | chrisseaton wrote: | But for example how do you parse the difference between | `x = 14; x` and `y = 14; x`? In the latter case `x` is a | method call, and in the former it's a local variable | read. I can't see where the parser maintains a set of | local variables and where it queries this set. Is it | somehow done declaratively? If so that's a huge | achievement; I don't think that's really been done before | in a parser generator. | | I really want to try tree-sitter for use in an actual | Ruby implementation because it's so beautiful! 
| dcreager wrote:
| [EDITED to make the example actually line up with OP's
| test]
|
| There's no symbol table in the parser, so at parse time,
| we don't distinguish those cases:
|
|     $ cat test.rb
|     module Test
|       def test1
|         x = 14; x
|       end
|
|       def test2
|         y = 14; x
|       end
|     end
|     $ tree-sitter parse test.rb
|     (program [0, 0] - [9, 0]
|       (module [0, 0] - [8, 3]
|         name: (constant [0, 7] - [0, 11])
|         (method [1, 2] - [3, 5]
|           name: (identifier [1, 6] - [1, 11])
|           (assignment [2, 4] - [2, 10]
|             left: (identifier [2, 4] - [2, 5])
|             right: (integer [2, 8] - [2, 10]))
|           (identifier [2, 12] - [2, 13]))
|         (method [5, 2] - [7, 5]
|           name: (identifier [5, 6] - [5, 11])
|           (assignment [6, 4] - [6, 10]
|             left: (identifier [6, 4] - [6, 5])
|             right: (integer [6, 8] - [6, 10]))
|           (identifier [6, 12] - [6, 13]))))
|
| In both cases the bit after the semicolon just parses as
| (identifier).
|
| For some use cases (e.g. syntax highlighting, depending
| on your colorization rules) it doesn't matter, and so we
| don't want to pay the cost. If it does matter (like in an
| actual implementation), then you'd have to implement this
| yourself and drive it by the parse tree you get from
| tree-sitter.
| chrisseaton wrote:
| Right you could just have a phase to fix-it-up after
| parsing. Much better than trying to shoe-horn an
| imperative action into a nice more-pure parser. Great
| idea!
| anaerobicover wrote:
| No, the Ruby grammar is actually an outlier from what I've
| seen; it has one of the largest/most complex external scanners:
| https://github.com/tree-sitter/tree-sitter-ruby/blob/master/...
|
| Precisely because the language is complicated and less amenable
| to LR parsing.
| ComputerGuru wrote:
| Not a ruby developer here: that sounds terrifying! Does it
| make it harder to have a proper mental model of the language
| (note: not the libraries) or is this mainly because of
| flexibility (too many ways to skin one cat)?
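The fix-up phase that dcreager and chrisseaton discuss earlier in this subthread (the parser emits a plain identifier, and a later pass decides whether it is a local variable read or a method call) might look something like the sketch below. It is a deliberately simplified illustration: the tuple-shaped statements are a hypothetical stand-in for a real parse tree, and the rule used (a name is a local only if it was assigned earlier in the same method body) ignores blocks, parameters and the rest of Ruby's actual scoping rules.

```python
from typing import List, Set, Tuple

# Hypothetical, simplified node shapes (not tree-sitter's real output):
#   ("assignment", name)  -- `name = <expr>`
#   ("identifier", name)  -- a bare name, ambiguous at parse time
Stmt = Tuple[str, str]


def resolve_identifiers(body: List[Stmt]) -> List[Tuple[str, str]]:
    """Classify each bare identifier in one method body.

    A name counts as a local variable only if an assignment to it
    appeared earlier in the same body; otherwise it is a method call.
    """
    locals_seen: Set[str] = set()
    resolved = []
    for kind, name in body:
        if kind == "assignment":
            locals_seen.add(name)
        elif kind == "identifier":
            role = "local_variable" if name in locals_seen else "method_call"
            resolved.append((name, role))
    return resolved


test1 = [("assignment", "x"), ("identifier", "x")]  # x = 14; x
test2 = [("assignment", "y"), ("identifier", "x")]  # y = 14; x
print(resolve_identifiers(test1))
print(resolve_identifiers(test2))
```

The symbol set lives entirely in this pass, so the grammar itself can stay declarative, which is the trade-off described in the thread.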
| anaerobicover wrote: | I don't write Ruby regularly either, but I wouldn't say | that _syntactic_ complexity, is necessarily equivalent to | _semantic_ complexity. And the syntax is the only part that | 's relevant to Tree-sitter: it's not an | interpreter/compiler. | | Note also that (as I alluded to above) the parsing | technique that Tree-sitter uses, "LR parsing", makes some | things more difficult to parse than they'd be with another | kind of parser. This is a deliberate trade-off, because LR | parsing makes certain features of Tree-sitter, like fast | re-parsing in response to input changes, much much easier. | tp3 wrote: | So, a syntactic tree is a list of elements, grouped by | their ordering, which are to be parsed from their | arguments, as they appeared in the input. Or a grammar | tree, which is a set of elements. There's many things we | can do to make Tree-sitter simpler to read and write. | Perhaps, like in Perl, there are syntactic categories of | types that make it much easier to find things like nodes | in a tree, since they're the ones that come in the input. | Or I'd be willing to say that maybe, like in Haskell, | certain aspects of the language, are syntactic | categories, like the parser. So some things that might | not be obvious in code, like what the syntax for a class | of names is, might be obvious in theory, too. Or, at | least they might be obvious in a particular way. Or some | aspects of the compiler are really special, and we can | infer those in terms of what the compiler does. Or, of | course, we can do all these other things, too. We can | rewrite the parser, or the compiler, to try to do more or | less anything that the parser does. Or maybe we can make | Tree-sitter a lot simpler in general. Which I think is | probably what you've been thinking about. | codesnik wrote: | It's mostly to work _less_ surprising to the programmer, | AFAIR. 
Probably most of the complexity comes from having to | differentiate local variables and methods depending on whether the | symbol had an assignment before in the scope. | revscat wrote: | Flexibility. "Too many" is debatable: most organizations | wind up settling on a subset of the idioms that Ruby | provides, and some of the more esoteric constructs see | infrequent use anywhere. | | There has been, however, discussion about the need to clean | up some of the lesser-used language features, but obviously | doing so carries risks. | RangerScience wrote: | My mental model of Ruby is one of the simplest of any of the | languages I've worked with, but it's also the hardest to | put into any words. JS actually does beat it out, and then | Scala and Python come after. | | Everything is kind-of-but-not-really an object, a | reference, and a function, all at the same time - which | _sounds_ complicated but in my head... turns out to be | pretty simple. Everything's just kind of different flavors | of the same thing. `attr_accessor` is a good place to see | this in action. | | The flexibility comes more from the variety of available | core language options (procs, blocks, and lambdas) and core | libraries (map/each/collect, for example), not from a | variety of underlying concepts. | e12e wrote: | > Not a ruby developer here: that sounds terrifying! Does | it make it harder to have a proper mental model of the | language | | It is a little terrifying in the sense that I'd not want to | write language level tools (eg: syntax highlighter). | | But if you have scheme on one end and natural language on | the other, ruby leans a bit towards natural language - but | in a good way. In some ways ruby isn't that different from | Smalltalk - but it has a lot of (sometimes I think too many, | sometimes not) _conveniences_. | | Parentheses and brackets are largely optional "where it | makes sense". 
Conditionals support postfix, eg these are
| equivalent:
|
|     if should_send?()
|       send_mail({to: 'u@x.com'})
|     end
|
|     send_mail to: 'u@x.com' if should_send?
| brundolf wrote:
| Here's what it looks like to call it from Rust:
| https://github.com/tree-sitter/tree-sitter/tree/master/lib/b...
|
| Seems like this would make it much easier to bootstrap a
| performant language-server. Very cool; maybe that will be my next
| project.
| dcreager wrote:
| We also have several of the language grammars published as
| crates: https://crates.io/search?q=tree-sitter (And doing the
| same for other grammars is a fairly painless process.)
|
| So if you're writing a tool for a single language (like a
| language server), it should be as easy as adding tree-sitter
| and tree-sitter-blah to your cargo manifest.
| brundolf wrote:
| Awesome! Though my thinking was that it would have an
| especially large impact for languages that aren't popular
| enough to have their own LSP yet; you no longer have to be an
| expert in writing interactive compilers to set up a
| respectable LSP for a niche language, or even a home-grown
| one
| dcreager wrote:
| Yes! This is a great point. It's similar to what I
| mentioned over on this thread [1] about how we're working
| on a more precise version of Code Navigation based on tree-
| sitter. The tl;dr is that you'd write something like tree-
| sitter queries [2], just like you do for the current fuzzy
| Code Nav, but the query DSL would be a bit more
| sophisticated, allowing you to specify the actual name
| resolution rules of your language. One of the things we're
| using to test this is an LSP shim that lets us test our
| rules in VS Code (or any other LSP-compliant editor).
|
| [1] https://news.ycombinator.com/item?id=26227476
| [2] https://tree-sitter.github.io/tree-sitter/using-parsers#patt...
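For contrast with the precise navigation dcreager describes, the "fuzzy" approach mentioned in the thread (matching references to definitions purely by name) can be sketched as below. The `Symbol` record and the sample data are hypothetical; in practice the definition and reference sites would come from tree-sitter queries, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Symbol(NamedTuple):
    """Hypothetical record for a definition or reference site."""
    name: str
    path: str
    line: int


def match_by_name(definitions: List[Symbol],
                  references: List[Symbol]) -> Dict[Symbol, List[Symbol]]:
    """Map each reference to every definition that shares its name."""
    by_name = defaultdict(list)
    for definition in definitions:
        by_name[definition.name].append(definition)
    # No scoping or imports are consulted, so a reference may get
    # several candidates, or none at all.
    return {ref: list(by_name.get(ref.name, [])) for ref in references}


definitions = [
    Symbol("parse", "json.rb", 3),
    Symbol("parse", "yaml.rb", 8),
    Symbol("render", "view.rb", 1),
]
references = [Symbol("parse", "main.rb", 12), Symbol("missing", "main.rb", 20)]

matches = match_by_name(definitions, references)
for ref, candidates in matches.items():
    print(ref.name, [f"{d.path}:{d.line}" for d in candidates])
```

The reference to `parse` gets two candidates here, which is exactly the ambiguity that language-specific name-resolution rules are meant to remove.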
| pcr910303 wrote: | To me, the most impressive use of tree-sitter was an iOS text | editor that uses it to parse huge JSON files / mixed language | files and highlight them in a very robust way. [0][1] I'm hoping | tree-sitter becomes more common like LSP and Emacs can get exact | highlighting and other tools with it... | | [0]: https://twitter.com/simonbs/status/1352697855845273600 | | [1]: https://twitter.com/simonbs/status/1362492842141171720?s=21 | ducktective wrote: | Yeah but I don't think LSP specs contain syntax-highlighting or | semantic highlighting. | [deleted] | orra wrote: | LSP supports semantic highlighting: | https://microsoft.github.io/language-server- | protocol/specifi... | | Though AIUI the basic syntax highlighting is done by the | editor (e.g. VSCode uses Textmate grammar support). | picardythird wrote: | FYI there is tree-sitter.el for Emacs. | ACosmicDust wrote: | Emacs does have a package to use tree-sitter [0]. I think | emacs-lsp is aware of this highlighting backend and performs | pretty well. | | (semantic highlighting is pretty slow for C++ with font-lock, | with tree-sitter it's a breeze :)) | | [0]: https://github.com/ubolonton/emacs-tree-sitter ___________________________________________________________________ (page generated 2021-02-22 23:00 UTC)