[HN Gopher] I'm Porting the TypeScript Type Checker Tsc to Go ___________________________________________________________________ I'm Porting the TypeScript Type Checker Tsc to Go Author : msoad Score : 230 points Date : 2022-01-25 17:07 UTC (5 hours ago) (HTM) web link (kdy1.dev) (TXT) w3m dump (kdy1.dev) | nevir wrote: | Super skeptical that porting this to a new language is going to | have the kind of performance gains they expect ... assuming they | want to replicate all of tsc's functionality | eyelidlessness wrote: | It's already proven (esbuild, SWC, Bun) to drastically improve | performance of TypeScript's other responsibility: compilation | (tsc doesn't offer an option for this, but ts-node does, for | comparison). And while the type checker is certainly more | complex, I have a hard time imagining it won't show similar | improvements in a native language. | resoluteteeth wrote: | Compiling typescript is basically just stripping the type | definitions/annotations and other typescript-specific parts | of the syntax out. That's a huge difference from actually | type checking it. | mariogintili wrote: | k__ wrote: | tsc relies in shared state and is slow, so they port it to a | language that allows shared state. | | Strange reasoning. | pdpi wrote: | tsc is slow, so you port it to a faster language. tsc relies on | shared state, so you use a language that supports shared state | so you're just porting the same design to a new language | instead of redesigning the whole program. | Yoric wrote: | (er, some text in the parent seems to have vanished) | | Sure, if you absolutely wish it to, in an `unsafe` block, you | can define two writeable pointers to the same piece of data. I | don't think I've ever seen anybody write such code in Rust | (except perhaps when linking to a C library that required it?), | nor have I ever felt the need to do so. | | Now, it is true that Rust is designed to discourage writing | cyclic data structures. I can imagine that trying to port | (rather than redesign) an algorithm that deeply requires cyclic | data structures, that will make your life miserable. | wizzwizz4 wrote: | You actually _can 't_ define two `&mut` references to the | same data in Rust. It's invalid Rust code; if the compiler | notices, it'll refuse to compile, and if it _doesn 't_ | notice, your code will randomly break. | Yoric wrote: | You can define two `*mut`, though, in unsafe code. | anderskaseorg wrote: | A better option is to use Cell or RefCell; they allow | completely safe shared mutability in single-threaded | code. | | https://doc.rust-lang.org/std/cell/ | jerf wrote: | It sounds like you think "tsc is slow" _because_ "it uses | shared state". If that is what you're getting at, you've | connected two unconnected statements. | | It does tend to imply the new tsc isn't going to be able to | take much more advantage of concurrency than the current one, | which may put a limit on what performance gains it can expect. | I don't know if the current tsc is able to do anything | concurrently. (Here, "concurrently" doesn't merely mean "able | to have multiple events in flight and thus do multiple tasks at | the same 'time'" but truly "doing work on multiple CPUs | simultaneously".) Go is generally faster than Node, but it'll | be hard to guess by how much in this case. | | A Rust port would be harder but would probably lead on in a | direction to improve the concurrency. However, that may be an | infeasible amount of work, not because of Rust per se but | because a port is intrinsically easier than a port _and_ a | rearchitecting. | munificent wrote: | The reasoning is only strange if you assume that the cause of | slowness is the shared state. | | CPUs have no problems with mutability. JS is slow because of | dynamic typing, dispatch, flexible object layout, and an object | model that requires lots and lots of heap allocation and | pointer chasing. | pjscott wrote: | tsc is a large JavaScript program run at the command line, | which is a pathological case for a JIT-compiled language: by | the time the hot spots get native code generated, tsc may be | close to finished already. Porting it to any ahead-of-time | compiled language, without significant changes to the structure | of the code, is likely to be a big performance win. | tshaddox wrote: | As other people have pointed out, it doesn't seem like the | author (or anyone else, perhaps other than you) is suggesting | that shared state is a significant reason that tsc is slow. | | But moreover, my takeaway was that the author is making a | distinction between a _complete rewrite_ and a _port_ , where | the former seems to be any rewrite that just tries to do the | same thing (but may be architecturally very different), while | the latter seems to be an approach where you attempt to | directly adapt the original source code into a new language on | a statement-by-statement basis. The latter would likely require | the target language to support the programming paradigms which | are used heavily in the original source code. | konart wrote: | >shared mutability | | Sure but this goes against "Don't communicate by sharing memory", | no? | scottlamb wrote: | > Sure but this goes against "Don't communicate by sharing | memory", no? | | Yes, but that's advice rather than something Go requires. It's | reasonable to say it's outweighed by the desire to avoid | significantly restructuring the existing logic. The author is | explicitly choosing to match the upstream program's structure | to limit the project scope, and whether that structure is good | or not is almost besides the point. | | btw: especially in the Rust community, shared mutability | doesn't always mean shared between threads/tasks. It's also a | concern in single-threaded/non-concurrent code due to aliasing. | This case does actually seem to be involving multithreading | given that the author mentioned "using 8 threads". | newlisp wrote: | Curious, can you elaborate on what are the issues of | mutability in single-threaded code? | scottlamb wrote: | It comes down to how a language construct allocates freedom | to the programmer vs the optimizing compiler. | | * Rust references are strict. &mut references guarantee | that nothing else can access the backing memory while the | &mut is valid. & references guarantee that nothing | (including &mut by the rule above, or unsafe code with raw | pointers by programmer's pinkie swear) can change the | backing memory while the &mut is valid. This allows the | compiler to perform more optimizations behind the scene, | like assuming a value loaded into a register is still up- | to-date, even if there's been a function call between. | | * Rust raw pointers make no guarantees (but require the | programmer to type "unsafe" to use). | | * C pointers are IMHO weirder. iirc, two pointers of the | same type may alias (unless you use the rare "restrict" | keyword), but two pointers of different type can't, unless | one is "char*". | | * AFAIK, Go pointers are at least as permissive as C's. | They might not make any aliasing guarantees at all. | | * Other languages vary... | sjhuang26 wrote: | One thing that was not addressed in the post is the option of | using a GC library in Rust. I'm not a Rust expert, but to me, it | seems like this makes it quite feasible to quickly port a | compiler. They could wrap most complex data structures in GC and | optimize them as time goes by. That being said, I'd assume that | the library's GC would be slower than Go, so perhaps that's a | deal-breaker. | jkelleyrtp wrote: | TSC doesn't need to "stick around", right? Just a run-once and | the program is over? | | In those cases, https://github.com/fitzgen/bumpalo works | amazingly as an arena. You can pretty much forget about reference | counting and have direct references everywhere in your graph. The | disadvantage is that it's hard to modify your tree without | leaving memory around. | | We use it extensively in http://github.com/dioxusLabs/dioxus and | don't need to worry about Rc anywhere in the graph/diffing code. | anonymousCar wrote: | I think --watch mode means it's running while developing | connor4312 wrote: | Additionally, the language server used to power editors is | long lived. TypeScript isn't just the compiler. | smasher164 wrote: | I wonder how it would fare if instead of rewriting from scratch, | they applied automated transformation from the Typescript source | code to Go. It wouldn't have to capture all the semantics of | TS/JS, but be opinionated enough to transform the Typescript | source. | | Another possibility is to use something like GraalVM native-image | to compile Typescript's type-checker into a binary. This might | offer a speedup over its execution on V8. | marcus_holmes wrote: | Given that ESBuild [0] is a way better replacement for Webpack | already (and similarly written in Go) I can see how we could | pretty soon have an entire toolchain written in Go for dealing | with compiling JS. | | Hopefully the whole toolchain will be embeddable, too. Being able | to do configuration-in-code and produce an executable that does | the whole build step in milliseconds would be cool. | | [0] https://esbuild.github.io/ | newlisp wrote: | The author of esbuild also started the implementation of | esbuild in rust, found it painful too and switched to build it | in go. | pjmlp wrote: | For me the ideal use case for Rust is really for places where | automatic memory management is a hard sell, even if | technically possible, kernels, drivers, hypervisors, | firmware, GPGPU, ... | | For everything else I rather enjoy the productivity of | automatic memory management. | [deleted] | moltonel3x wrote: | SWC (by the same author as this tentative tsc port) is a part | of more and more people's typescript toolchain, and is written | in Rust. Having all your ecosystem in a single language feels | nice, but pragmatically different tools might be better in | different languages. | marcus_holmes wrote: | The point of having it all embeddable in Go is that we could | write a single executable that handles the entire process, | with the toolchain embedded and the configuration written as | code. | tantalor wrote: | This is a "rewrite" not a "port". | londons_explore wrote: | At first I thought this was for type checking Go code... Ya know, | all that interface{} stuff... | wrs wrote: | 62X seems like a lot. It makes me wonder if someone has put | serious effort into profiling and optimizing the JavaScript | version. If not, that could be a better use of time than a port. | I'd guess the TypeScript team may not be interested in | maintaining a Go version, so this port will will always be behind | the current TypeScript definition. | sam0x17 wrote: | Oof trust me from experience you do _not_ want to write a | compiler or parser or anything like that in Go. The string | support is so limited -- for example you have to implement basic | things like reverse string from scratch instead of them (and much | more) being in the standard library. Much better to use something | like Crystal or Rust. | ricardobeat wrote: | ESBuild seems to be doing fine. | | I'm a crystal fan though, would love to see more tooling use | it. | pjmlp wrote: | Way better than C. | ngrilly wrote: | The only string handling a compiler needs is usually during | tokenization. After that the compiler only manipulates tokens, | AST, SSA, IR, etc. | sam0x17 wrote: | And then you face Go's usual awkward issues with having a | lack of generics which becomes important when, say, | traversing a tree. | pjmlp wrote: | Thankfully that will end next month. | constexpr wrote: | I'm the author of esbuild. I hadn't written Go before I | started esbuild and I thought I would miss generics more, | but I didn't. There is literally only one situation in the | whole esbuild code base where I've found myself wishing for | generics: sorting an array. The workaround is easy and | involves implementing an interface with three methods, each | of which is typically a single line: | https://pkg.go.dev/sort#Interface. And I only needed a few | of these interfaces for a few different types. That wasn't | nearly enough of a reason to not use Go, especially | considering the other advantages. | Zababa wrote: | I see a few people wanting to rewrite tsc in something else than | Typescript because tsc itself is too slow. I don't remember | seeing any "proof" or analysis that Typescript is the limiting | factor in tsc. The basis often used is the performance of esbuild | compared to other bundlers. From the esbuild FAQ, "Why is esbuild | fast?": | | > Several reasons: | | > It's written in Go and compiles to native code. | | > Parallelism is used heavily. | | > Everything in esbuild is written from scratch. | | > Memory is used efficienty. | | Using another language will solve 1 and will maybe solve 2. From | this FAQ too: | | > For example, many bundlers use the official TypeScript compiler | as a parser. But it was built to serve the goals of the | TypeScript compiler team and they do not have performance as a | top priority. Their code makes pretty heavy use of megamorphic | object shapes and unnecessary dynamic property accesses (both | well-known JavaScript speed bumps). | | tsc itself seems like it could be optimized. By how much, I have | no idea. But it sounds like it would be easier than writing | something from scratch. esbuild, while a great tool, is limited | in scope. Typechecking Typescript on the other hand is a moving | target, and a fast one at that. All in all, I'm not convinced by | this approach compared to "regular performance work" on tsc | itself. I think people talk about that because it's easier to | start your project than to change tsc. In that case, the problem | sound political rather than technical. | | There's the x62 speed from the article, but there's no info on | what features are supported. I feel like this is something where | the devil is in the details. | | I'm happy being proven wrong on that subject, I fear that we're | going increase the number of tools in the frontend ecosystem for | no good reasons, and make every a bit harder again. | n0w wrote: | I had the same thought. | | A direct port of tsc to a "faster" language doesn't guarantee a | faster type checker. Sure it might be a bit faster, but my | guess is that the large performance wins come from a different | design that takes performance into consideration from the | start. | | Maybe the thought process is that a 1:1 rewrite will handle | compatibility and feature completeness? I suppose you could | optimise from there? | scottlamb wrote: | The "why not port to Rust" part boils down to these paragraphs: | | > tsc uses a lot of shared mutability and many parts depend on | garbage collection. Even though I'm an advocate and believer in | Rust, it doesn't feel like the right tool for the job here. Using | Rust required me to use unsafe too much for this specific | project. | | > tsc depends on shared mutability and has a cyclical mutable | reference. Rust is designed to prevent this behavior. Having two | references to the same data (shared mutability) is undefined | behavior in Rust. | | I'd be curious to hear more. In particular: | | * Two references to the same data: were cells too awkward? | | * Actual garbage collection: my first instinct would be to use an | arena for each compilation, destroying the whole thing at once | rather than doing reference counting or garbage collection. I | gather that's not cool in the age of language servers and | incremental compilation, though. Does tsc support those? | yazaddaruvala wrote: | All of that is valid for a rewrite, but not efficient if you're | porting the current implementation. | | As I understand it, in this case, Rust would be able to have a | line to line port from JavaScript. | | I do still think your overall point holds. Why not use a | library in the Rust ecosystem to help with this problem e.g. | some GC<T> type. | scottlamb wrote: | > All of that is valid for a rewrite, but not efficient if | you're porting the current implementation. | | Help me see it? | | I can easily see how getting rid of the cyclical structure | crosses from "line-to-line port" into "rewrite" territory. | I'm not saying you're wrong, but I'm having more trouble | seeing how "wrap some types in cells and pass along a | lifetime and arena parameter" crosses that threshold. It | might not be feasible at all, though, if tsc compilations are | long-lived for language server + incremental compilation | support. | | > Why not use a library in the Rust ecosystem to help with | this problem e.g. some GC<T> type. | | iirc I've seen some experimental GC library mentioned. I | think something like that is at least off the beaten path | enough that I can see how the author might decide it's better | to just use Go instead. | andreypopp wrote: | Well... How many paid work hours spent on tsc? Then does | TypeScript have a formal language specification? I'm pessimistic | it's possible to rewrite tsc with full compat (bug-to-bug, | otherwise it doesn't really make much sense) without MS | support... but who knows... I'd like fast compiles too, so good | luck! | paavohtl wrote: | A formal specification would be basically useless, if nothing | guarantees that the reference implementation actually follows | it. Sharing & contributing to the same test suite is a much | more practical approach. | andreypopp wrote: | Specification isn't being followed is of course useless. But | yeah, tests are definitely useless and in the absence of the | spec is the only sane way to approach such rewrite. | quickthrower2 wrote: | As a smoke you can swap it in for a lot of the npm ecosystem | and run the tests of those projects on the compiled code. | silasb wrote: | I'm surprised to see no one comment on Zig. It seems like the | bun.sh developers are making some bold changes with such a low- | level language. | kbd wrote: | I'm super bullish on Zig, but it's a bit early to bank on for | production things as the language is still changing/being | developed. | | To give another example of where this came up: they were | discussing using Zig for the Ruby JIT, but despite really | liking Zig the team decided to go with Rust because of its | maturity: https://bugs.ruby-lang.org/issues/18481#note-21 | newlisp wrote: | I'm more curious about if the author research Ocaml as an | alternative. | forty wrote: | go means it's going to be easier to integrate to esbuild :) | | I'm happy to see this, as honestly I don't really understand why | you would compile TS without type checking. | apazzolini wrote: | We use SWC with ts-node to strip types in dev for faster | iteration loops and run the full tsc typechecker in CI (and | optionally as a separate process on your local machine). | forty wrote: | Answering to myself, a colleague suggestion (from tsc doc | actually): use tsc --noEmit (for type checking) and a fast | compiler. | | tsc --noEmit duration is 2/3 of a normal build on our fairly | large code base, so that can definitely speeds things up (for | CI for example) | eatonphil wrote: | > I'm happy to see this, as honestly I don't really understand | why you would compile TS without type checking. | | I don't think I'm remotely alone in doing this but I'll share | from my perspective. I have two modes while writing TypeScript | stuff: 1) prototyping something where I don't want to spend | time writing types (or maybe I write some types but I don't | care if they are correct or line up yet) but I still want to | have JavaScript generated so I can test the thing out manually | and 2) getting types correct as part of long-term maintenance | and edge-case removal. | | There are fast TypeScript -> JavaScript generators today like | swc and esbuild. tsc is not fast. So I can speed up "1)" from | above if I use swc and esbuild and only use tsc for "2)" from | above. | | As people write faster implementations of tsc-the-type-checker | though I'll be happy to use that instead and just not use tsc | anymore. | tshaddox wrote: | Out of curiosity, how often do you have a project in the | first mode that gets large enough that the speed of compiling | TS to JS is significant? | eatonphil wrote: | There is a huge difference even in small and mid-size | projects. | | https://datastation.multiprocess.io/blog/2021-11-13-benchma | r... | nicoburns wrote: | If you're writing UI code then even a few seconds is slow. | You don't want to be waiting 5 seconds every time you move | something a couple of pixels to the left. | m-hilgendorf wrote: | > 1) prototyping something where I don't want to spend time | writing types | | It doesn't compile to JS, but this is one of the canonical | use cases for Stanza 's (1) "optional" type system | | (1) http://lbstanza.org/optional_typing.html | mixedCase wrote: | Have you tried doing Type-Driven Development? IME, most of | the time constraints are determined through types first with | the code being done later tends to replace the game of "what | kind of complex and non-obvious types fit my code the most" | with a "oh, the implementation is obvious" reaction. | | It also makes it easier to determine what behavior is worth | testing and which ones are already covered by your types, if | you tend to write tests last. If instead you're used to Test- | Driven Development you can do types->tests->code. | eatonphil wrote: | I've spent a lot of time programming in Standard ML and | yeah that's a prerequisite there. | | But no I don't prefer it to what I do in TypeScript. | forty wrote: | That makes sense, I would generally use JS directly for mode | 1), but I understand the use case. | mattkrick wrote: | Personally, I let vscode do typechecking on open files & have a | pre-commit git hook to typecheck changed files. | | When it comes to starting up a development server & building | the client, there's a huge cost to repeatedly typechecking the | same 1000+ files. By cutting out typechecking & only compiling, | I can reduce the webpack client build from 40 seconds to 3 | seconds (using sucrase). | forty wrote: | I'm answering to you but this is really for most siblings: | | I maintain a fairly large TS code base (nodejs backend). I | think a full rebuild is 1:30 on my relatively recent laptop | (i7-8650U, lots of RAM). But in practice I always use tsc -w | and compile is mostly instant after editing a file (I do have | to wait whenever I launch the command after I boot, but after | that, it's fast enough). | | tsc now support incremental compilation too, though I haven't | played with it too much as I'm happy with watch mode. | rezonant wrote: | Incremental compilation is _okay_ but it seems to perform | full rebuilds if the previous build is _too_ old which is | weird- meaning that more builds than expected take the | entire time to complete. Nonetheless I use it for all of my | builds. On the other hand this may just be because of the | spiraling nature of type inference, where TS needs to re- | examine a large amount of the program from a change that | appears simple. | | Personally I only have a small number of projects that take | more than 10-20 seconds to compile, but those ones are | painful. I should probably do the same with -w for those. | morelisp wrote: | What if not everyone else on your team is paying attention to | their editor's warnings? What if the LS's async response | doesn't come fast enough? What if, god forbid, you have to | change something on a different computer? | | If it's not in your CI/CD scripts, it's not actually getting | checked. | | > I can reduce the webpack client build... to 3 seconds | | This is about 10x longer than any interaction with a computer | should be. | tdfirth wrote: | This is similar to what I do (although I use esbuild), | however like an idiot I just run tsc manually so obviously I | forget sometimes and it takes 10 minutes to realise the build | failed. | | Not great... I'm ashamed but it's just me on this project | atm. | | edit: misread parent originally | bastawhiz wrote: | If it takes one minute to type check a codebase and ten seconds | to compile it to JavaScript, then whenever I simply need to | produce a JavaScript file without type checking I'll save | myself the fifty seconds. If my editor's TS language server | does the checking and my bundler is just compiling for my dev | environment, I don't want both tools duplicating each other's | work. | postalrat wrote: | You compile ts without type checking when the options to do so | are 10x+ faster than with type checking. | pjmlp wrote: | I really don't get this, when I save my TS files they are | already compiled into JS on the fly, ready to fed into a | <script />. | recursive wrote: | By what? What if they were updated from an upstream repo, | and never saved from an editor? What if you change branches | without an upstream repo? | pjmlp wrote: | By VS and VSCode, the IDE/editor that belongs to the same | company that does TypesScript. | | If I really need to call tsc explicitly, the few seconds | that I have to wait, versus the minutes on some other | languages, won't make me lose sleep over it. | antihero wrote: | It will be interesting to see if they create a language server so | we can use it in VSCode. | politician wrote: | I'm sorrowful that such a significant percentage of this post was | a preemptive apology for not using Rust. | newlisp wrote: | Well this a blow for the "Rewrite it in Rust" movement. | davepeck wrote: | (In a different direction, Microsoft's pyright type checker for | Python -- IMO the fastest and most correct type checker in the | Python ecosystem -- is implemented in TypeScript!) | zem wrote: | as someone who works on a competing typechecker, i'm definitely | very impressed by pyright, especially since it has a single | author! we have three people working on pytype and it | definitely feels like too little; the way the pyright author | manages such a great development velocity is mindblowing, and | means they got the initial architecture very right. | jupp0r wrote: | > tsc depends on shared mutability and has a cyclical mutable | reference. Rust is designed to prevent this behavior. Having two | references to the same data (shared mutability) is undefined | behavior in Rust. | | This problem has numerous solutions in Rust - (A)Rc, RefCell etc. | I don't understand how this caused the author of the article to | stop considering Rust. | petmon wrote: | Non-GC languages are bad in general at handling cyclical | references (mutable or not), and Rust is especially ill suited | for it because of the borrow checker. For example should an | arbitrary reference in tsc become an rc::Rc or an rc::Weak? You | have to tease out the implicit ownership model and one may not | even exist. | | tsc was designed expecting GC and it makes perfect sense to | port it to another fast GC language. | armchairhacker wrote: | I honestly like the explicit weak ref model and it's usually | trivial (child objects are strong references, parents and | siblings are weak), and rust has Gc as a last resort. | | But if the author's more familiar with Go, performance in Go | is likely good enough | throw10920 wrote: | I'm not a Rust fanboy (in fact, I rather despise the RESF, | especially given Rust's mediocre tooling), but: I think that | "You have to tease out the implicit ownership model and one | may not even exist" suggests a problem with the _program | itself_. | | I can't think of any reason why a program _must_ have an ill- | formed ownership model (unlike, say, the fact that many | interesting programs must have an embedded dynamic type | system (tagged unions etc) to do their thing). | | Now. The Rust language and existing tooling doesn't give you | any help in figuring out the ownership model, but I can | easily imagine a language and tools that _would_ help with | that. | | This isn't meant to refute anything you've said (you said | "[existing] non-GC languages are bad in general" not that | non-GC languages _cannot_ be good) - just add clarity. | bcrosby95 wrote: | A program doesn't have to have an ill-formed ownership | model. But if it does and you're trying to translate from | one language to another, you should probably pick the | language that makes the job easier. | | It reminds me of a programming assignment I had 20 years | ago. We had to take our teacher's old FORTRAN program and | "convert" it to Java or C++. I knew both equally well at | the time - and I picked C++ because it had goto whereas | Java didn't. | | This meant that I didn't really have to unravel the meaning | of the program. Whereas with Java, I would have to. | [deleted] | adamnemecek wrote: | > I rather despise the RESF, especially given Rust's | mediocre tooling | | Which tooling are you talking about? | chubot wrote: | If your program is an interpreter, then I think the | ownership model is going to be dominated by the program | being interpreted, not the intepreter itself... That is, | has to be general enough to interpret any program, and many | of those programs will have cyclic data structures. | | TSC is not an interpreter, but its type system is Turing | complete, so I wouldn't be surprised if it has some similar | issues. | klodolph wrote: | Speaking as someone who has written non-trivial compilers | in both Rust and Go... | | Compilers tend to use a bit thornier data structures and | graphs than you might see in other realms. It is _very_ | convenient to just represent these graphs as objects | connected by pointers, and know that you don 't have to | worry about objects that become unreachable. | | For example, the values in an SSA-based IR are all full of | pointers to other values. Each value represents the | computation to produce that value, like "x12 / 4". A | strength-reduction pass goes in and mutates this node, | replacing it with "x12 >> 2" (if such a change is valid). | It knows the change is valid because it follows pointers | from the node to its inputs. The change is available to all | upstream values immediately because it is mutated. These | changes may make other values unreachable, at which point, | you don't care about them. You only care about the values | which are reachable from computations with side effects. | These nodes _definitely_ have cycles and IMO, the notion of | weak references between them doesn 't make sense. | | I'm sure there's a way to do that in Rust. Please, nobody | try to explain how. I could figure it out. The thing is... | it's not obvious how to do it in Rust, and figuring out how | to "explain ownership to the Rust compiler" without | creating Rc<RefCell<T>> cycle leaks is a non-trivial task | that sucks away time I could be spending writing the | compiler. | | This reminds me of a paper I saw go by years back about | writing a compiler in Haskell. Pure Haskell code doesn't | let you mutate objects and doesn't let you do pointer | comparisons, so it creates problems if you're trying to | implement an algorithm which is conceptually pure (at a | high level, a function with inputs and outputs) but where | you're taking the implementation from a paper which | constructs the output iteratively, piece by piece. The | problem is that your data structures have to be annotated | to show which parts are mutable (IORef, STRef, or TVar), | but "which parts are mutable" may have a different answer | depending on whether you are inside some function | transforming or constructing a data structure. | | It was obvious to me, reading the paper, that while I love | functional purity, and functional purity solves all sorts | of problems and makes it easier to write code _in general,_ | it sometimes gets in the way of getting shit done. The same | is true of Rust 's notion of ownership... super useful when | it lets me write fast, efficient code without having to | think about ownership (when it just works and gets out of | the way), and super annoying when you have to figure out | how to explain your program to Rust. | sealeck wrote: | Rust does have a third-party `Gc` type (which used to be in | the stdlib). | speedgoose wrote: | I think the point is valid. As a rust beginner I tried to make | a graph data structure and it was difficult. Using RefCell is | not very nice because it moves the problem from compile time to | runtime and you still can't mutate the same data structure more | than once at the same time. | m-hilgendorf wrote: | Rust encourages you to use an adjacent list/matrix for your | graph representation, which is preferable almost all of the | time for graph algorithms. It's a little unwieldy if the | graph needs to be mutable during the core algorithms, but | almost all algorithms in practice assume it will be stable. | tomcam wrote: | This is interesting but I don't know Rust. Presumably your | graph data structure grows dynamically, meaning you have to | add nodes at runtime. When. you say you can't mutate the same | data structure more than once at runtime, what does it mean? | That you can't set both a parent and a child pointer in the | same function, or what? | [deleted] | [deleted] | speedgoose wrote: | I'm very far from being a rust expert but it was difficult | to build a bi-directional graph because it's difficult to | connect the nodes together. A simple recursive algorithm | will not work. I ended up using a crate. | | I found the following article interesting on the topic : | https://aminb.gitbooks.io/rust-for-c/content/graphs/ | jupp0r wrote: | It shouldn't be difficult: | | 1. You have a graph struct that has ownership of all | nodes (in Rc containers). | | 2. Nodes have Weak references to connected nodes | slaymaker1907 wrote: | While I hate Go and love Rust, there is also the issue that | RefCell/Gc/Rc are very different in terms of how expensive a | memory allocation is compared to node. Historically there were | very similar problems in trying to write C++ like one would | write Java. Memory allocators today are much better and closer | to GC languages, but I suspect it will still be more expensive | for very allocation heavy code. | | One way you could get this to work at the cost of some safety | (depending on how you do it) would be to use arena style | allocation. Obviously there are ways to do this safely in Rust, | but it is probably easier to do it unsafely for a port. | | However, if you just use a high performance language with a GC | like Go/C#/Java, the port will be easier and you won't have to | worry about doing too many allocations. | andyfleming wrote: | Here's a relevant GitHub thread where some conversation that has | been happening: https://github.com/swc-project/swc/issues/571 | hardwaregeek wrote: | This reminds me of when people try to make a new Ruby | implementation. They get the basics down and think "this isn't so | bad! I can do this!". Fast forward a few months and they're stuck | on some nasty little edge case around eval and method_missing | that they need to run Rails. Or some weird aspect of C | extensions. | | TypeScript's compiler is an unholy, unsound mess, because | JavaScript is an unholy, unsound mess. Sure the basics of type | checking structs and variables is somewhat easy. But what about | conditional types? What about weird generic constraints? What | about indexing types? There's some really confusing stuff going | on underneath the hood. | | It's also a mess that *does not have a spec*. Even if you somehow | manage to make a type checker that implements most of the | features, there's no way to ensure it has 1 to 1 parity with | TypeScript. Plus that parity can be broken at any point. | | The only way I could see this happening is if the author somehow | convinced Anders Hejlsberg or a significant portion of the | TypeScript team to work on this new implementation. I'm not sure | that's gonna happen. | spion wrote: | The good thing is you have a ton of code on the Internet to use | to test it. Unlike a dynamic language reimplementation, a | static language transpiler can check if the results produced | are identical to `tsc` | scottlamb wrote: | > This reminds me of when people try to make a new Ruby | implementation. They get the basics down and think "this isn't | so bad! I can do this!". Fast forward a few months and they're | stuck on some nasty little edge case around eval and | method_missing that they need to run Rails. Or some weird | aspect of C extensions. | | I think that's exactly why the author wrote this: | | > Eventually, I realized that rewriting a massive project such | as tsc would be extremely difficult to continue. But the | performance gains were exciting. This led me to try another | route: porting instead of a complete rewrite. So, I started | looking at the TypeScript source code. | | It will require ongoing work to keep up with changes to | upstream, but by matching upstream's structure as closely as | possible (given the language difference), the path to | supporting these nasty little edge cases should be relatively | clear. | mrkentutbabi wrote: | What is the difference between porting and complete rewrite? | I thought they are the same? | scottlamb wrote: | > What is the difference between porting and complete | rewrite? I thought they are the same? | | As with so many terms in computing, there aren't | universally agreed definitions. But in the author's post, I | read "porting" as switching languages while changing as | little as possible vs "complete rewrite" as | redesigning/rearchitecting it along the way. | travisd wrote: | I think here "rewrite" means completely changing the | internal structure of the code (while preserving the | functionality) whereas "port" means copy the structure of | the internal code with as little modification as possible. | And it's hard to do the latter in Rust since Rust is not | garbage collected. | jacobr1 wrote: | Porting can involve a spectrum of changes. If you are | porting to a different language, often for bug-for-bug | compatibility you try to maintain the same rough code and | class/object structure as the source codebase. This might | give you some non-idiomatic code at first, but once you | have good test coverage you can refactor. In this case you | might maintain some of the upstream code structure to | better match future changes. | ludamad wrote: | Interesting, in theory you could make a rough js to golang | compiler given the relatively strict subset of typescript | that I recall the typescript compiler using (not that the | strictness means a fully sound implementation is easy, but | that you can use the regularity to cheat text transforms) | danenania wrote: | If they focus on the core features and make a significant | enough improvement, perhaps the core TS team could be convinced | to pull it in as a native extension and make calls to it for | the operations it covers? If something like 62x improvement is | really achievable, it's hard to imagine the team not being | interested at all. | | I would personally be very happy to use a significantly slimmed | down subset of TypeScript if it provided a 62x performance | improvement. Slow type checking is currently the main drawback | of TypeScript development in large projects. | spion wrote: | 62x improvement is not achievable. 6.2x might be. | | edit: I see they are running it with 8 threads, in which case | yes 50x is achievable. In any sizable codebase though, you | should probably split into modules and run with some | parallelism via e.g. `wsrun` or `turborepo` | Androider wrote: | esbuild is > 125x faster than Webpack, doing the exact same | job. It's not some theoretical microbenchmark, you'll see | it when converting a project from one to the other. If a | software hasn't been designed with speed in mind, and I | don't think tsc has, there can be massive performance | improvements possible if you create a new implementation | with performance as a top goal. | spion wrote: | Webpack does way more than esbuild, including running a | typechecking compiler instead of just transpiling, | running compilers able to downlevel emit to ES5 and | providing a deep plugin architecture allowing you to hook | into any bit you like. But yes, it hasn't been designed | with speed in mind - it has been designed for maximum | extensibility instead. Its the same reason why Babel is | slow compared to sucrase (written in JS, currently faster | than SWC and esbuild but doing somewhat less - | https://github.com/alangpierce/sucrase) | | tsc has in fact been designed with speed in mind (I've | been following the project since before it moved to | GitHub, which is when they redesigned it for speed). | Going beyond 1 order of magnitude performance improvement | while implementing the full feature set is highly | unlikely. | epolanski wrote: | One of the main requirements of TypeScript is that the | compiler has to work in a browser. The monaco editor is the | prince jewel of TypeScript applications. | | So the only possibility I see is a webassembly alternative. | hardwaregeek wrote: | I'm not saying this is impossible but I think you'd have to | work rather carefully to not have to parse the code twice and | effectively run two type checkers in parallel. Which, on | second though, you could run the fast one then the slow one. | Perhaps that's an option. | | > I would personally be very happy to use a significantly | slimmed down subset of TypeScript if it provided a 62x | performance improvement. Slow type checking is currently the | main drawback of TypeScript development in large projects. | | We all think that until the slimmed down subset doesn't work | with our favorite libraries. Sure, there may be people who | use no dependencies, but I'd bet that there's sophisticated | types in some very popular libraries that even low dependency | programmers use. | travisd wrote: | To this point, many teams are already using subsets of | TypeScript to improve compilation times. Often times it's | things like always declaring return types on functions to | avoid the compiler having to infer it (especially between | module boundaries). Calling this a "subset of typescript" | might be strong wording, but it does suggest there is some | desire in that general area. | vosper wrote: | > To this point, many teams are already using subsets of | TypeScript to improve compilation times. Often times it's | things like always declaring return types on functions to | avoid the compiler having to infer it (especially between | module boundaries). | | I hadn't heard of this trick - how much improvement does it | make? Seems like it might be good for readability, too? | travisd wrote: | I haven't personally benchmarked it, but I know it's | available as an eslint rule (all exported functions must | have declared return types). | fault1 wrote: | It seems like depending on how you configure typescript | (e.g, in tsconfig), typescript is already something like an | ensemble of mini-languages with different dialects and | semantics. Much more so than other languages. But I agree, | restricted typescript that has to do less work (data or | control flow analysis) would probably be on the orders of | magnitude faster. | Rowern wrote: | The issue here is that if you're running in the JS ecosystem | you'll definitely want to use other people code (npm package | or internal lib), if the subset breaks JS compatibility then | you can break a significant amount of code without realising, | if it is "only" a TS subset, then you need to make sure that | each lib/package you import are compatible. Anyway this does | not seem like a good solution. | shados wrote: | to be fair, even within the TypeScript world that can be a | problem. If typescript versions don't line up, or if your | various strict options are too strict, you can get into | really weird situations. It -generally- works, but when it | doesn't, it's hell. | koolba wrote: | > I would personally be very happy to use a significantly | slimmed down subset of TypeScript if it provided a 62x | performance improvement. Slow type checking is currently the | main drawback of TypeScript development in large projects. | | Isn't that already available by running in "transpile only" | and having the type checking happen in the background? | mcintyre1994 wrote: | > I would personally be very happy to use a significantly | slimmed down subset of TypeScript if it provided a 62x | performance improvement. | | I'm just guessing but I think a lot of Typescript's really | complex stuff is to support third party libraries with really | complex types - at least they often refer to libraries | they've improved support for when they discuss complex | seeming additions. So I wonder how much of the JS ecosystem | you'd have to cut off to get to such a subset. A particular | example would be that I've often seen a chain of 20+ | dependencies leading to Lodash, and I'd guess that Lodash is | really complex to type. If you lose everything that's | eventually dependent on something that's really complex to | type, do you have much of NPM left? | acemarke wrote: | As a maintainer for Redux, Redux Toolkit, and React-Redux, | I can tell you that we would _love_ to have several | additional features in TS that would make our library TS | types much easier to maintain. Like, say, partially | specified generics. | | Tanner Linsley (author of React Table, React Query, and | other similar libs) has expressed similar concerns. | | (I'm actually starting to work on a talk for the upcoming | TS Congress conference about "lessons learned maintaining | TS libs", and this will be one of the things I point to.) | armchairhacker wrote: | Even if it's not official i could use a slimmed-down, faster | TypeScript. Even if it has slightly different semantics. I | always use skipLibCheck so TS only checks my code. | | It's a form of linting, not semantics, so it doesn't have to | be exact. | Waterluvian wrote: | I am absolutely blown away at how good TypeScript's type | checker is, given how much of an unholy mess JavaScript is, and | that the TypeScript language cannot just deprecate all the | awful parts. | | More than once I've yelled at my screen, "HOW DO YOU KNOW | THIS?!" about some inferred or narrowed type. | dmitriid wrote: | However, more than once I've also yelled "HOW DO YOU NOT KNOW | THIS?!" at something or other :) | epolanski wrote: | Just today I was triggered that this compiles. | | declare foo: never[] // it can only be an empty array | | foo[1] // compiles | | TypeScript is a nice type system, it's surprisingly | expressive and flexible, like e.g. in Haskell I was | surprised that there's this strange notion that for a given | type you can only have one `Eq` or `Ord` instance for some | type A. | | But in TypeScript you can have as many Ord interfaces as | you want. I may want to order some `User` type by last | login, first login or whatever I want, but in Haskell you | need to overcomplicate it and create a newtype. | Jenk wrote: | I wouldn't expect never[] to mean 0 length array, tbf. | It's a variable length array of _type_ never. It should | be something akin to: declare foo: | never[0]; | emteycz wrote: | I think the correct solution is: | declare foo: []; // fixed-length array (tuple) without | any items | armchairhacker wrote: | foo[1] is undefined (literally the undefined symbol), in | JavaScript index-out-of-bounds is ok. | | Although i don't know if TypeScript asserts that x[i] is | T | undefined if x is T[] and the length is non-trivial. | In strict mode this would be really useful as you can | just do x[i]!, but it requires you to think a bit about | index-out-of-bounds or at least throw on the undefined | case. | endgame wrote: | > you can only have one `Eq` or `Ord` instance for some | type A. | | This is a good thing - it stops you from using one Ord | instance when putting As into a Map or Set, and then | trying to pull them back out using another. Newtypes let | you provide "additional" instances in a safe way. | reuben364 wrote: | I've managed to get the compiler to typecheck | const x: Thing = exprA; const y: Thing = exprB; | const z = [x, y]; | | but somehow not: const z: Thing[] = [exprA, | exprB]; | | although exprA and exprB were pretty nasty typing wise. | lhorie wrote: | Yeah, the lack of specification for Typescript is quickly | turning into the Achilles heel for the web tooling ecosystem. | Several projects got the parsing part done, but a lack of | specification and the relatively fast release cycle means it | isn't really feasible to invest into actually implementing the | meat of the TS compiler, so everyone is stuck with that last | remnant of critical tooling running slowly. | pcj-github wrote: | No spec, but the golden files that check that the compiler is | working as intended are very comprehensive. If you can generate | the same ~38,000 files in this directory from the test cases, | you're pretty much on par with tsc: | https://github.com/microsoft/TypeScript/tree/main/tests/base... | jacobr1 wrote: | But that only works for a bug-for-bug compiler, no? If you | want to build something like a linter, you couldn't use them | to validate you are covering the fully syntactic range. | Though I guess it would help look at coverage if your linter | threw any errors, but it wouldn't validate correctness. | msoad wrote: | There is no specs so TypeScript's own impelemtption might not | be accurate to what the team imagined on how things should | work. | | That means if this implementation passes all TypeScript's | tests, it is as good as TypeScript. | | Each new issue that is opened in either repo that shows | discrepancy can be a new unit test added to the TypeScript | codebase. | scotty79 wrote: | Good thing is that the only thing it does is type-checking. So | if you miss some edge case it won't influence at all how your | ts code works. | | The only thing is it may not catch some type error, which JS | already does none of and TS can skip wherever you want or even | implicitly and people are fine with that. | | If I get 98% of TS typechecking 10 times faster. It's a huge | win. | simplify wrote: | You're vastly underestimating the advanced type system | features TypeScript requires to typecheck even the most basic | JavaScript code. It would be more like 10% rather than 98%. | Intersection and union types are no joke, and are notoriously | difficult to make performant. | skybrian wrote: | One way to collaborate would be to write a spec as you go, | along with an associated test suite that can be run against | multiple implementations. | | Presumably TypeScript has a test suite already that might serve | as a starting point? | newlisp wrote: | But in the meantime he gets to work on a fun project, learn a | lot and get paid for it at the same time ;) | adamnemecek wrote: | The way of dealing with cycles is generational arenas. Also where | is the codebase? I would like to see the uses of unsafe. | danenania wrote: | If anyone happens to know: what are tsc's bottlenecks? | | I presume that if the bottleneck was just number of loop | iterations, V8 would already be about as fast as C or any other | language? | | Is it simply that accessing/mutating javascript data structures | like objects, arrays, set, etc. is slow? Are those not fully | optimizable by V8 for some reason? Or is it something else like | the GC being slow? | [deleted] | cpcallen wrote: | Yes, I'm curious about this too. | | Given that gopherjs[1] can compile Go to JavaScript and in some | cases thereby achieve _better_ performance than the native Go | compiler[2], it 's not obvious to me that a JavaScript program | ported mechanically to Go is likely to be any faster than the | original. It seems to me that the effort might be better | applied to optimising the performance of tsc itself. | | [1] https://github.com/gopherjs/gopherjs [2] | https://medium.com/gopherjs/surprises-in-gopherjs-performanc... | newlisp wrote: | This is a problem that can take advantage of good parallelism, | Go will give you that and at the same time be more memory | efficient. | epolanski wrote: | I'm not so positive about these projects. | | They are good willed but TypeScript doesn't have an official | spec, it's an implementation-defined language. It's not C, it's | not JavaScript, it's whatever each release wants it to be. This | makes it very difficult to follow every version and evolution, | you're consistently chasing whatever has been merged on release. | folkhack wrote: | And it's the exact reason I'm really reluctant to fully buy | into Typescript... I always feel like I'm wrong about the "not | an official spec" part of things but you are 100% correct in my | understanding. | | Also on "it's whatever each release wants it to be" - that is | hell on earth if you're looking to target consistent software | processes and it only gets worse the more you scale (both | solution and team). | | I get the value of Typescript - hell I've used it extensively | myself... it's just that I can't stop feeling like this is all | an anti-pattern: trying to make a loosely-typed language a | typed language (if you can call it that?). =/ | BadOakOx wrote: | Just from a week ago on HN: | | I'm making a Typescript type-checker in Rust: | https://news.ycombinator.com/item?id=30033119 | epolanski wrote: | Next week it will be Zig then | slig wrote: | Cannot wait for their webpack replacement. | pjmlp wrote: | The thing with all those projects is playing catch-up with | upstream, and the possible variations from what is actually the | official semantics depending on the TypeScript version. | throwaway894345 wrote: | Personally I wouldn't fret about keeping pace with upstream. | Eventually the syntax will stabilize if it hasn't already. I'd | be more worried about fidelity--does this produce the same | result as the reference implementation? TFA says he's running | against the "conformance" test suite, so that should keep | things pretty tight and when bugs are found they'll be fixed in | time and presumably the test suite will be updated accordingly. | wwwigham wrote: | The `conformance` suite is the "set of tests people added | when a feature was first released", usually without the | decade of following regression tests in the `compiler` suite. | In TS itself, the `conformance` and `compiler` suite are | considered the part of the same unit most of the time. (The | compiler suite, however, is not helpfully subdivided by | feature, since it oft deals with cross-cutting concerns!) ___________________________________________________________________ (page generated 2022-01-25 23:00 UTC)