[HN Gopher] Advanced Compilers: Self-Guided Online Course ___________________________________________________________________ Advanced Compilers: Self-Guided Online Course Author : matt_d Score : 520 points Date : 2020-12-11 15:25 UTC (7 hours ago) (HTM) web link (www.cs.cornell.edu) (TXT) w3m dump (www.cs.cornell.edu) | jcq3 wrote: | I'm interested by the topic but I'm a junior backend/data eng, do | you think it could be helpful for my profile to dig into the | concepts of compilers other than for the culture itself? | ibains wrote: | I've worked on multiple compilers (optimizations expert) at MSFT | on VS and CUDA and gave developed a DSL and worked on Database | Compilers. I can't hire compiler people with right skills. We're | building an IDE and those parsers are written differently, and we | use Scala packrat parser combinators. These courses teach very | little judgement or industry relevant stuff. When do you use | packrat parser vs LALR vs LL? Good luck hiring an engineer with | advanced compiler courses knowing any of this. I'd like to sit | down all university professors who teach compiler courses and | teach them a course on what's relevant. | nialv7 wrote: | No offence, but parsing (in a compiler, not in general) is | probably the most boring part of a compiler. | | As this is a PhD course, I'd expect the goal of it to be | preparing students for research in the field, not writing | parsers for some industrial company. | | Also, there are definitely courses that teach you how to write | a parser. But we usually don't call them advanced. | swyx wrote: | perhaps you should guest lecture at a university and have them | record all the lessons? your kind of experience is rare. | chrisseaton wrote: | > Good luck hiring an engineer with advanced compiler courses | knowing any of this. | | Have you tried raising the level of compensation you offer? | There are engineers elsewhere doing this at other companies you | could presumably get if you put your hands in your pockets? | hnxs wrote: | "We can't hire for x" almost always means "we can't hire for | x for what we're willing to pay" | mhh__ wrote: | I'm normally fairly laissez-faire about the morality of | "exploiting" employees, but that works both ways - if you | can't find people for the amount of money you're charging | then tough shit | vector_spaces wrote: | I saw this great interview recently with Anders Hejlsberg at | MSFT on how modern compiler construction differs from what is | taught in traditional University courses. Is this what you're | alluding to? | | After watching that interview, it's strange to read a comment | like yours. While the architecture is completely different, it | doesn't frankly seem like that big a leap to go from "senior | engineer with X years working on thing that's tangentially | related to compilers" to being able to be productive in a | reasonable amount of time working with the new architecture. | What am I missing? | | https://youtu.be/wSdV1M7n4gQ | chrisseaton wrote: | It's pretty hard to pick up compiler skills because very | little of it is written down. It takes a lot of time (a few | years) working with code bases and papers to absorb it. | vector_spaces wrote: | I will say that I asked Shriram Krishnamurti about why books | on interpreters and compilers tend to avoid the question of | parsing like the plague and found his response a little | unsatisfactory. | | All these celebrated texts like SICP, EOPL, and Lisp in Small | Pieces avoid the question of parsing altogether by focusing | on s-expression based languages. | | His response was effectively that parsing is over-emphasized | in other books, and it's truly orthogonal to other PL | activities. Which I can understand, but in practice when I | need a parser in my day job I usually don't have much say in | the structure of the input "language" (since I'm usually | parsing something coming from another source that I have | little control over). And it would seem if you have an | instructor in a compilers course with this point of view, | what other class would give you an opportunity to learn | parsing in a formal setting? | Ambroisie wrote: | What ressources do you recommend I read, or what I can do, to | learn more about applicable compiler knowledge? | throwaway_pdp09 wrote: | Prev post by me, if it's any help | https://news.ycombinator.com/item?id=25210523 | [deleted] | Philip-J-Fry wrote: | Hiring for exact skills is what you do when you need the skills | ASAP and the actual supply of engineers is there. The time you | waste looking for someone with the exact knowledge you need | could have been spent getting a good compiler engineer and just | teaching them what you know. And magically that good engineer | will get better, it's pretty amazing stuff. | | I've was told multiple times, University/School teaches you how | to learn. You get your actual knowledge and skills from the | job. What I learned in 3 years of university was nothing | compared to what I learned in 6 months on the job. | kop316 wrote: | I will second this. I would much rather hire an engineer who | has a related background and wants to learn over an engineer | with the exact skillset but does not want to learn. It may | take a few months, but the former engineer will pick up what | you want them to know (if you mentor them correctly), and | past that inversion point, they are immensely more valuable. | bachmeier wrote: | > I'd like to sit down all university professors who teach | compiler courses and teach them a course on what's relevant. | | I don't teach compiler courses, but I'm an academic, and I do | keep in touch with industry people to find out what's important | for my data analysis course. A couple of big problems with your | proposal: | | 1. Time variation in "what industry wants". This year one | thing's hot, the next year it's something else. A bare minimum | requirement for any course is that a student taking it in their | second year should learn something that's still relevant three | years later when they're in the first year of their job. | | 2. Cross sectional variation. There's no such thing as "what | industry wants". Every place wants something different, with | only a small subset common to 80% of employers. | MaxBarraclough wrote: | ibains made a specific point about language processing in | IDEs being different to that of (traditional) compilers. | Presumably there exists a state-of-the-art there just as | there's a state-of-the-art for conventional compilers, but | academia doesn't give it as much attention. | throwaway_pdp09 wrote: | Text -> parser -> AST -> job done. If it's any different in | an IDE vs anything else I'd like to know how. | morelisp wrote: | Partial parse state and recovery are critical. You don't | want the entire bottom half of a file to lose semantic | analysis while the programmer figures out what to put | before a closing ). | throwaway_pdp09 wrote: | Does that issue go away if you use packrat vs some other | means? | morelisp wrote: | Packrat parsers are notably faster than recursive descent | parsers (also critical for IDE use) and by turning them | "inside out" (replacing their memoization with dynamic | programming) you get a pika parser which has very good | recovery ability. | | There are varying techniques to improve error recovery | for all forms of parsing but hacked-up recursive descent | (certainly the most common kind of parser I still write | for my hacked-up DSLs!) have poor error recovery unless | you put in the work. Most LR parsers are also awful by | default. | | When I was in university most focus was on LL and LR | parsers with no discussion of error recovery and more | focus on memory usage/bounds than speed. I also have no | idea how common it is to teach combinator-based parser | grammers these days; stuff like ANTLR and yacc dominated | during my studies. This would add another level of | unfamiliarity for students going to work on a "real | compiler". | IshKebab wrote: | Yes. Go and look up Tree Sitter and its references. | ohazi wrote: | Your IDE parser will be unusable if it goes bananas while | you're typing the characters needed to get from one | fully, correctly parseable state to the next. | | It needs to be able to handle: | printf("hello"); | | and also: prin | | and also: printf("He | | It also needs to be able to autocomplete function | signatures that exist below the current line being | edited, so the parser can't simply bail out as soon as it | reaches the first incomplete or incorrect line. | philosopher1234 wrote: | Which is still countered by: | | 2. Cross sectional variation. There's no such thing as | "what industry wants". Every place wants something | different, with only a small subset common to 80% of | employers. | | No? | MaxBarraclough wrote: | I don't think that really holds up. Ultimately you could | say this about any small mismatch between desired skill- | sets and those available, but at some point we call it | _hair-splitting_. | | If you need someone to build a parser for your IDE, | you're presumably better off with an expert in | traditional compilers, than with someone who knows | nothing at all of language processing. You'd be better | off still with someone experienced in language processing | within IDEs. Less mismatch is better, even if it's | impractical to insist on no mismatch at all. | u801e wrote: | > I'd like to sit down all university professors who teach | compiler courses and teach them a course on what's relevant. | | Courses should be used to teach general concepts while industry | should be used to show those how to apply those concepts. The | choice between using a packrat parser, LALR, or LL would be | easier to explain to someone who has been exposed to those | concepts and understands the advantages and disadvantages of | each approach compared to someone who has never heard of the | terms. | | Best practices change with time, but knowing the base concepts | and how to apply them will help people adapt. If they're only | exposed to what's considered relevant at this time, then they | won't have the necessary exposure to concepts that may pertain | to future best practice. | g9yuayon wrote: | Parsing is frontend of a compiler. That the compiler course | focuses on compiler backend does not mean it's not relevant to | the industry. I personally find aliasing, flow analysis, and | and optimization very relevant to what we needed to do. | | As for industry vs academia, wouldn't it be fair to say that | professors should teach which ever is more advanced? I | certainly don't believe that teaching Java generics, no matter | how relevant that is, is relevant to a PhD program, which is | supposed to push the state of the art of a subject. | cinericius wrote: | Do you think it is the responsibility of academia to teach | "industry relevant stuff"? I agree that courses generally don't | teach judgement well, but I do think that it is the | workplace/industry's responsbility to train their engineers and | not expect fresh grads to be fully equipped to work on | something that specific. | nicoburns wrote: | I definitely think there ought to be courses on such things | available. There should be an educational route that is | between pure academia which focusses on esoterica that is | only relevant to research and vocational courses that don't | include much innthe way of theory at all. The idea that | industry ought to pay for it presumably comes from a | background where universities are private an expensive. But | I'd like to live in a world where these kind of courses were | government subsidised and available for free or cheap. | bachmeier wrote: | > The idea that industry ought to pay for it presumably | comes from a background where universities are private an | expensive. | | In the US at least, where even public universities are | ridiculously expensive, there's no way to keep the current | system running unless our grads are able to get jobs that | pay well when they graduate. Indirectly industry is paying | for the courses taught by universities. In the current | environment we need to prepare students for employment. | | > I'd like to live in a world where these kind of courses | were government subsidised and available for free or cheap. | | I think we'd be better off as a country (US). I also think | that ship has sailed and it sank to the bottom of ocean a | couple decades ago. | fiddlerwoaroof wrote: | I tend to think of most university education as taxpayer | and student subsidized versions of the sort of training | that companies should be doing. I think, for most | programmers, a well-thought-out apprenticeship program | would be better both for them and for the company, except | that the amount of money we send to universities makes | this economically infeasible. | nordsieck wrote: | > Do you think it is the responsibility of academia to teach | "industry relevant stuff"? | | I mean, there are colleges that cater specifically to | industry. Digipen is a good example of this: they have very | close relationships with game studios and shoehorn their | curriculum into the requirements the state imposes on | colleges. | | These sorts of colleges are not particularly popular, | however, so I think most prospective students understand the | value of a more broad and timeless approach to software | education. | phaker wrote: | > Do you think it is the responsibility of academia to teach | "industry relevant stuff"? | | I don't think that was what ibains was going for, though i | don't fault you for seeing this in that comment. (Especially | that i don't think this course suffers from that problem and | generally i think things are rapidly improving on this | front.) | | That problem to me and i think to him too is that quite a lot | of things that such courses tend to teach as well thought | out, well working solutions and approaches just don't work | very well and frequently i find comments on what's a 'good | taste' solution and what isn't that are completely my | understanding of the problem space which is why. | | E.g. in parsing (as that's the topic he mentioned): First, | lots of people just spend way too much time on it. And they | focus on parts that are of zero use to beginners (like | explaining all several grammar families) and then use obtuse | parser generators that save no work and sometimes use them in | bizarre way (like picking a lalr parser generator then hand | editing the output to support a not-quite-lalr language). | Meanwhile a recursive descent parser is easy to write, fast | and gives pretty good error messages with _very_ little work. | You do need to know enough about grammars to be able to write | one down and know if it describes an ambiguous language etc | so this should be taught, but you don't need to understand | the language zoo well. | deeeeplearning wrote: | >I don't think that was what ibains was going for, though i | don't fault you for seeing this in that comment | | Isn't that literally exactly what he was going for? | | "I'd like to sit down all university professors who teach | compiler courses and teach them a course on what's | relevant." | chrisseaton wrote: | Most languages are context sensitive. | | Most language tools are context free. | | How did we go so wrong? | MaxBarraclough wrote: | Could this be viewed as supply exceeding demand? | | Context-free grammars are ripe for theoretical computer | science work even if they're not practically relevant. On | the flipside, I suppose the constraints of context-free | grammars are seen as a price not worth paying when | designing a language. | throwaway_pdp09 wrote: | Can you give an example (or three) where that lets us | down? That would be very helpful to me I suspect. | chrisseaton wrote: | Examples of which bit? | | Languages that are context-sensitive? C, C++, Java, | JavaScript. | | Examples of tools based on the starting expectation that | languages are context-free? Yacc, Bison, Jay. | | Examples of the problems this causes? Well we're using | the wrong tool for the job, right from the start. Instead | of using an appropriate tool the first thing we do is | bend our tools out of shape. We use side-effect actions | to subvert the tool's model in an uncontrolled way. We | don't get the full benefits of the tool's original model | and can't rely on its guarantees. | WalterBright wrote: | Languages being context sensitive results from hacking in | new language features. It's a constant struggle to keep D | context free :-/ | Kranar wrote: | Strictly speaking, D is not context free nor is any | statically typed programming language. That said most | languages, including D, have supersets that are context | free and can be used to build an AST which can then be | refined further with context sensitive checks (variables | are used after declaration) and then even further with | type checks. | | Many language tools don't need a full blown language | parser and can get by with a context-free parser for the | language's superset. Things like autocompletion, spell | checking, documentation/highlighting can all be | implemented using context-free parsers. | deeeeplearning wrote: | This is such an empty criticism. Universities are not job | training facilities. You want people who have the exactly right | skillset? Hire smart kids and train them, you know, the way | it's been done for hundreds of years. | UncleEntity wrote: | > These courses teach very little judgement or industry | relevant stuff. | | Hasn't that been the debate for, like, ever? | | Should the universities teach you how _to learn_ or should they | teach you how to hit the ground running at your first post- | graduation job? | | I'd say anyone who took phd-level courses on compiler design | shouldn't take too much training to become a valuable team | member as they've proven they can learn what is important for | the given task but maybe that's just my uninformed opinion. | nicoburns wrote: | I feel like "teach you to learn" vs "teach you useful | skills"is a false dichotomy, and often just an excuse for | poor teaching. They can and should do both. The reasom they | don't is because they're so research focussed, which is fine | as an option but a problem because there are no or vert few | good options for high level academic courses that are more | practically focussed. | knuthsat wrote: | Parser combinators allow elegant implementations of parsing but | are ridiculously slow compared to some Knuth-optimized parsers. | mafribe wrote: | As elegant as parser combinators are, it is not so easy to | work out exactly what grammar they are implementing. | woodruffw wrote: | I second this (although I work more on translation and modeling | than optimization). IME, it's much easier to find otherwise | competent, savvy researchers and spin them onto compiler work | than it is to take someone who's gone through an "advanced" | course. | [deleted] | yters wrote: | I agree in general. I have a PhD in comp eng and know a lot of | academia type subjects, but only a small portion is especially | useful in a practical setting. Also a lot of academic knowledge | seems needlessly obtuse. It becomes a lot clearer when looking | at the practical problems that spurred the academic fields. At | times I get the aha as to why this abstract concept is | addressing a meaningful question, and if that's how academic | education oriented itself, around these basic meaningful | questions it would help people learn much more effectively. | zakember wrote: | Any resources you can suggest to point interested engineers in | the right direction? | seanmcdirmid wrote: | You are thinking about it the wrong way. In this niche, you | will almost never find people with the exact skills you are | looking for. Better to find people with tangentially related | skills and experience and let them grow into the role. Academia | is never going to be comprehensive enough to prepare students | for every niche that has some demand. Rather, they should just | focus on teaching a core that will help them learn what they | need later on demand. | | > When do you use packrat parser vs LALR vs LL? | | If you need ok performance and good error recovery (especially | if you are doing stuff in an IDE!), none of the above: | recursive descent is not that hard to implement, and is | incredibly flexible. | helloycombinat wrote: | > When do you use packrat parser vs LALR vs LL? Good luck | hiring an engineer with advanced compiler courses knowing any | of this | | Honestly, I have no idea why you would want any of those. Can | you explain in a few sentences? | | I written a parser before and it was faster than everything I | used that does the same thing or similar. And I haven't had any | professional or teachers teach me parsing | | I once wrote a json parser in .NET. It was faster than | literally everything I tried but I already know it can't touch | simdjson and I haven't compared it to the new dotnet parser but | I imagine they do it faster. I'm actually confused how people | write such shitty parsers when I barely know what I'm doing. | Like are they using a link list or revisiting characters | multiple times to parse? (I think two visits is acceptable. One | to find the end of a number and another pass to convert the | number to value although I'm sure it's not hard to do one pass) | mafribe wrote: | I understand the problem. | | I teach compilers at a big university. And I would love to hire | graduates with compiler skills for my startup, but find it | difficult. | | There are several structural problems that conspire to keep | students from acquiring knowledge in low-level programming | domain like compilation: a course needs to fit with the rest of | the curriculum, and with student interest. I cannot get | students interested in lexing and parsing, because they hear | all day how exciting deep learning and big data are. Lexing and | parsing are deemed solved in the 1960s, a sentiment I disagree | with. In addition, classical theoretical computer science (e.g. | automata and languages) is rarely taught with enough depths in | many universities, so students lack the technical background. | You can cover this in a compilers course (I do) but it takes | time, and the students see it as an ordeal. Compilers as a | subject is _not_ helped by the most widely recommended book | being the _Dragon Book_ which is dated and way too long to be | useful for beginners. Compare the _Dragon Book_ with the | material available as introduction to machine learning ... Many | of the existing textbooks are also not covering modern | compilation themes, in particular JITs, and compilation for | multi-core and GPUs. I 'd say there is currently no good | undergraduate textbook on compilers. I could probably, with a | lot of work, cobble something reasonable together from my | lecture notes, but I don't believe in _books_ , I'd like to do | a good MOOC, but building suitable software infrastructure | requires more work than I am currently willing to put in. | | My department tried hard to remove the compilers course, as "no | longer relevant", and "too hard". I threatened to resign if it | was not kept as a mandatory course. For now this worked. | pwdisswordfish4 wrote: | Your HN profile is empty, so there seems to be no good way to | check out your institution's program or to contact you. | markus_zhang wrote: | Thanks. I'm actually very interested in lexing and parsing, | because that's probably 99% of the stuff a layman-programmer | gets to do with compiler theory in job. I mean not everyone | gets chance to write the backend part, but parsing skill is | almost used universally. | mafribe wrote: | A fun exercise is writing a lexer generator or a parser | generator. It's probably easier starting out with writing a | lexer generator like Flex for your favourite language. I | recommend using the Flex input format, than you can test | your creation using Flex as a testing oracle! This is not | something you can do in a weekend. | waynesonfire wrote: | I never took a compilers class during university study and it | is my greatest regret. I've since been self learning and it | hasn't been easy. | | Thanks for advocating for the importance of compilers. This | lack of knowledge has been a limiter for me many times. I'd | really like to fill a skill gap and be able to write a DSL or | a little language to solve a problem. | mafribe wrote: | From my experience in industry, while hardly any programmer | will ever need to write a modern code generator, few things | are more useful in practise than being able quickly to | design a suitable DSL and implement a corresponding | performant and correct lexer and parser. This is not | something that can easily be picked up on the side. | | In addition, understanding what happens under-the-hood when | you program in a high-level language is an eye-opening | experience for many students. This sentiment comes up in | student evaluations year-in, year-out. | morelisp wrote: | As someone who is happy when he interviews a candidate who at | least knows you don't start parsing with `string.split()` I | gotta say, your bar is extremely high. Surely MSFT can afford | some internal training? | TACIXAT wrote: | Is that such a critical hiring criteria? That seems like | something I could learn in a week. | chrisseaton wrote: | > That seems like something I could learn in a week. | | So why not learn it in a week then apply for the job? | | I would suggest the reason is... because you cannot learn it | in a week. | mhh__ wrote: | You can't gain experience in a week but you aren't working | alone. | | This is an example of know-how, not theory. Different | problems require different things. | throwaway_pdp09 wrote: | I have some idea what I'm talking about, see my post | https://news.ycombinator.com/item?id=25210523 and I don't kniow | about why using a packrat over anything else. I pull something | off the shelf, antlr for my latest project, then get on with | it. | | The parser is the most trivial side of things, if you consider | that a key skill it comes across as being an issue from the | 1970s, not today. Writing a RD by hand is not a big deal anyway | as someone else said. | | If your into optimising compilers you're focusing on a very odd | area to criticise. I'd gave though more about dataflow using | lattices or whatever, graph stuff etc. | | So parsing aside, what are the 'relevant' skills you feel are | missing? | | (and what are 'Database Compilers'? Nearest I can think of that | may fit are query optimisers) | mathattack wrote: | I love this line... | | "Real 6120 also has an end-of-semester course project--in the | self-guided version, your end-of-semester assignment is to change | the world through the magic of compilers" | chrisseaton wrote: | It's great that there's a lecture on dynamic compilation - so | often overlooked. | tomcam wrote: | What a treat. Thanks for this submission. | fermentation wrote: | Any idea if I can jump right into this after my only experience | being Crafting Interpreters? | johntortugo wrote: | Of course. Why not? | mdergosits wrote: | I took the undergraduate version of this course while at Cornell. | It was a wonderful course. I taught me a lot how computers work, | and how to organize a large (for me at the time) software | project. | | One of the most painful parts of the the project was using Scala | which had ridiculously long compile times which made iterating | quite a pain. | | My favorite memory was that for one project we had to submit a | program written meant to stress test everyone's parser. | 37ef_ced3 wrote: | Domain-specific compilers are a bigger field, where there's more | opportunity to do something interesting | | Take C (or some other language) and its mature optimizing | compilers as a foundation, and write a program that compiles a | high-level description of a domain-specific program into C | | For example, NN-512 (https://NN-512.com) is a compiler that takes | as input a simple text description of a convolutional neural net | inference graph and produces as output a stand-alone C | implementation of that graph | | GNU Bison (https://www.gnu.org/software/bison/) is another | example, where the output is a parser: _Bison reads a | specification of a context-free language, warns about any parsing | ambiguities, and generates a parser (either in C, C++, or Java) | which reads sequences of tokens and decides whether the sequence | conforms to the syntax specified by the grammar_ | | Halide (https://halide-lang.org/) is a language for fast, | portable computation on images and tensors, and it originally | generated human-readable C++ code as output (but not anymore, now | it feeds directly into LLVM) | | Can anyone provide other examples? (Code generators for network | packet processing?) | ritter2a wrote: | AnyDSL (https://anydsl.github.io/) might be worth mentioning, | it is a framework for creating domain specific languages using | partial evaluation to give powerful abstractions to the user | while still producing high-performance code for different | target platforms. On the website, you can find that they | already have DSLs for ray tracing and image stencil codes as | well as works for bioinformatics. | chrisseaton wrote: | > Take C (or some other language) and its mature optimizing | compilers as a foundation, and write a program that compiles a | high-level description of a program into C. | | Don't you still need many of the techniques taught in this | class to build something like that? The material doesn't look | inherently targeted at low-level languages or machine-code as a | target. | dbcurtis wrote: | One way to think of it is to split the "compiler" problem | into three big pieces. 1) Front-end, 2) back-end, 3) tooling. | | Front end is lexing, parsing, building the AST (in whole or | just keeping pieces of it around), semantic analysis. When | that is done, you can say "Yup, valid program. It is possible | to generate code." So, to your question, yes. Those | techniques from the course are 100% in play. | | Back-end is turning the AST into something you can optimized | and generate code from. (Sometimes there is a "middle-end" in | there....) High-level optimizations, low level optimizations, | memory allocation, register allocation. | | Tooling is everything that keeps your life from being | miserable as a user -- starting with debug symbols and these | days people expect library and package management, etc, etc. | | So if you are interested in exploring the Next Great Syntax | or some semantic problems that can be re-written into C, then | doing a C-front is a great way to do that. Let the C compiler | handle all the really-really-really-hard back-end issues for | you, and avoid all that distraction. But.... expect that | getting debug symbols from your spiffy language into the | ready-to-link object file is going to be, as they say, "non- | trivial". So... great for experiments and a great learning | exercise, but it's hard to do a production compiler that way. | samps wrote: | Hi! This course is about the "middle end," FWIW. We do not | do parsing or codegen in 6120, and there is no goal (as | there is in many undergrad-level compilers courses) of | making a complete, end-to-end compiler for a C-like | language. | kodoque2 wrote: | Do you have a structured course for Domain-specific compilers | available online? | tardismechanic wrote: | MATLAB Coder - Generate C and C++ code from MATLAB code | https://www.mathworks.com/products/matlab-coder.html | | Actually the whole family of embedded code generation products | from MathWorks https://www.mathworks.com/solutions/embedded- | code-generation... | davidpolberger wrote: | I have just spent the last 13 months significantly reworking a | compiler that turns Excel-like formulas into JavaScript. The | compiler is used by an app designer, and the formula language | is meant to be easy to pick up for people used to spreadsheets. | A significant difference compared to Excel is that our compiler | features static typing, enabling helpful error messages to be | produced at build-time. | | (Spreadsheets, on the other hand, tend to try very hard to make | sense of types at runtime, using type coercion to turn values | of the wrong type into something of the expected type they can | work with. That produces surprising results and confuses users. | =IF(2, 3, 4) will generally produce 3, because 2 -- which | should be a logical/boolean value -- is interpreted as TRUE. We | think it's a better idea to warn the user that the first | parameter is wrong, and to do that as soon as possible.) | | By far the largest challenge with reworking the compiler has | been the type system. I have expanded our type system from the | four standard spreadsheet types to encompass more than 80, as | well as added support for lambdas (for use with functions like | SUMIF and FILTER), type inference, parameterized types and | union types. The biggest addition is probably an imperative | mode, that executes formulas with side-effects in response to | events being fired. This is all in service of features we have | wanted to add for years, that have been held back by our | (previously) simplistic type system. | | I did take a compiler class at university, more than 15 years | ago, but while that proved very useful when writing the formula | parser (an operator-precedence parser, initially straight out | of a textbook), type theory was not part of the class. I had to | pick up bits and pieces by reading the Java Language | Specification (JLS) and by experimenting. | | The Cornell class doesn't seem to feature type theory either, | which I think is unfortunate. Now that the pendulum has swung, | yet again, firmly in the direction of static typing, perhaps | static typing will become a bigger part of CS compiler | curriculums. | tomcam wrote: | Your project sounds like fun. What/for whom is the project? | What level of Excel formula compatibility are you shooting | for, e.g. just in the neighborhood like Google Sheets, | reasonably close as in Libre Office, or full compatibility | with the latest version? If the latter you must have one | gnarly test suite. Would love to hear more about it. | tomcam wrote: | OK, I see it's https://www.calcapp.net which looks like a | smart product idea. My best to you. Would still love to | hear more about the other questions. | davidpolberger wrote: | Thanks! | davidpolberger wrote: | The compiler is for Calcapp, an app designer for Excel- | savvy users. Our aim is to offer a function library that is | easy to pick up for our target audience. That means that we | generally follow Microsoft's lead in terms of naming and | parameters, but we often offer extensions. Here are a few | examples: | | * Functions like SUMIF operate on a range or an array and | use a textual mini-language to determine which values to | operate on. =SUMIF(B1:B2, ">3"), where B1 contains 2 and B2 | contains 4, returns 4, because only B2 satisfies the ">3" | condition. We're not big fans of that mini-language (which | expects you to use the string concatenation operator & to | handle dynamic values). We support it, through an | overloaded function, but also offer a version which takes a | lambda as the second parameter. In Calcapp, SUMIF({ 2, 4 }, | Item > 3) and SUMIF({ 2, 4 }, E -> E > 3) (with a named | lambda parameter) are equivalent to SUMIF({ 2, 4 }, ">3"). | The lambda syntax enables much more complex conditions to | be expressed. | | * Some functions like SORTBY take a sort order parameter, | which is an integer where 1 means sort in ascending order | and -1 means sort in descending order. We support functions | with integer sort order parameters, but also offer an | overloaded version taking an enumerated value instead | (SortOrder.ASCENDING or SortOrder.DESCENDING), which we | think helps with readability. | | * Excel offers logical functions like AND, OR and NOT. We | like the C-style operators &&, || and ! better, so we | support them in addition to Excel's functions. | https://www.calcapp.net/blog/2017/09/16/logical- | operators.ht... | | * Excel's logical operators (like =, <> and <=) return | arrays when applied to arrays and ranges. For instance, { | 1, 2 } = { 1, 3 } returns { TRUE, FALSE }. (This is easier | to see in recent versions of Excel, where array results | "spill," meaning that a return array with two elements | spills to occupy two cells of the grid. In earlier | versions, as well as Google Sheets, you'll have to use | INDEX to tease out the results.) We support = and !=, but | also == and !=, where the two latter operators always | return scalar (non-array) values. == enables users to | determine if two arrays are identical, without having to | use AND on the result array (which is the standard Excel | trick). Also, our == operator is more traditional in that | string comparisons are case-sensitive (unlike =). | | * Experienced Excel users like to use double negation to | turn logical arrays into number arrays. We have a | specialized operator, --, for that, because allowing two | unary operators to follow one another would introduce other | issues. Here's an example: https://exceljet.net/excel- | functions/excel-sumproduct-functi... | | * The Excel operators * and + can be used with logical | arrays and ranges to signify logical AND, as well as | logical OR, respectively. | (https://exceljet.net/formula/sum-if-cells-contain-either- | x-o...) We have overloaded versions of those operators that | take logical arrays for that reason. (Again, Calcapp uses | static typing, which makes this a lot more complicated for | us than it is for Excel.) | | * The latest versions of Excel have great support for | dynamically-allocated arrays (along with new array | functions like FILTER and SORT). Their new calculation | engine supports users providing arrays where, | traditionally, non-array parameter values are expected. If | arrays are provided for these parameters, then an array is | returned. For instance, =SQRT({ 4, 16 }) returns { 2, 4 }. | Microsoft internally calls this "lifting" parameters. We | refer to these parameters as being "array-capable." In our | type system, an array-capable type is a union type | (T|Array<T>), with the notable difference that if an array | is provided to an array-capable parameter, then the return | type is no longer T, but Array<T>. | | We do have a test suite for the runtime formula function | implementation with a couple of thousand unit tests. Given | the number of supported formula functions, though, we | should probably have many more. | | Most of what I have described here pertains to a new | Calcapp version, which hasn't been released yet. In | particular, while I am done implementing the compiler, it | still needs to be documented (lots of Javadoc), user-facing | documentation needs to be written and the runtime system | needs to be implemented, so we still have a few (fun!) | months to go before everything is ready. | | Thanks for taking an interest in what we do. | klelatti wrote: | Not open source but I'm working on a system that turns a DSL | (embedded into Jupyter notebooks) into either c or c with | opencl for a niche finance application. Not a huge project | (<10k lines of code). Nature of the problem means that DSL can | be very simple (it's aimed at non-programmers). This approach | also means that could easily add other targets - e.g. cuda or | wasm. | 37ef_ced3 wrote: | Awesome. Link? | klelatti wrote: | Thanks and sorry - not open source and in testing with | single client at the moment. | archi42 wrote: | Oh, compiler construction :) I really enjoyed that lecture during | my M.Sc. | | "parallelization, just-in-time compilation, and garbage | collection" sound fun, too. Especially JIT. | yla92 wrote: | On the related note, what would you guys recommend as an | introduction to compilers, say for beginners? | gilmi wrote: | cs4410[1] is really really good, and cse131[2] is basically the | same thing but with videos. I highly recommend them both. | | [1]: https://course.ccs.neu.edu/cs4410/ | | [2]: https://ucsd-cse131-f19.github.io/ | matt_d wrote: | Compilers books: I'd start with "Engineering a Compiler" by | Keith Cooper and Linda Torczon. | http://craftinginterpreters.com/ is also a pretty great, | programming-oriented intro, which may be good to work through | alongside. For more on the analysis & compiler optimization | side, "SSA-based Compiler Design" | (http://ssabook.gforge.inria.fr/latest/; GitHub Mirror: | https://github.com/pfalcon/ssabook) is a good follow-up. | | Further readings: Book recommendations in | https://github.com/MattPD/cpplinks/blob/master/compilers.md#... | as well as program analysis resources (in particular lattice | theory, type systems and programming languages theory, related | notation): | https://gist.github.com/MattPD/00573ee14bf85ccac6bed3c0678dd... | | Courses: I can recommend the following: | https://github.com/MattPD/cpplinks/blob/master/compilers.md#... | | Particularly (in alphabetical order--I think these are all | great, so including highlights of what I've liked about them): | | - IU P423/P523: Compilers (Programming Language Implementation) | - Jeremy Siek, with the course book "Essentials of Compilation: | An Incremental Approach" (pretty interesting approach, with | programming language features developed incrementally having a | fully working compiler at each step, cf. | http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf; | implementation language Racket), | | - KAIST CS420: Compiler Design - Jeehoon Kang (good modern | treatment of SSA representation itself, including the use of | block arguments, | https://mlir.llvm.org/docs/Rationale/Rationale/#block- | argume..., as well as SSA-based analysis and optimization; Rust | as an implementation language), | | - UCSD CSE 131: Compiler Construction - Joseph Gibbs Politz, | Ranjit Jhala (great lecturers, both Haskell and OCaml edition | were interesting; fun extra: one of the Fall 2019 lectures | (11/26) has an interesting discussion of the trade-offs between | traditional OOP and FP compiler implementation), | | - UCSD CSE 231: Advanced Compiler Design - Sorin Lerner (after | UCSD CSE 131: for more on analysis & optimization--data flow | analysis, lattice theory, SSA, optimization; fun extra: the | final Winter 2018 lecture highlighted one of my favorite | papers, https://pldi15.sigplan.org/details/pldi2015-papers/31/P | rovab...), | | - UW CSE CSEP 501: Compilers - Hal Perkins (nice balanced | introduction, including x86-64 assembly code generation, with | the aforementioned "Engineering a Compiler" used as the course | textbook). | spekcular wrote: | Thank you so much for this comment. It's incredibly difficult | to find modern compiler learning resources - when I ask, I | often get disparaging comments about the dragon book and not | much more. | | Please, consider putting this list on your blog, or somewhere | more visible than a HN comment. | [deleted] | mhh__ wrote: | Engineering a compiler is by far the best introductory book | that is also modern. | | Advanced compiling for high performance is one of not many good | accounts of scheduling for non-toy architectures. | | Muchnick's book is a good encyclopedia of optimisations | although the code is unreadable and buggy. | UncleEntity wrote: | I've been reading "Modern Compiler Design" currently, not too | sure how beginner friendly it is though. | | Also, haven't made it too far into the book so can't really | give a recommendation either way. | brundolf wrote: | Crafting Interpreters has been great. It starts with an AST- | walking interpreter (in Java), so you get to focus on parsing | and the simplest form of interpretation, and then it moves to a | bytecode interpreter (in C) so you get to learn about compiling | to a more primitive target via a lower-level language (much of | which should translate to a "real" compiler). Very | approachable, has lots of side-context and an entertaining | style. The language you interpret was custom-designed for the | course, which helps a lot, and resembles a simplified | JavaScript without all the messy parts. | diehunde wrote: | Stanford Compilers Course | mhh__ wrote: | No code generation? | glouwbug wrote: | It looks like its more about generating IR. The professor wrote | his own LLVM like tool called Brill, and use of the tool seems | to be the focus of the course. | helloycombinat wrote: | That doesn't sound difficult? if (x86) write byte X, Y, Z else | if arm write byte J K L | richardjennings wrote: | A fantastic submission, very much appreciated. | | I find the presentation style of subtitles with surrounding | context extremely helpful. I often lose the ability to listen | when concentrating; the subtitle context provides a superior | recovery mechanism to the usual, try to work out what was missed. | I look forward to purchasing augmented reality glasses at some | point in the future which add this advantage to in-person | interactions. | loosetypes wrote: | Yes! | | It's the complete opposite of, and significantly more user | friendly than, how zoom recordings synchronize meeting | subtitles with audio/video playback. | | In their case they only show the current line in a text | message-like sidebar. As the content plays, it snaps the scroll | position of the messages to current, hiding the rest. This | breaks scrolling ahead or back while listening and you lose any | context. | | Hopefully I'm just missing a setting - I try to minimize the | time I spend in there. Is there a way I can take their damn | horse blinders off? | CalChris wrote: | I too appreciate the subtitles and I tend to have them on when | I watch+listen to a DVD. You can even click on the sentence and | the video jumps to that point. However ... a little proof- | reading would help here. _To get things started_ gets | transcribed into _Ticketing, sir._ | rhendz wrote: | What would be some cool use cases to design your own compiler? | mhh__ wrote: | The simplest case where they prove useful is usually when your | regex is longest than 20 characters. Write a parser once, and | it'll be readable to people other than yourself (if it isn't | your schema is bad) | MaxBarraclough wrote: | Seems a pity to throw out the various advantages of the high- | level solution just because your regex engine doesn't support | a scalable syntax. | | Advanced regex systems such as the _Ragel_ parser-generator | support a composition-friendly syntax, [0] but the average | regex system doesn 't, and I guess there's no quick fix for | that. | | [0] http://thingsaaronmade.com/blog/a-simple-intro-to- | writing-a-... | MaxBarraclough wrote: | There are various weird and wonderful niche languages out | there, if you appreciate programming languages as an end in | themselves. Two examples: Sentient [0] and Factor [1][2]. | | [0] https://sentient-lang.org/ | | [1] https://factorcode.org/ | | [2] Factor's home-page does a poor job of presenting examples | to give a flavour for the language, so here's one on GitHub: | https://github.com/factor/factor/blob/master/basis/roman/rom... | caditinpiscinam wrote: | Sentient looks neat! Does it use the same strategies as | Prolog under the hood? | MaxBarraclough wrote: | Good question, I'm afraid I don't know enough about either | language to answer that. | | The Sentient pages mention Prolog, for what that's worth: | https://sentient-lang.org/intro/declarative | satthrow wrote: | > The MiniSat solver is Sentient's default. It is a version | of MiniSat that has been compiled to JavaScript with | Emscripten. This means that it can run in a web browser and | it does not have any dependencies when run from the | command-line. | sethhovestol wrote: | A (not)compiler can be made to look for issues in your codebase | that already exists. It would need all the tools of a compiler, | just that the target is the same as the source language. | prando wrote: | Has anyone taken Compilers course from David Beazley: | https://www.dabeaz.com/compiler.html. I'd love to hear your | feedback, please | pbiggar wrote: | Even if you don't do the course, the first paper they list is a | fascinating must-read: | | Producing Wrong Data Without Doing Anything Obviously Wrong! Todd | Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. | ASPLOS 2009. | | https://dl.acm.org/doi/10.1145/2528521.1508275 | | I happened to be at that ASPLOS and I thought it was fascinating. | Good to see it recognized as one of the most important papers in | compilers. | nandaja wrote: | Anyone interested in following the course together during the | holidays? To keep each other motivated etc. | lanius wrote: | Yes! How should we organize this? Maybe start with an IRC | channel? | nandaja wrote: | Yes! an IRC channel would be great! Do you want to start it | and post the info here? Or else I can start | lanius wrote: | You can start :) | iKlsR wrote: | Discord seems faster and more likely to attract a small | group nowadays. | nandaja wrote: | Just created one here https://discord.gg/RBPmmdWg | agumonkey wrote: | I remember using IRC chans when doing coursera moocs, it | was fun. See ya there | JackTheHopper wrote: | Do you have a group for us to join? | nandaja wrote: | Okay, I just created this on discord | https://discord.gg/RBPmmdWg. Considering it might be more | accessible for people. Wondering if I should post this out of | the thread. | mrtranscendence wrote: | I'm interested. I may not have the most relevant background, | though -- I'm a data scientist with an econometrics education. | The topic really does interest me, though. | nandaja wrote: | I'm a devops engineer. So not very related either! | thechao wrote: | If you have to stop "early" please get through the simple DCE | and LVN passes. | | A solid basic-block level DCE & LVN optimization along with a | judicious "compile time evaluator" (simple constant folding) | will let you produce code at around 80% of the level of LLVM. | | Obviously, you're not going to get the magic that comes of | LLVM's scalar analysis pass, but you're not going to be | embarrassed with your performance, either. | hypersoar wrote: | I'm game. My background is in math and not CS, so I can't say | I'm certain I have the prerequisite knowledge. | joe_the_user wrote: | I'm interested. I'm writing a compiler-interpreter now but my | broad compiler-construction chops are spotty and going through | this systematically might be useful. | jbquark wrote: | I would love to join as well. Loop me in too, if there is IRC | channel | nandaja wrote: | Created a discord channel https://discord.gg/RBPmmdWg | bensonalec wrote: | I would love to do so! I'm a masters student who has been | wishing there was an advanced compilers course in my program, | there's only the undergrad level one, so this is perfect! Let | me know how I can get in contact. | jtlienwis wrote: | They should renumber it CS 6502. | glouwbug wrote: | Funny you mention 6502, because as wonderful as its assembly | is, its not that friendly for higher level languages. I tried | writing a basic C like language for it with 16 pointer support | (see LDA ($ZP),Y), and came to the haunting realization that | function calls require pushing the de-reference of zero page | pointers to the stack which change their de-referencing address | space from 0-255 (eg. main) to 256-512 for subsequent stack | frames. | | I then realized that at a speed of 1MHz, that the pointer de- | referencing, the stack frame magic, and endless copies to just | parse basic expressions just made 6502 a pointless target for a | C like language. | | It was fun though. | abdullahkhalids wrote: | This course is asynchronously delivered. | | Question for students: how do you find synchronous courses with | some discussions during class, vs asynchronous courses where | discussions are limited to an online written forum or office | hours? | the_arun wrote: | Asynchronous discussions are durable as content (in the | internet). They are also very useful for working students. But | they need focus to retain them in memory. | kyawzazaw wrote: | Not a problem at all. For non-introductory courses, this works | well. | | But needs to have at least some sort of synchronous at some | point. In my one class, we use Notion for discussing so it has | a live sense of discussion. | cmrdporcupine wrote: | Well, I don't feel like cleaning the basement or finishing my | yard work before the snow, so I might just do this course over | the holiday instead. | mcguire wrote: | The course materials look good. | | But what is a "PhD level course"? All I've seen is undergrad and | graduate, and the difference between masters and PhD has nothing | to do with classes. | abdullahkhalids wrote: | There is no formal difference between masters and PhD courses. | I think he calls it a PhD level course because it is "research | oriented" and requires work that perhaps a Masters student | looking to complete credits might not be interested in. | | The evaluation criteria is also fairly subjective and rough, | which might not work for a larger masters level course. | jcelerier wrote: | > the difference between masters and PhD has nothing to do with | classes. | | huh ? what's your education system ? in europe it's generally 3 | year bachelor, then 2 year masters, then 3 years phd so when | you have classes during your phd hopefully they are different | that the one you get during your masters | hyperpape wrote: | Depends on the field. I studied philosophy (Ph.D. program, | but finished with an M.A.), and the M.A. is often just a | degree given after the first few years of the Ph.D. There are | a handful of required first year courses (logic, basic | metaphysics, philosophy of science and ethics), but otherwise | students take whatever courses are relevant to their | interests. It helps that philosophy typically has fewer | classes with explicit prerequisites. If you take an advanced | course, you might be lost, but it doesn't feel the same as | not being knowing a formula or how to derive something. | | Some courses are more basic, and some are more like | collaborative research seminars. Advanced students gravitate | towards the latter (and typically take fewer courses and/or | audit more of them), but a first year with the right | background can still take them. Conversely, an advanced | student might take an introductory course in an area they | lack background in to broaden their knowledge. | | There also are terminal M.A. programs, but they're less | common, and the majority of students who do Ph.Ds at | prestigious institutions don't do one. | mcguire wrote: | I honestly have never figured out European education systems. | Sorry! | | But this is Cornell, which I assume is like other research- | oriented US universities. You have a 4 (or maybe 5) year | undergraduate, leading to a Bachelors and then apply to | graduate programs. The graduate schools may have specific | Masters tracks, but not really any general Masters program. | Instead, you spend the first year or two taking graduate | classes until you either satisfy the assorted requirements or | pass an oral qualification (They still have those, right?), | after which you focus entirely on research, publications, and | your dissertation defense. If you leave school before | defending, you get the Masters as a sort of consolation prize | ---most US PhDs in my experience don't have Masters. | CodeBrad wrote: | I am in the US, and at my school masters and PhD students all | take the same classes. | | It is not ideal because many of the masters students are | coming from a non-CS background. Some of the graduate level | CS courses I have taken were easier than 4th year level | undergrad classes. | andrewnc wrote: | This tracks well with my experience also. I was | disappointed, for the most part, the several graduate CS | courses for this reason. | mcguire wrote: | By the time I'd finished graduate classes, I had a hard | time with proofs. After the 10th or 12th introduction to | predicate logic with slightly different notation, it just | all becomes a big, confusing mass. | [deleted] | Jarwain wrote: | In the US, it's typically a 4 year undergrad, 2 year masters, | and a ~5-7 year PhD depending on the program itself. From | what I understand, while there are some classes you take as a | PhD student, a majority of your work is doing research | butterthebuddha wrote: | For PhD students, the masters is integrated into the 5 | years they spend doing their PhD. They are expected to take | courses as a "pre-candidate" and do primarily research | after they pass their qualifying exams and become | candidates. | mrtranscendence wrote: | This is usually true but not always. I attended an | economics program with a terminal master's offering, and | we spent a year and a half as master's students before | moving on to PhD work. | cinntaile wrote: | Just for completeness sake. In the EU a bachelor's is 3 | years, a master 1 or 2 (2 being the standard) and a PhD 3-5 | years. It depends on the country, so it's possible to get | 3+2+5. | pfdietz wrote: | Nice, although I'd have wanted a segment on compiler testing. | samps wrote: | Indeed; I previously had these papers on the list but had to | take them out for time this semester: | | - Finding and understanding bugs in C compilers. Xuejun Yang, | Yang Chen, Eric Eide, and John Regehr. PLDI 2011. | https://dl.acm.org/citation.cfm?id=1993532 - Compiler | validation via equivalence modulo inputs. Vu Le, Mehrdad | Afshari, and Zhendong Su. PLDI 2014. | https://dl.acm.org/citation.cfm?id=2594334 | pfdietz wrote: | There are just too many fun things to teach. | | More recent paper: yarpgen | | https://github.com/intel/yarpgen/blob/main/papers/yarpgen- | oo... ___________________________________________________________________ (page generated 2020-12-11 23:00 UTC)