[HN Gopher] The naked truth about writing a programming language...
       ___________________________________________________________________
        
       The naked truth about writing a programming language (2014)
        
       Author : pcr910303
       Score  : 71 points
       Date   : 2020-05-02 18:23 UTC (4 hours ago)
        
 (HTM) web link (www.digitalmars.com)
 (TXT) w3m dump (www.digitalmars.com)
        
       | chubot wrote:
       | On regexes, he is generalizing from a small number of examples.
       | Some lexers are straightforward to write by hand, but others are
       | better done with a code generator.
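        | 
        | To make "straightforward by hand" concrete, here is a minimal
        | sketch (not Oil's code) of the character-class loop such a
        | lexer boils down to; once the token shapes start to overlap
        | and nest, a generated recognizer scales better:
        | 
        |     #include <ctype.h>
        |     
        |     typedef enum { TOK_NUM, TOK_IDENT, TOK_PUNCT, TOK_EOF } TokKind;
        |     
        |     /* Advance *p past one token and report its kind. */
        |     TokKind next_token(const char **p) {
        |         while (isspace((unsigned char)**p)) (*p)++;
        |         if (**p == '\0') return TOK_EOF;
        |         if (isdigit((unsigned char)**p)) {
        |             while (isdigit((unsigned char)**p)) (*p)++;
        |             return TOK_NUM;
        |         }
        |         if (isalpha((unsigned char)**p) || **p == '_') {
        |             while (isalnum((unsigned char)**p) || **p == '_') (*p)++;
        |             return TOK_IDENT;
        |         }
        |         (*p)++;                 /* any other single character */
        |         return TOK_PUNCT;
        |     }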
       | 
       | edit: I should really say that you should consider using _regular
       | languages_ to lex your programming languages, not (Perl-style)
       | _regexes_. Those are different things:
       | 
       | https://swtch.com/~rsc/regexp/
       | 
       | I used re2c for Oil and it saves around 5K-10K lines of
       | "groveling through backslashes and braces one a time" (e.g. what
       | other shells do)
       | 
       | http://www.oilshell.org/blog/2019/12/22.html#appendix-a-oils...
       | 
       | And the parser is faster than bash overall:
       | 
       | http://www.oilshell.org/blog/2020/01/parser-benchmarks.html
        
       | matheusmoreira wrote:
       | Is it true that optimization follows the 80/20 rule? What are
       | some common performance issues that new languages and their
       | implementations face? Are there common optimization techniques
       | that can be applied in order to make the new language competitive
       | with existing ones?
       | 
       | For example, I know that it's generally better to compile
       | programs into a linear code structure such as bytecode instead of
       | interpreting a tree structure directly.
        
         | barrkel wrote:
         | Linear encoding is better than interpreting a tree because
         | accessing adjacent elements in arrays is faster than chasing
         | pointers.
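          | 
          | A rough sketch in C (made-up node and opcode layouts,
          | addition only) of the two interpreter shapes:
          | 
          |     /* Tree walking: every operand is reached by chasing a
          |      * pointer to wherever the allocator put the node. */
          |     typedef struct Node {
          |         int op;                   /* 0 = literal, 1 = add */
          |         int value;
          |         struct Node *lhs, *rhs;
          |     } Node;
          |     
          |     int eval_tree(const Node *n) {
          |         if (n->op == 0) return n->value;
          |         return eval_tree(n->lhs) + eval_tree(n->rhs);
          |     }
          |     
          |     /* Linear bytecode: opcodes and operands sit next to
          |      * each other in one array, so the next instruction is
          |      * almost always already in cache. */
          |     enum { OP_PUSH, OP_ADD, OP_HALT };
          |     
          |     int eval_bytecode(const int *code) {
          |         int stack[64], sp = 0;
          |         for (;;) {
          |             switch (*code++) {
          |             case OP_PUSH: stack[sp++] = *code++; break;
          |             case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
          |             case OP_HALT: return stack[sp - 1];
          |             }
          |         }
          |     }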
         | 
         | Poor performance depends on what your language does. What's
         | poor performance for SQL will be different than for JS.
         | Features which don't have efficient implementations will be
         | slow. If the language encourages use of inefficient features,
         | then it will generally be slower.
         | 
         | The easiest optimization technique is to leverage an existing
         | back end; for example, target LLVM or JVM or .NET.
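          | 
          | A minimal sketch of that last point (hypothetical file name,
          | trivial function): emit textual LLVM IR and let the existing
          | tools do the optimization and code generation.
          | 
          |     #include <stdio.h>
          |     
          |     /* Write a trivial function as textual LLVM IR; compile
          |      * it afterwards with e.g. "clang -O2 -c add.ll". */
          |     int main(void) {
          |         FILE *f = fopen("add.ll", "w");
          |         if (!f) return 1;
          |         fputs("define i32 @add(i32 %a, i32 %b) {\n"
          |               "entry:\n"
          |               "  %sum = add i32 %a, %b\n"
          |               "  ret i32 %sum\n"
          |               "}\n", f);
          |         fclose(f);
          |         return 0;
          |     }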
        
           | matheusmoreira wrote:
           | > Linear encoding is better than interpreting a tree because
           | accessing adjacent elements in arrays is faster than chasing
           | pointers.
           | 
           | Yes. From this we can derive some general optimization
           | principles: reduce indirection, increase data locality.
           | 
           | Are there any others?
           | 
           | > Poor performance depends on what your language does. What's
           | poor performance for SQL will be different than for JS.
           | 
           | Let's consider modern dynamic languages like Javascript and
           | Python. What performance problems did these languages face?
           | Which optimizations had the biggest impact on code execution
           | performance? Which optimizations were the easiest to
           | understand and implement?
           | 
           | For example, I know Javascript virtual machines will
           | automatically create hidden classes for objects. This allows
           | the data to be accessed by constant offsets rather than
           | lookups.
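            | 
            | A rough sketch of the idea in C (names invented here):
            | objects that were built the same way share a "shape" that
            | fixes each property's slot index, so after one lookup the
            | VM can cache the offset and turn property access into a
            | plain indexed load.
            | 
            |     #include <string.h>
            |     
            |     typedef struct {
            |         const char *names[4]; /* property names, in order */
            |         int count;
            |     } Shape;
            |     
            |     typedef struct {
            |         Shape *shape;         /* shared by similar objects */
            |         double slots[4];      /* values at fixed offsets */
            |     } Object;
            |     
            |     /* Slow path, done once per access site. */
            |     int find_slot(const Shape *s, const char *name) {
            |         for (int i = 0; i < s->count; i++)
            |             if (strcmp(s->names[i], name) == 0) return i;
            |         return -1;
            |     }
            |     
            |     /* Fast path reused afterwards: constant-offset read. */
            |     double get_slot(const Object *o, int cached_slot) {
            |         return o->slots[cached_slot];
            |     }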
        
         | WalterBright wrote:
         | Optimizing code is a book-length topic just for an
         | introduction. It's also true that knowing how optimizers work
         | can feed back into improving the language design.
         | 
         | For example, `const` in C++ doesn't mean the data is immutable
         | - it can change with any assignment through a pointer. No
         | optimizations assuming immutability will work. That's why D has
          | an `immutable` qualifier, which lets the optimizer do
          | optimizations assuming the data does not change.
         | 
         | For a famous example, Fortran assumes two arrays never overlap.
         | In C/C++ they can. This is the source of a persistent gap in
         | performance between Fortran and C/C++. C attempted to fix it by
          | adding the `restrict` qualifier, but this failed because it
         | is just too arcane and brittle for most users.
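          | 
          | A small C illustration of the aliasing issue (toy example,
          | not from the article): without `restrict` the compiler must
          | assume `dst` and `src` may overlap and be conservative; the
          | `restrict` version promises they don't.
          | 
          |     /* Compiler must allow for dst and src overlapping, so
          |      * it can't freely reorder or vectorize loads/stores. */
          |     void scale(float *dst, const float *src, float k, int n) {
          |         for (int i = 0; i < n; i++)
          |             dst[i] = src[i] * k;
          |     }
          |     
          |     /* The no-overlap promise frees the optimizer. */
          |     void scale_restrict(float *restrict dst,
          |                         const float *restrict src,
          |                         float k, int n) {
          |         for (int i = 0; i < n; i++)
          |             dst[i] = src[i] * k;
          |     }
          | 
          | That no-overlap promise is roughly the guarantee Fortran
          | gets by default for its array arguments.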
         | 
         | In D, I'm working on adding an Ownership/Borrowing system,
         | which will enable the compiler to figure out the pointer
         | aliasing optimization opportunities.
        
           | muldvarp wrote:
           | An Ownership/Borrowing system like the one Rust uses? Rust is
           | sadly not (yet) able to use its stricter rules about
           | ownership and aliasing for optimization, because the LLVM
           | backend has been shown to have a number of bugs that prevent
           | such optimizations.
           | 
           | Sadly, I'm not that knowledgeable on optimization. Do you
           | expect that such optimizations would make most programs
           | significantly faster or is it more of a special case that
           | would not improve the performance of most programs?
           | 
           | Keep up the good work with D. I especially love its
           | template/metaprogramming capabilities!
        
             | WalterBright wrote:
             | > Do you expect that such optimizations would make most
             | programs significantly faster or is it more of a special
             | case that would not improve the performance of most
             | programs?
             | 
             | I don't know because I have no personal experience with
             | such. What I have to go on is the Fortran vs C/C++
             | experience I mentioned.
        
       | Animats wrote:
       | That's a good set of questions for 2014. Questions that have
       | become important more recently include:
       | 
       | - Imperative? Functional? Some mixture of both? Mixtures of the
       | two tend to have syntax problems.
       | 
       | - Concurrency primitives. The cool kids want "async" now. Mostly
       | to handle a huge number of slow web clients from one server
       | process. Alternatively, there are "green threads", which Go calls
       | "goroutines". All this stuff implies some level of CPU
       | dispatching in compiled code.
       | 
        | - Concurrency locking. Really hard problem. There's the
        | Javascript run-to-completion approach, which reappears in other
        | languages as "async". Ownership-based locking, as in Rust, is
        | promising, but may lock too much. And what about atomic
        | operations and lockless operations? A source of hard-to-find
        | bugs. Get this right.
       | 
       | - Ownership. Rust got serious about this, with the borrow
       | checker. C++ now tries to do "move semantics", with modest
       | success, but has trouble checking at compile time for ownership
       | errors. Any new language that isn't garbage collected has to
       | address this. Historically, language design ignored this problem,
       | but that's in the past.
       | 
       | - Metaprogramming. Templates. Generics. Capable of creating an
       | awful mess of unreadable code and confusing error messages. From
       | LISP macros forward, a source of problems. Needs really good
       | design, and there aren't many good examples to follow.
       | 
       | - Integration with other languages. Protocol buffers? SQL? HTML?
       | Should the language know about those?
       | 
       | - GPUs. Biggest compute engines we have. How do we program them?
        
         | zzo38computer wrote:
          | I would say it may be better to use a different programming
          | language for the GPU than for the CPU. I like the idea of
          | Checkout (see [0]) for GPU programming, although unfortunately
          | it is not implemented and
         | the preprocessor is not yet invented. (The trigonometric
         | functions are also missing. They should probably have at least
          | sine, cosine, and arctangent, since these functions seem
          | useful to me when doing graphics.)
         | 
         | [0] http://esolangs.org/wiki/Checkout
        
       | WalterBright wrote:
       | Author here. AMA!
        
         | nafey wrote:
         | Are you familiar with Jai language being developed by Jonathan
         | Blow? What is your opinion on its design?
        
           | WalterBright wrote:
           | My opinion is that Jonathan Blow is an amazing developer and
           | he should be joining us on D!
           | 
           | It's been a while since I reviewed the language, and I'd have
           | to re-review it to say anything sane here. But one thing
           | stuck out at me. In D, the support for Compile Time Function
           | Execution (CTFE) is very extensive, and opens the door to a
           | lot of incredible metaprogramming abilities. Jai has this
           | too, but has extended it to allow system calls.
           | 
            | While his demos of programs being run in the compiler are
            | impressive, D doesn't support CTFE system calls for a good
           | reason:
           | 
           | It'll become a vector for malware. People download source
           | code from the intertoobs all the time, and just compile it.
           | Heaven help us if that means your system is now compromised.
           | I'm not going to open that door and make people afraid to
           | compile D code.
           | 
           | It's not worth it.
        
             | h-cobordism wrote:
             | I assume most people run the code they compile on the very
             | machine they compile it on, so how does preventing compile-
             | time system calls help stop malware?
        
             | tomp wrote:
             | Why not instead alert the programmer that compiling this
             | code would call such-and-such system calls?
             | 
             | That's IMO how all software should work. Apple did
              | something similar, AFAIK, in the latest version of macOS,
             | and failed - it's obviously something that is extremely
             | hard to retrofit onto existing software. But allowing
             | arbitrary execution for _new_ software, and sensibly
              | limiting it (e.g. "this program cannot access the
             | internet") is, I predict, fairly feasible.
        
             | ChrisKnott wrote:
             | I mean, do those people who download and compile code not
             | then generally... run it...?
             | 
             | Perhaps I'm missing something but the issue here seems to
             | be whether you trust the source, not whether syscalls are
             | made at compile time or run time?
        
             | david2ndaccount wrote:
             | Other languages (such as python) have install-time code
             | execution. When you pip install a library, the library can
             | run arbitrary code and of course has access to anything and
             | everything via the python interpreter. The only thing
             | protecting you is faith in pypi. How is compile-time code
             | execution any different?
        
         | Rochus wrote:
         | The article is from 2014. Is it still current? What would you
         | change in 2020?
        
           | WalterBright wrote:
           | I'd write the same thing today. 6 years more experience just
           | confirms it.
        
             | Rochus wrote:
             | Good to know, thanks.
        
         | Rochus wrote:
         | What is your opinion about self-hosting (i.e. writing the
         | parser/compiler in its own language)? Is that really desirable,
          | or even necessary, or just a gimmick (I know what Wirth says,
         | wonder what you think)?
        
           | mhh__ wrote:
            | (Not Walter but) Self-hosting in a compiler should be done
            | wherever possible. Setting aside the fact that a self-
            | hosting compiler becomes its own test suite (writing tests
            | is good, but it's quite difficult to find weird behavioural
            | bugs using unittests if the tests only test one thing at a
            | time), it makes it so much easier to work on the compiler
            | (for example, the main D compiler can build itself in a
            | second or two on my machine - if I had to cart around some
            | other huge toolchain it would be much slower). Another boon
            | is that if people who are really good (let's say) Go
            | programmers want to make the Go compiler better, they don't
            | have to try and write Go in C++ if the main compiler is
            | written in C++.
           | 
            | It can also be a plan to rush to an MVP, rewrite the MVP
            | in your language, and go from there. This has the added
            | benefit of giving you a decent-sized program in your
            | language at no extra cost - aside from testing, this should
            | make you design a better language as you learn what works
            | and what doesn't in the "real world".
           | 
           | It's not always possible (Please don't write a compiler in
           | Javascript, or even C if possible - pain and lack of
           | abstraction respectively).
        
       | zzo38computer wrote:
       | In some programming languages, e.g. TeX, Forth, PostScript, etc
       | you cannot really syntax highlight the program without executing
       | it. However, syntax highlighting is a feature that I can do
       | without, and I don't think those programming languages are bad.
        | In some programming languages, such as C and Free Hero Mesh, you
        | can read the sequence of tokens even though there are macros,
        | which you need not parse to find where the tokens are (although
        | I think older versions of the C preprocessor did not work like
        | this).
       | 
       | They mention tools. I do think valgrind is sometimes helpful; I
       | do use that. However, Git is not the only version control system;
       | some people prefer others, such as Mercurial, Fossil, etc. I use
       | Fossil.
       | 
       | They also mention error messages. I like the first one; print one
       | error message and quit. However, in some cases, it might be
       | possible to easily just ignore that error and continue after
       | displaying the error message (and then finally fail at the end).
       | It might also be possible to skip a lot of stuff, and then
       | continue, e.g. if there is an unknown variable or function then
       | you might display an error message and then skip the entire
       | statement that mentions it.
       | 
        | They mention a runtime library. This should be reduced as much as
        | possible, I think. It is also a good idea, if you can manage it,
        | to make it so that its functions are only included in the
        | compiled program when they are used; then you can provide
        | functions that many people don't need without wasting space and
        | time on them.
       | 
       | Something they did not mention but I think it is very good to
       | have and useful is metaprogramming and 'pataprogramming.
       | 
       | But I also think that different programming languages can be good
       | for different purposes, and that some are domain specific, and
       | some are easier to write than others. This can be done using
       | existing syntax or new syntax, and I have done both, and so have
       | some others.
       | 
       | Another thing that I can recommend to have is a discussion system
        | with NNTP; even Digital Mars themselves have an NNTP-based
       | discussion system. Perhaps better will be to have the same
       | messages with NNTP, web forum, and mailing lists; again, this is
       | what Digital Mars does.
       | 
       | However, I do think that minimizing keystrokes is helpful.
       | Minimizing runtime dependencies and runtime memory usage is also
       | helpful.
        
       | xiphias2 wrote:
        | "grammar should be redundant. You've all heard people say that
        | statement terminating ; are not necessary because the compiler
        | can figure it out."
       | 
       | I'll stay with my non-redundant Julia language, thank you very
        | much. It has given me a lot of joy in programming, and I don't
        | remember having big problems with error messages. Even when I
        | have, syntax errors are trivial to find.
       | 
        | The hard parts of efficient programming are memory management
        | (garbage collection), controlling vectorized instructions,
        | balancing ease of GPU programming against efficiency, and
        | parallelization - not semicolons at the end of a line.
       | 
        | All programming languages have to decide how much low-level
        | control they give over the computer's resources, and how they
        | abstract those resources, to make high-level programming
        | possible without sacrificing too much efficiency.
        
         | adimitrov wrote:
         | Whilst you're not wrong about what _you_ value about a
          | programming "language", I'd say there is an important
         | distinction to be drawn between two things people tend to
         | conflate: the _language_ and the _runtime_.
         | 
         | Perhaps a counterexample is in order to explain it: Java the
          | language and Java the runtime (VM, standard library) are two
          | entirely different things. Kotlin, for example, makes a lot of
          | different choices in _language design_ (e.g. semicolons) while
          | reusing the same _runtime_ (GC, threads, etc.).
         | 
          | Of course, you often can't tease these two concerns apart
          | easily and neatly, but there is merit in giving thought to the
          | ergonomics of the pure _language_ side of things. At the end of
          | the day, a programming language doesn't need an implementation
         | to be useful (e.g. as a teaching tool) as it's an abstract,
         | formal concept.
        
       | pbiggar wrote:
       | This advice is quite good for certain types of programming
        | language, if you look at the world the same way as he does. We're
       | implementing Dark from a completely different worldview, and from
       | that vantage point, a lot of these aren't exactly wrong, but
       | different.
       | 
       | > Syntax matters
       | 
       | The approach we took with Dark is that there isn't a syntax, per
       | se, in that there isn't a parser. There's certainly a textual
       | view of Dark and so it's important that the code looks good (it
       | currently looks only OK, in my opinion). But as a result, we have
       | other options for minimizing keystrokes (autocomplete built-in),
       | parsing (again, no parse), and minimizing keywords (you're
        | allowed to have a variant with the same name as a keyword, which
        | isn't allowed in languages with parsers (well, lexers, but same
        | idea)).
       | 
       | > context free grammars
       | 
       | No parser, no need to have a grammar. His point about IDEs is
       | great - we only support our own IDE (a controversial decision to
       | be sure!)
       | 
       | > Redundancy
       | 
       | This is an esoteric parsing problem, that only applies if you
       | have a parser. No parser means no syntax errors. We are left with
       | editor errors (how does the editor have good UX) and run-time
       | errors.
       | 
       | > Implementation
       | 
       | He's right about how hard error messages are, so I feel good
       | about our "not parsing" approach.
       | 
       | > Compiler speed
       | 
       | Our compilation is instant. The way it's instant is:
       | 
       | - very small compilation units: you're editing a single function
       | at a time, and so nothing else needs to be parsed.
       | 
       | - no parser: The editor directly updates the AST, so you don't
       | have to read the whole file (there isn't a "file" concept). Even
       | in JS, that means an update takes a few milliseconds at the most.
       | 
       | - extremely incremental compilation: making a change in the
       | editor only changes the exact AST construct that's changing.
       | 
       | > Lowering
       | 
       | This is really about compilation. One thing you can do, which is
       | what we do, is have an interpreter. Now, interpreters are slow,
       | but we simply have a different goal with the language, which is
       | to run HTTP requests quickly. We do run into problems with the
       | limit of the interpreter, but we plan to add a compiler later to
       | deal with this.
       | 
       | Really what I'm suggesting here is that compiled languages look
       | at having interpreters in addition to compilers.
       | 
       | > i/o performance
       | 
       | > memory allocation
       | 
       | I think he's approaching this with an implicit goal of "it must
       | be as fast as possible", which isn't necessarily a goal for all
       | languages.
       | 
       | > You've done it, you've got a great prototype of the new
       | language. Now what? Next comes the hardest part. This is where
       | most new languages fail. You'll be doing what every nascent rock
       | band does -- play shopping malls, high school dances, dive bars,
       | etc., slowly building up an audience. For languages, this means
       | preparing presentations, articles, tutorials, and books on the
       | language. Then, going to programmer meetings, conferences,
       | companies, anywhere they'll have you, and show it off. You'll get
       | used to public speaking, and even find you enjoy it (I enjoy it a
       | lot).
       | 
       | Well this is certainly correct!
        
         | tomp wrote:
         | > The approach we took with Dark is that there isn't a syntax,
         | per se, in that there isn't a parser.
         | 
         | Can you explain how that works? Unless Dark is a purely visual
         | programming language (in which case I'd say that there is
         | "syntax", it's just a bit more abstract, and in any case, it
          | seems that visual languages haven't really caught on as an idea),
         | idea), I find that hard to believe.
        
       | somewhereoutth wrote:
       | I am currently developing a language, though more as a research
       | project than anything practical. Apologies for jumping on this
       | post, but it is a good opportunity to organize and set out my
       | thoughts (and possibly someone might find it interesting):
       | 
       | Essentially the language is a pure functional language that takes
       | the untyped lambda calculus and adds decoration terms as first
       | class citizens in the calculus. These terms can be used to wrap
       | selected combinators (eg church numerals). The normal beta
       | reduction rules are extended to handle these decorations in a
       | useful way.
       | 
       | The decorations allow predicates to be formed that are total over
       | the term space (eg isChurchNumeral?), which means that function
       | call identifiers can be dynamically dispatched based on their
       | arguments (using a certain amount of term assembly from the
       | surface syntax). Adhoc polymorphism can thus be encoded in a
       | transparent fashion.
       | 
       | This has been sufficient to build a numerical tower up to and
       | including the Complex numbers, such that the usual operations
       | add, mul, sub, div are defined within and between Natural,
       | Integer, Rational and Complex numbers. It can also manipulate
       | strings as if they were lists, whilst retaining their
       | 'stringyness'. All without needing number or string specific
       | features internal to the runtime (save parsing and formatting on
       | the way in and out).
       | 
        | REPL examples:
        | 
        |     > (sum [1 -3 2.5 3/2 1+2i])
        |     > 3+2i
        |     > (reverse "hello")
        |     > "olleh"
       | 
       | Something resembling Haskell's typeclasses naturally arises, with
       | the usual definitions of Functor, Monad, and Applicative.
       | 
       | It is, needless to say, astonishingly slow.
        
         | tpush wrote:
         | From your description it sounds like these decorators are an
         | alternative to type annotations?
         | 
         | Mind showing how these decorators look and work? I'm also
         | building a language, and always interested in seeing novel
         | features like these :).
        
       | billconan wrote:
       | I want to implement a toy programming language, but I have
       | questions regarding the following in that article:
       | 
       | > Context free grammars. What this really means is the code
       | should be parseable without having to look things up in a symbol
       | table. C++ is famously not a context free grammar. A context free
       | grammar, besides making things a lot simpler, means that IDEs can
       | do syntax highlighting without integrating in most of a compiler
       | front end, i.e. third party tools become much more likely to
       | exist.
       | 
        | Many complex languages, like C++ and Rust, are not context free. We
       | therefore need semantic analysis.
       | 
       | What programming language features will be compromised if I stick
        | to a context-free grammar?
       | 
        | Will the end result be as expressive as C++/Rust?
       | 
       | Any programming language that is massively adopted is context
       | free?
       | 
       | BTW, the toy language I want to build will be something similar
        | to JavaScript.
        
         | desc wrote:
         | https://stackoverflow.com/questions/898489/what-programming-...
         | 
         | Most languages have context-free syntax, which is what the
          | article refers to. There really is no reason to sacrifice
         | that. Even modern PHP recognises the value of having a parse
         | tree independent of an entire compiler.
         | 
          | Context-free _semantics_ is an entirely different matter, and
          | I'm not even sure what it'd mean...
        
           | chrisseaton wrote:
           | > Most languages have context-free syntax
           | 
           | Do they? I haven't done a survey of languages but I would
           | guess most are context-sensitive.
        
           | chubot wrote:
           | The two top answers are in conflict. The second answer with
           | 43 points is closer to right:
           | 
           |  _There are hardly any real-world programming languages that
           | are context-free in any meaning of the word._
           | 
           | The first answer with 41 points is totally wrong: _The set of
           | programs that are syntactically correct is context-free for
           | almost all languages_
           | 
           | -----
           | 
           | A better source is this whole series by Trevor Jim, which has
           | nice ways of relating theory to practice. He lists a bunch of
           | reasons why you could consider nearly all programming
           | languages not context-free.
           | 
           | http://trevorjim.com/parsing-not-solved/ -- hundreds of
           | parser generators support context-free grammars, but there
           | are almost no context-free languages in practice.
           | 
           | http://trevorjim.com/python-is-not-context-free/ -- A main
           | point here is that you have to consider the lexer and parser
           | separately. Grammars don't address this distinction, which
           | arises in essentially all programming languages. Also there
           | is some lexical feedback, similar in spirit to C's lexer
           | hack.
           | 
           | http://trevorjim.com/haskell-is-not-context-free/
           | 
           | http://trevorjim.com/how-to-prove-that-a-programming-
           | languag...
           | 
           | http://trevorjim.com/c-and-cplusplus-are-not-context-free/ --
           | the way LALR(1) conflicts are resolved in practice can make a
           | language not context-free
           | 
           | (copying from a recent comment)
        
             | [deleted]
        
             | WalterBright wrote:
             | The implementation of D has a lexer that is independent of
             | the parser, and a parser that is independent of the rest of
             | the implementation. I have resisted enhancement proposals
             | that would put holes in those walls.
        
               | [deleted]
        
               | billconan wrote:
               | could you give an example for
               | 
               | > Any programming language that is massively adopted is
               | context free?
               | 
               | I want to have an intuition on what is context free and
               | what is not.
               | 
               | Is D context free?
               | 
                | I mean, I want to make a useful language; is being context
               | free practical?
        
               | WalterBright wrote:
               | D is context free in that the parser does not require
               | semantic analysis in order to complete its parse.
        
             | billconan wrote:
             | Thank you very much for the references. They are very good
             | reads!
        
             | zzo38computer wrote:
             | Is Haskell context-free if you don't use the indentation
             | layout mode? Haskell does support using braces and
             | semicolons too. However, this is not true of Python (as far
             | as I know).
        
             | matheusmoreira wrote:
             | > A main point here is that you have to consider the lexer
             | and parser separately. Grammars don't address this
             | distinction, which arises in essentially all programming
             | languages.
             | 
             | So why does this happen? Couldn't context-free language
             | parsers emulate a lexer by treating individual characters
             | as symbols?
        
               | chubot wrote:
               | If there's no feedback, you can consider the lexer and
               | parser separately as languages.
               | 
               | - The lexer recognizes a set of strings of characters.
               | 
               | - The parser recognizes a set of strings of tokens (as
               | returned by the lexer)
               | 
               | But those two languages will have different power in
               | general, so, like Jim points out, when you say something
               | like "Python is context-free" or "Haskell context-free",
               | it's not clear what you're talking about. And it's really
               | false under any reasonable interpretation.
               | 
               | So you can consider them separately, and make a precise
               | statement. But if there's feedback between the two, then
               | you can't do that anymore. The theory doesn't tell you
                | any properties that the (lexer + parser + feedback
                | mechanism) possesses.
               | 
               | That is, regular languages and context-free languages
               | have all sorts of properties proven about them, including
               | ones that let you write code generators. You don't get
                | any of those properties when you have an ad hoc feedback
               | mechanism between the lexer and parser.
               | 
               | ----
               | 
               | Someone could come up with formalisms for specific types
               | of feedback.
               | 
               | I think OCaml's Menhir has done some of this, but I don't
               | remember the details off hand.
               | 
               | They have written parsers for C (CompCert) and POSIX
               | shell and addressed some of the gaps between theory and
               | practice. I don't use it but they're at least tackling
               | the right problems.
               | 
               | But note that C uses a specific type of feedback (the
               | lexer hack), which isn't identical to what other
               | languages use. So you would have to come up with a
               | formalism for each one, and probably nobody has done
               | that.
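                | 
                | As a minimal sketch of that feedback: the parser tells
                | the lexer which identifiers currently name types (a
                | distinct "typedef-name" token kind), because the same
                | token shape parses two different ways:
                | 
                |     typedef int T;
                |     
                |     void f(void) {
                |         T * p;      /* declaration: p is pointer to T */
                |         (void)p;
                |     }
                |     
                |     void g(int T2, int p2) {
                |         T2 * p2;    /* expression: a product, discarded */
                |     }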
        
         | tomp wrote:
         | The purpose of context-free, or rather _unambiguous_ grammars
          | in general, isn't for the benefit of _compilers_, but for the
          | benefit of _humans_. A computer can parse almost everything,
          | even regexes (with backtracking etc.); it's the humans who
          | trip over fancy grammars.
        
         | steveklabnik wrote:
         | Rust is _almost_ context free; there's one relatively rarely
         | used bit that requires context. And usually it's only one or
         | two bytes of it.
        
           | h-cobordism wrote:
           | You're referring to raw string literals, right?
        
             | steveklabnik wrote:
             | Yes. You need to include more #s than you have in the
             | string, and it's rare to have a ton of them, let alone a
             | raw string literal in the first place.
        
           | [deleted]
        
         | tom_mellior wrote:
         | This point in the article is pretty weird (the rest is good). I
         | don't think IDEs do a lot of parsing to do syntax highlighting,
         | isn't it all just regex matching to identify the types of
         | tokens? I'd be interested in real-world examples of IDEs doing
         | something more complex to achieve syntax highlighting. And
         | conversely, examples of imperfect syntax highlighting of C++
         | due to the undecidability of its input language. I mean, yes,
         | an actual parser for C++ must be able to execute template
         | computations, and that's a pain, but why would that be relevant
         | to syntax highlighting? It isn't.
         | 
         | Now, there _are_ valid reasons for running a proper language
         | frontend from the IDE: Error reporting of all kinds, not just
         | syntax but also type errors and whatever else the compiler
         | likes to complain about. But there the IDE should _not_ try to
         | replicate the parser. Parsing only a context-free surface
         | language will not catch type errors, exactly because enforcing
          | a static type system requires one to "look things up in a
         | symbol table". So the actual compiler should provide an IDE-
         | friendly mode where it runs its frontend and reports errors
         | back, and the IDE should not try to roll its own version of
         | this.
         | 
         | In other words, the IDE is not relevant either way.
         | 
         | > Any programming language that is massively adopted is context
         | free?
         | 
         | Statically typed languages require semantic analysis and symbol
         | table lookups, so they are out.
         | 
         | And yet we have context-free grammars for all widely adopted
         | statically typed programming languages. And we also have
         | additional semantic checks for all of them. The notion of a
         | "context free programming language" being one that has context
         | free syntax and no semantic checks at all is not a useful
         | notion. Don't worry about it while designing your language.
        
           | alpaca128 wrote:
           | > I don't think IDEs do a lot of parsing to do syntax
           | highlighting, isn't it all just regex matching to identify
           | the types of tokens?
           | 
           | IntelliJ highlights variables etc. differently if they're
           | unused and can highlight different identifiers in different
           | colors. Not sure about other current IDEs.
           | 
           | That said I never felt like I needed such features, normal
           | syntax highlighting does the job pretty well for me.
        
             | dehrmann wrote:
             | IntelliJ has context-aware autocomplete. Type
             | this.field.get and it will show methods and fields starting
             | with get for whatever type field is. There's a lot of
             | context needed to make that happen.
        
           | geophile wrote:
           | I'm pretty sure that the JetBrains products (Intellij,
           | PyCharm, etc.) have a pretty deep understanding of the
           | languages they handle. Java is strongly typed, and Intellij
            | does a perfect job (as far as I can tell) of highlighting,
           | and more importantly, refactoring. Hard to see how they could
           | do that with just regex matching.
           | 
           | By contrast, Python is more dynamic, and so the refactoring
           | in PyCharm is pretty weak. It usually catches some of what it
           | needs, but I often need to find the rest. It does appear to
           | be the case that PyCharm is doing flow analysis to infer
           | types. I.e., even PyCharm is doing more than regex matching.
        
           | pasquinelli wrote:
           | >> Any programming language that is massively adopted is
           | context free?
           | 
           | > Statically typed languages require semantic analysis and
           | symbol table lookups, so they are out.
           | 
           | Isn't that the typechecker, not the parser?
        
           | neutronicus wrote:
           | Emacs definitely does parsing of C/C++ (or delegates to an
           | LSP server, depending) in order to do indentation, users'
           | preferences for which tend to depend heavily on syntactic
           | context.
        
       | mrlonglong wrote:
       | Zortech C was fabulous, wrote a disk editor in it many moons ago.
       | Kudos!
        
       | DonaldFisk wrote:
       | You might be doing this as a learning exercise, in which case it
       | doesn't need to be particularly innovative, and writing another
       | Lisp or Forth implementation is fine. In fact, I'd recommend
       | beginning along those lines, followed by developing several
       | different new languages, probably domain-specific ones. Your
       | first attempts probably won't be worth keeping. (I haven't kept
       | the object oriented Prolog I wrote about 25 years ago, but I do
       | regularly use a small Prolog I recently wrote in Lisp.)
       | 
       | The languages at the roots of all programming languages are
       | Fortran, Lisp, Cobol, Algol60, APL, Snobol, Forth, Prolog, OPS5,
       | SASL, Smalltalk, Prograph. (If I've missed any out, or any of the
       | above have important predecessors, please let me know.) A new
       | language is unlikely to be radically different from any of those
       | languages or their descendants - of those I listed, the most
       | recent is from the mid 1980s. Within the descendants, which is
       | your favourite? Can you make big improvements to it? If so, make
       | them. Otherwise, can you improve another language so that it
       | becomes your favourite?
       | 
       | One heuristic I've found useful: the right design choices at the
        | start might be unknown, but when I need to make them, they're
       | obvious to me, and often a particular choice leads to other
       | equally obvious choices. If the choice isn't obvious, I've
       | probably done something wrong.
       | 
       | My own preference is for pure rather than hybrid languages with
       | simple regular syntaxes (i.e. one idea expressed well), and very
       | strong static typing.
       | 
       | My favourite language is Common Lisp or rather a subset of it. I
       | have trouble thinking of ways to make it significantly better.
       | But I also like the idea of dataflow (a few decades ago I knew
       | some of those working on experimental dataflow hardware), and
       | graphical programming seems to be the right approach to dataflow.
        | Prograph was the closest existing language to what I wanted. It
       | isn't pure dataflow, it's dynamically typed, it's object oriented
       | (so, unless it's Smalltalk it's hybrid), and entering programs in
       | it involves too many mouse gestures and keystrokes. So I'm
       | implementing a new language which corrects those deficiencies.
       | It's been particularly difficult because I have to write an IDE
       | as well as the language, and there aren't any textbooks I can
       | rely on for help.
       | 
       | I'll have succeeded if I use the new language more often than I
       | do Lisp, and if along the way I get a few more papers (two so
       | far) and some academic interest, so much the better. I don't
       | expect a wide user base. If that was important to me, it would
       | have C syntax and I'd try to get corporate backing. Your
       | criterion for success might differ from mine.
        
       | didibus wrote:
       | After learning s-expression based syntax, I am just baffled why
       | we even bother with anything else.
       | 
       | When you play around with different Lisps, the syntax is always
        | the same; the differences between the languages come down to
        | semantics only.
       | 
       | Other languages put too much emphasis on the syntax in my
        | opinion. And while I understand the "popularity" appeal, I've
        | almost never seen someone learn the s-expression syntax and
        | afterwards not like it.
       | 
        | Basically I think it would be worth it to push people to learn the
       | s-expression syntax just so we can stop wasting our time with
       | syntax afterwards.
       | 
       | Also if I recall, without user macros, I think s-expression
        | syntax is context free, no?
        
         | chrisseaton wrote:
         | > I've almost never seen someone learning the s-expression
         | syntax and afterwards not liking it.
         | 
         | I don't particularly like s-expression syntax - I think
         | s-expressions suit the computer at the expense of the
         | programmer, which I think is backwards. I think they're verbose
         | and noisy and that obscures the meaning I want to see in the
         | text as a person. Yes they're easier to parse, but I want to
          | make my life easier, not the computer's. And yes they're
         | convenient for metaprogramming but I don't want to optimise for
         | the meta case.
         | 
         | A concrete example - I'm very happy working with precedence.
         | I've been doing it since I started school. My five-year-old can
         | understand precedence. Using precedence I can reduce ceremony
          | and noise in my code, and it lets me look at an expression more
          | naturally and take in its meaning. But s-expressions don't
         | like using precedence.
        
         | WalterBright wrote:
         | This reminds me of the debates in college (late 1970's) of RPN
         | calculators vs Infix calculators.
         | 
          | RPN's big advantage was a reduced number of keystrokes, which is
          | important with a calculator. But when dealing with formulas you
          | can actually type on a keyboard, the number of keystrokes is not
          | important; readability is much more important, and infix wins.
         | 
         | > I've almost never seen someone learning the s-expression
         | syntax and afterwards not liking it.
         | 
         | I'm a counterexample. Sorry :-)
        
         | daenz wrote:
         | It's simple, but it's fundamentally reductionist. You can ease
         | the cognitive load by making certain programming structures
         | first-class citizens in the language.
        
           | amelius wrote:
           | Yes but now you have to worry about parentheses and at what
           | nesting level certain constructs should live.
        
           | F-0X wrote:
           | > You can ease the cognitive load by making certain
           | programming structures first-class citizens in the language.
           | 
           | Which, in every s-expression based language I'm aware of, can
           | be achieved using macros.
        
             | chrisseaton wrote:
             | Isn't that the point - s-expressions aren't good enough so
             | people work around them by writing more conventional
             | languages and parsers (the reader macros) to avoid having
             | to write in them.
        
               | catalogia wrote:
               | I don't think that comment was about reader macros. I
               | don't see reader macros used much, while normal macros
               | are frequently used to create new control structures.
        
       | jakear wrote:
       | Off topic:
       | 
       | Annoying that this is totally unreadable even at 300% zoom on an
       | iPhone 11 Pro.
       | 
       | I feel like a set of people refuse to learn proper HTML/CSS as
       | some sort of statement, not realizing their laziness renders
       | their work unavailable to the visually disabled.
       | 
       | HN behaves similarly poorly.
        
         | nerdponx wrote:
         | The irony is that if this site were _actually_ simple (i.e.
         | just text without a sidebar) it would be as responsive as you
         | needed it to be. In the worst case scenario, you could fix it
         | with client-side CSS.
         | 
         | The problem here is that the site is complicated beyond what
         | HTML is meant to do: it has a sidebar. It's no longer just a
         | document; it's a document _and_ a navigation menu. Effectively
          | it's a small software application, and once you're in the
         | business of writing software applications you ought to be in
         | the business of accessibility and UX design.
        
       ___________________________________________________________________
       (page generated 2020-05-02 23:00 UTC)