[HN Gopher] My experience binding a couple of scripting engines ...
       ___________________________________________________________________
        
       My experience binding a couple of scripting engines with C++
        
       Author : germandiago
       Score  : 56 points
       Date   : 2021-05-24 09:38 UTC (13 hours ago)
        
 (HTM) web link (germandiagogomez.medium.com)
 (TXT) w3m dump (germandiagogomez.medium.com)
        
       | heinrichhartman wrote:
       | > [lua] was discarded because of 3. It has unfamiliar syntax, but
       | worse, unfamiliar semantics: no classes, use tables, start
       | indexing at 1 and other oddities, just as being able to call
       | functions with the wrong number of arguments and returning nil on
       | the way. Also, use tables for both hash tables and arrays. It is
       | powerful, do not misunderstand me, and Lua supports good
       | concurrency. It was just not what I was looking for because of
       | the mentioned things.
       | 
       | I can fully understand why lua is not a good fit for this case,
       | however, I would like to add some color to the picture.
       | 
       | The most powerful way to do C(++) - Lua interop is not the
       | official C API but LuaJIT's FFI: https://luajit.org/ext_ffi.html
       | 
       | This allows for allocation of C objects on the heap and on the
       | stack and FAST function calls. Doing the same for C++ is possible
       | but requires some work, e.g.
       | http://lua-users.org/lists/lua-l/2011-07/msg00492.html
       | 
       | Furthermore:
       | 
       | - unfamiliar syntax -- The syntax is tiny -- and I found nothing
       | unexpected about it.
       | 
       | - no classes -- There are many class libraries available for lua.
       | Just pick one. Used penlight classes quite a bit, without running
       | into major issues.
       | 
       | - use tables for both hash tables and arrays. -- Yes. This is on
       | the API side, under the hood hashes and arrays are used where
       | appropriate.
        
         | fullstop wrote:
         | I grumbled about indexing starting at 1, but once you get used
         | to it it makes a lot of problems easier.
         | 
         | I've spent the last twenty-ish years in C and string
         | manipulation just sucks. Think about it, you declare a buffer
         | of length 20, the indexes are from 0->19, and the 19th byte
         | needs to be a null if you are using it as a string and are
         | using the entire buffer. Also, the standard library is not
         | guaranteed to null terminate in all situations.
         | 
         | Lua's string indexing feels far more natural to me.
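To make the buffer arithmetic above concrete, here is a minimal sketch in Python that models a fixed 20-byte C buffer. The helper names `strncpy_model` and `is_terminated` are made up for illustration; the model assumes the documented behaviour of C's `strncpy`, which pads short inputs with NULs but writes no terminator when the source fills the destination.

```python
def strncpy_model(dest: bytearray, src: bytes) -> None:
    """Model C's strncpy(dest, src, sizeof dest) on a fixed-size buffer.

    Copies at most len(dest) bytes and pads the remainder with NULs, but
    writes no terminator when src is at least as long as dest.
    """
    n = len(dest)
    chunk = src[:n]
    dest[:len(chunk)] = chunk
    for i in range(len(chunk), n):
        dest[i] = 0


def is_terminated(buf: bytearray) -> bool:
    """True if the buffer contains a NUL byte somewhere."""
    return 0 in buf


buf = bytearray(20)                 # valid indices are 0..19
strncpy_model(buf, b"a" * 19)       # 19 characters leave room for the NUL
print(buf[19], is_terminated(buf))  # 0 True

strncpy_model(buf, b"b" * 20)       # 20 characters fill the whole buffer
print(buf[19], is_terminated(buf))  # 98 False
```

A buffer of length 20 therefore holds at most 19 characters of payload, with the terminator in the 20th byte (index 19).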
        
           | tannhaeuser wrote:
           | This. The insistence on using 0-based string offsets is
           | purely a C thing (where it makes sense) inherited on to
           | languages that wanted to stay close to C or appeal to C devs
           | (even though it does not make sense). An easy way to check is
           | looking into awk which, as a DSL for string manipulation
           | written itself in C, deliberately uses 1-based string
           | offsets, and where many/most common string expressions
           | collapse to a very compact form, which makes even more sense
           | because empty string results are interpreted as false in
           | conditions.
        
             | jhvkjhk wrote:
             | It's not a C thing, it's a math/utility thing.
             | 
             | Dijkstra: Why numbering should start at zero
             | https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...
        
               | corysama wrote:
               | Which is funny because Lua got its 1-base from FORTRAN
               | which, I believe, adopted it to make TRANslating math
               | FORmulas easier.
        
               | tovej wrote:
               | I agree in part that it's a C thing. More accurately it's
               | an offset from the array base thing.
               | 
               | The first element is at index 0 because its address is
               | base + 0 * sizeof(element)
               | 
               | The second element is at index 1 because its address is
               | base + 1 * sizeof(element)
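The offset formula above can be checked from Python with the standard `ctypes` module. This is only a sketch: `element_address` is a hypothetical helper, and the 4-element `c_int32` array is an arbitrary choice for the demonstration.

```python
import ctypes


def element_address(array, index):
    """Address of array[index], computed as base + index * sizeof(element)."""
    base = ctypes.addressof(array)
    return base + index * ctypes.sizeof(array._type_)


values = (ctypes.c_int32 * 4)(10, 20, 30, 40)

for i in range(len(values)):
    # Read the element back through the computed address.
    via_offset = ctypes.c_int32.from_address(element_address(values, i)).value
    print(i, via_offset, via_offset == values[i])
```

Index 0 lands exactly on the base address, which is why the first element gets offset zero in this scheme.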
        
           | sporedro wrote:
           | The indexing starting at 1 is something that has always
           | annoyed me. I just haven't seen any other language make that
           | decision, and I'm not sure the reasoning for it really
           | outweighs the fact every other language just goes with 0 due
           | to the origin of it.
           | 
           | Lua is a great language for sure though.
        
             | fullstop wrote:
             | Pascal did, but I believe that was because index 0
             | contained an 8-bit length. There was no need for null
             | termination, and strings were limited to 255 bytes.
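A rough sketch of the length-prefixed layout described above, written in Python for illustration. The function names are invented here, and the layout (one length byte at index 0, characters at indices 1..255) follows the comment rather than any particular Pascal compiler.

```python
def to_pascal_string(text: str) -> bytes:
    """Encode text with a one-byte length prefix stored at index 0."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix limits strings to 255 bytes")
    return bytes([len(data)]) + data


def pascal_char(buf: bytes, index: int) -> str:
    """Return the character at a 1-based index; index 0 is the length byte."""
    if not 1 <= index <= buf[0]:
        raise IndexError("index out of range")
    return chr(buf[index])


s = to_pascal_string("lua")
print(s[0])               # 3, the stored length
print(pascal_char(s, 1))  # l, the first character
print(pascal_char(s, 3))  # a, the last character
```

With this layout the first character naturally sits at index 1 and no terminator byte is needed.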
        
           | vlovich123 wrote:
           | Small correction. The 20th byte (at index 19) needs to be
           | null.
           | 
           | The mismatch with the English language and how people
           | naturally count is definitely there and annoying. And yes,
           | string manipulation in C is especially broken although I
           | think the indexing is the smallest problem there.
           | 
           | However, it's extremely natural when you think about it in
           | terms of memory access. For example, in a 1-based indexing
           | system, ptr[0] would point 1 character behind your pointer
           | (weird) and ptr[-1] would point 2 back (wtf). Having the
           | index map neatly to the offset makes a lot of sense to me. In
           | fact, when I first started programming in VB6 20 years ago
           | and only had a math background, the 1-based indexing was
           | natural but I could never figure out why I had so many bugs
           | related to array and string offsets.
           | 
           | I'll also note that most programming languages are 0-based
           | and interop with C is not really the goal (Java, JavaScript,
           | Ruby, Python, etc). In fact, Python and Perl's string
           | manipulation is some of the best out there and they are 0
           | indexed.
        
             | fullstop wrote:
             | I figured that someone would pipe up about the 19th byte vs
             | 19th index, and I'm glad. This is just semantics, but does
             | index 0 represent the first byte or the zeroth byte?
             | 
             | I completely agree regarding memory access, but would argue
             | that strings and memory should not be treated in the same
             | way. Having the index represent the length - 1 has caused
             | countless off-by-one bugs [1] that would not have been
             | there in the first place if string indexes started with 1.
             | 
             | Java, JavaScript, Python, (maybe Ruby, I'm not fluent
             | there) also bite the user if you attempt to index data
             | outside of the string's range. C will happily index whatever
             | you want, and these bugs can often remain hidden for decades.
             | 
             | 1. https://cwe.mitre.org/data/definitions/193.html
        
               | Joker_vD wrote:
               | Index N represents "skip N elements". That gives you very
               | easy additive/subtractive behaviour, unlike "how many
               | numbers are there from 7 to 17?" scenarios.
               | 
               | Sure, 17-7 gives you 10, but that's not the final answer,
               | you have to add 1 to get the right answer, 11. Sorry, no,
               | you actually subtract 1 and the right answer is 9. Wait,
               | no: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, okay, 11 in
               | total, so you have to add 1, got it right the first time.
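The counting puzzle above is the classic fencepost problem. A small Python sketch, with function names chosen only for this illustration, contrasts inclusive counting with the half-open "skip N elements" view:

```python
def count_inclusive(first: int, last: int) -> int:
    """How many integers lie in [first, last], both ends included."""
    return last - first + 1


def count_half_open(start: int, stop: int) -> int:
    """How many integers lie in [start, stop), stop excluded."""
    return stop - start


print(count_inclusive(7, 17))   # 11: the range 7..17 needs the extra +1
print(count_half_open(7, 18))   # 11: plain subtraction, no correction
print(len(range(7, 18)))        # 11: Python's range is half-open as well

# Half-open ranges also split cleanly: [7, 12) and [12, 18) cover [7, 18).
print(count_half_open(7, 12) + count_half_open(12, 18))  # 11
```

With half-open bounds the lengths simply add and subtract, which is the additive behaviour the comment refers to.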
        
             | quietbritishjim wrote:
             | Another way to look at it, in languages like Python that
             | support slicing, is that the indices refer to the
             | boundaries between the elements rather than the elements
             | themselves.
             | 
             |      a b c d
             |     0 1 2 3 4
             | 
             | From this 4 character string you can slice s[1:3] and get
             | 'b', 'c'. s[i:j] will always have j - i characters
             | (ignoring negative indices). s[i] is as if it were short
             | for s[i:i+1] e.g. s[3] is s[3:4] and that gives you 'd'.
             | 
             | Admittedly C++ doesn't have slicing so this is less
             | relevant, but I think it's still an interesting aspect to
             | the discussion (without getting bogged down in ideology).
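The boundary view of slicing can be checked directly in Python. This is a small sketch; `boundaries_hold` is a made-up helper that tests the properties mentioned above for non-negative indices only.

```python
def boundaries_hold(text: str) -> bool:
    """Check the boundary-style slicing properties for every valid i <= j."""
    n = len(text)
    for i in range(n + 1):
        for j in range(i, n + 1):
            # A slice between boundaries i and j has exactly j - i characters.
            if len(text[i:j]) != j - i:
                return False
            # Cutting at boundary j loses nothing and duplicates nothing.
            if text[i:j] + text[j:] != text[i:]:
                return False
    # Indexing a string behaves like a one-element slice.
    return all(text[i] == text[i:i + 1] for i in range(n))


s = "abcd"
print(s[1:3])              # bc
print(s[3], s[3:4])        # d d
print(boundaries_hold(s))  # True
```

Note that a 4-character string has 5 boundaries (0 through 4), matching the diagram in the comment.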
        
               | germandiago wrote:
               | It has with std::span
        
         | CraneWorm wrote:
         | > [...] There are many class libraries available for lua. Just
         | pick one. [...]
         | 
         | Alternatively, embrace the prototype-based programming and
         | just... don't add classes.
        
           | pwdisswordfish8 wrote:
           | In practice though, is prototype-based inheritance useful for
           | anything other than implementing your own class system on top
           | of it?
        
             | CraneWorm wrote:
             | I hold an opinion that, if you use the prototype-based
             | approach, you should avoid inheritance like the plague ;)
        
               | pwdisswordfish0 wrote:
               | You should avoid it whether prototypes are involved or
               | not.
        
         | pwdisswordfish8 wrote:
         | Also,
         | 
         | > unfamiliar semantics: no classes, use tables, start indexing
         | at 1 and other oddities, just as being able to call functions
         | with the wrong number of arguments and returning nil on the way
         | 
         | Other than 1-based indexing, those semantics should be very
         | familiar from JavaScript.
        
           | germandiago wrote:
           | It still feels weird to me to call table an array, a hash
           | table and do classes via metatables in very different ways.
           | For sure it is powerful.
           | 
           | But it does not quite deliver the zero-friction experience I
           | was looking for when integrating. You have to change your
           | mindset a bit when integrating and using this stuff.
           | 
           | In Wren you have lists, which are basically arrays and
           | dictionaries, ranges and all things I already know how to use
           | from Python/C++.
           | 
           | That said, inheritance is not working smoothly and ChaiScript
           | does not even have inheritance itself. But for my purposes a
           | class, concurrency and familiar data structures and patterns
           | were enough.
        
           | dahfizz wrote:
           | Javascript is not known for its intuitive and sane design,
           | and certainly not from the perspective of C/C++ devs
        
             | tines wrote:
             | The criterion was "familiar", which is nothing if not
             | Javascript -- nobody said anything about intuitive or sane
             | :)
        
       | gigel82 wrote:
       | JavaScript would definitely be my first choice if I had to
       | integrate another scripting language into a native program today
       | (doubly so if the eventual target is web, like the author implies
       | with WebAssembly support).
       | 
       | Depending on your needs (small vs. fast, low memory usage vs.
       | full JIT, etc.) you can pick anything from JerryScript / QuickJS
       | all the way to V8.
       | 
       | Perhaps the only thing missing is a universal C/C++ API for
       | embedding JavaScript engines that lets you swap out easily to
       | test different trade-offs.
        
       | TonyTrapp wrote:
       | Developers working with C-like languages might not like Lua
       | because of its different looks and behaviour. I was in the same
       | situation and started looking into Squirrel because of that. But
       | eventually I went back to Lua, for one important reason: The
       | ecosystem. Lua has a lot of adjacent tools and a huge community.
       | If you want to do something in Squirrel, you will be much more on
       | your own, which can be frustrating. Lots of Lua's quirks can be
       | easily worked around, but lack of a community supporting the
       | language can't. This is especially important if you want to open
       | your scripting API to users of your software.
        
         | germandiago wrote:
         | I think what you say is true. Anyway, I do not need
         | state-of-the-art technology in my case. Basically, what I
         | really wanted was this (in order of importance, with the first
         | two points being by far the most important):
         | 
         | 1. Concurrency support
         | 
         | 2. Easy to bind into C++
         | 
         | 3. Familiarity, etc.
         | 
         | Namely, if only Lua had existed, given that Sol2 exists and
         | makes binding it to C++ easy, I would have chosen that. But
         | since Wren + WrenBind17 existed, Wren had more familiar syntax
         | and it was a viable choice, I went for that. I was trying to
         | find the path of least resistance (lower learning curve, easier
         | to bind, concurrency making my code easier, since I am familiar
         | already with most patterns)
         | 
         | As for ChaiScript, it was the first thing I took since it was
         | so easy to embed. But it had its own problems: lack of
         | concurrency and it does not report the file and line of errors,
         | which is _very_ painful because it drops your productivity.
         | 
         | And scripting... scripting is about productivity, at least that
         | is what I was using it for.
        
       | germandiago wrote:
       | Constructive feedback for the article is welcome. Thanks!
        
         | Rochus wrote:
         | Did you have a look at e.g.
         | https://root.cern.ch/root/html534/guides/users-guide/CINT.ht...
         | and https://root.cern/cling/?
         | 
         | > _Lua ... It has unfamiliar syntax ... start indexing at 1_
         | 
         | Syntax is not unfamiliar, just more Pascal like; if you use
         | LuaJIT you can use zero based indices and a powerful FFI for
         | direct C code integration.
        
           | pierrec wrote:
           | > if you use LuaJIT you can use zero based indices and a
           | powerful FFI
           | 
           | Not entirely, Lua standard libraries still expect everything
           | to be one-indexed, while FFI structures are zero indexed. So
           | with LuaJIT you often end up with a mix of 0 and 1 indexed
           | code, which in my experience was workable but definitely a
           | pain point.
        
             | Rochus wrote:
             | You should avoid the Lua C API in LuaJIT because it is not
             | supported by the JIT (i.e. it makes your code run in the
             | interpreter instead of the JIT). Using zero based
             | indices in Lua code running on LuaJIT works well.
        
               | pierrec wrote:
               | I'm not talking about the Lua C API, but things as simple
               | as this:
               | 
               |     > stuff = {"one", "two", "three"}
               |     > stuff[1]
               |     one
               | 
               | These native Lua structures are baked in, and they're a
               | lot more flexible than FFI structures - presumably if
               | you're using Lua, it's because you want to take advantage
               | of that flexibility and those affordances. FWIW I've
               | written a lot of LuaJIT, and usually I kept the lower-
               | level FFI stuff separate from the higher-level code using
               | Lua data structures, so I rarely encountered that
               | discrepancy between them, but still something to keep in
               | mind.
        
               | Rochus wrote:
               | You can e.g. do
               | 
               |     > stuff = { [0]="one", [1]="two" }
               |     > print(stuff[0])
               |     one
               | 
               | Works well; I wrote e.g.
               | https://github.com/rochus-keller/Smalltalk#a-smalltalk-80-in...
               | that way.
               | 
               | EDIT: even this works
               | 
               |     > stuff = { [0]="one", "two", "three" }
               |     > print(stuff[0]) -> one
               |     > print(stuff[1]) -> two
        
               | pansa2 wrote:
               | > _stuff = { [0]= "one", "two", "three" }_
               | 
               | In this case, is it possible to make iteration start with
               | the element at index 0? Maybe by implementing a custom
               | version of `ipairs`?
        
               | Rochus wrote:
               | When using
               | 
               |     > stuff = { [0]="one", "two", "three" }
               |     > for k,v in pairs(stuff) do print(v) end
               | 
               | it prints all three elements in the correct order. I
               | rarely use iterators for performance reasons anyway.
               | Instead of ipairs one can use
               | 
               |     > for i=0,#stuff do print(stuff[i]) end
        
           | germandiago wrote:
           | Well, by this I mean "unfamiliar to me", of course. Lol.
           | 
           | Actually Lua is something to consider from the point of view
           | of usage: it is an industry standard actually. However, all
           | those small quirks in semantics... and classes can be done in
           | many ways (that is what I understand, via metatables)...
           | 
           | In ChaiScript or Wren there is one true way and you are done.
           | You might like it or not, but it leads to less confusion,
           | especially if you use most of the time what is in the
           | mainstream.
           | 
           | This is by no means a bad thing in itself, it is just about
           | how ergonomic or time-consuming it could be for myself: I
           | just feel more comfortable with ChaiScript, Wren or Squirrel
           | than with Lua. Even AngelScript is also more similar to what
           | you already have. So when exposing APIs there is much less
           | friction.
           | 
           | Truth to be told, there is also
           | https://github.com/ThePhD/sol2 which looks great and
           | something to consider. It makes binding things quite easier
           | and gives you object-oriented Lua. You could rely on that.
           | 
           | It was just my subjective choice. There is no 100% right
           | choice. Probably, if I found people that are comfortable with
           | Lua I would use that. But the case is that this is a project
           | of mine as it stands now.
        
           | jcelerier wrote:
           | > Syntax is not unfamiliar, just more Pascal like;
           | 
           | how is that not unfamiliar
        
             | coldtea wrote:
             | In that it still refers to a hugely popular family of
             | languages...
        
               | jcelerier wrote:
               | Chinese is also a hugely popular language, does not mean
               | that it is familiar to a large amount of humans
        
               | coldtea wrote:
               | The analogy breaks as Pascal knowledge is not confined to
               | one geographical area or ethnicity. The same goes for
               | languages inspired by Pascal syntax, of which there are tons.
               | 
               | No matter how you slice it or dice it, Pascal and Pascal-
               | like syntax are not some obscure niche languages...
        
             | Rochus wrote:
             | Well, why would you then consider Python to be familiar?
        
               | oblio wrote:
               | Because Python is 100000x more popular than Pascal in
               | 2021?
        
               | Rochus wrote:
               | Python syntax is more similar to Pascal than to e.g. Java
               | or JS.
               | 
               | " _Modula-3 is the origin of the syntax and semantics
               | used for exceptions, and some other Python features._ "
               | (from
               | https://docs.python.org/3/faq/general.html#why-was-python-cr...).
               | Also the predecessor languages ABC and SETL were in the
               | Algol tradition.
               | 
               | > _Python is 100000x more popular than Pascal_
               | 
               | It's about factor 8 on https://www.tiobe.com/tiobe-index/
               | or factor 3 (in score) on
               | https://spectrum.ieee.org/static/interactive-the-top-program...,
               | whatever you prefer as a reference. Delphi (which is Object
               | Pascal) is still a widely used language.
        
               | oblio wrote:
               | I generally go by job listings. It's trivial to find
               | Python jobs, Pascal/Delphi jobs are very rare.
        
               | Rochus wrote:
               | Maybe you can post a link to a job site where there are
               | _100000x_ more Python than Pascal/Delphi/Ada jobs. Btw.
               | you can filter the IEEE ranking by jobs which seems to
               | correspond well with what I see on monster or indeed.
               | Anyway, the discussion was about whether the Lua or
               | Pascal style syntax is unfamiliar or not.
        
               | germandiago wrote:
               | You make a good point. I started my Computer Science and
               | Engineering degree (I am european, not american, so the
               | equivalent looks like kind of a merge of both areas) with
               | Python, C and C++ on the programming side.
               | 
               | Pascal was discarded a few years back in my university.
               | And yes, by familiar I mean exactly what you mean: you
               | see nowadays Java, Python, C++, C, C#, but Pascal has
               | disappeared.
               | 
               | It disappeared so long ago that I do not even know the
               | syntax myself from casual reading around.
        
               | [deleted]
        
               | jcelerier wrote:
               | If coming from C++ I definitely wouldn't, especially the
               | module system and reference binding in python is WEIRD.
               | I'd say C, C#, maybe D and Java would fit ? My criterion
               | would be "can a new grad student who only learned c++ be
               | productive in a couple days"
        
               | Rochus wrote:
               | There may be a difference between your view and that of
               | the majority of developers. My primary language is also
               | C++, but languages of the Pascal family (to which Python
               | is related) remain very popular.
        
           | germandiago wrote:
           | As for looking at CINT, CLing, yes I did. I prefer to use a
           | dynamic language with coroutines out of the box. It is
           | actually what I was looking for besides ease of binding it
           | and "familiarity" in semantics/syntax in a broad, imprecise
           | way I defined for myself.
        
         | Zababa wrote:
         | > Prefer dynamic to static typing, since static typing can
         | remove the coding speed: it makes you think about types.
         | 
         | I'm wondering what you mean exactly by this. Do you not think
         | about types when programming with a language with dynamic
         | typing?
        
           | germandiago wrote:
           | No, I think I expressed myself the wrong way.
           | 
           | What I mean is that if you have to annotate all your code
           | with types (like in AngelScript), this will slow you down for
           | two reasons. First, you need to think about types, and
           | second, refactoring is more rigid.
           | 
           | If it is optional, it is ok, you can take advantage of it at
           | will (ChaiScript supports types in parameters, but
           | optionally).
        
             | Zababa wrote:
             | Thanks, that clarifies it and makes sense.
        
       | pwdisswordfish8 wrote:
       | I kind of expected to see QuickJS or Duktape here; if the author
       | considers 'something like javascript-ey' to be just fine, he
       | might as well have used JavaScript itself, with all its strengths
       | and faults.
        
         | germandiago wrote:
         | Well. I think I was a bit inaccurate. When I said javascript-ey
         | what I meant is also familiar. Something like Squirrel and Wren
         | do a good job.
         | 
         | The ones you mentioned, as far as I investigated, were not
         | dead-easy to integrate into C++, one of the top requirements.
         | 
         | Take into account that I have to expose my own types, not just
         | ints and basic types.
         | 
         | The original API was coded in a natural C++ way. APIs that wrap
         | well in that sense could be Sol2 for Lua, Chaiscript,
         | Wrenbind... which integrate with custom types and smart
         | pointers.
         | 
         | With other scripting languages and their libraries you need
         | additional work
        
           | pwdisswordfish8 wrote:
           | Duktape's host API is pretty much a ripoff of Lua's, and the
           | latter wasn't rejected on those grounds, so...
           | 
           | QuickJS's API isn't particularly well-documented, but it's
           | not hard to find your way around it either if you dig into
           | the source (the engine is very hackable, too; you might even
           | fix some of the language's design flaws - obligatory wat talk
           | reference - if you're so inclined). The host API follows the
           | CPython model, with objects represented by pointers and
           | explicit reference counting on the C side. There are some
           | predefined macros to ease defining built-in classes. Some
           | type-level hackery in C++ might ease things even further. I
           | don't know how much deader-easier you want it.
        
             | germandiago wrote:
             | Take a look at how pybind11, wrenbind17, sol2 or Chaiscript
             | do it. That is how easy I want it: I can expose custom
             | types and global state easily. I do not want just ints and
             | const char */double.
             | 
             | These bindings do from decent to great.
        
       | pansa2 wrote:
       | > _Python [...] could be difficult to port to Web Assembly down
       | the road_
       | 
       | The Pyodide project has already compiled CPython to WebAssembly -
       | why is that a worse solution than compiling one of these other
       | scripting language interpreters to WASM?
        
         | pansa2 wrote:
         | One issue could be size - CPython's native binary is an order
         | of magnitude larger than Lua's, and the same is probably true
         | when using WASM.
         | 
         | Perhaps something like MicroPython could solve that, though.
        
         | zurn wrote:
         | Indeed, Python is one of the most well behaved scripting
         | languages for WebAssembly, and people were running it in the
         | browser for a good while already with WebAssembly predecessors
         | (emscripten and asm.js).
        
           | germandiago wrote:
           | This was not my information at the time. But thanks for the
           | info, it is helpful.
           | 
           | With https://github.com/pybind/pybind11 there is really great
           | integration with C++ and Python is my second home after C++
           | actually.
           | 
           | Anyway, I am quite happy with Wren and it seems to be fast
           | (not a requirement for my project, though)
        
         | tyingq wrote:
         | While it "works", Python under WASM means downloading a very
         | large interpreter and runtime and waiting quite a long time for
         | it to start up.
         | 
         | On my i5 laptop, this demo downloads about 8 MB and takes a
         | couple of seconds to load up:
         | http://karay.me/truepyxel/demo.html
         | 
         | Lua, by comparison, is very small and has a fast startup under
         | WASM.
        
       ___________________________________________________________________
       (page generated 2021-05-24 23:01 UTC)