[HN Gopher] My experience binding a couple of scripting engines ...
___________________________________________________________________
My experience binding a couple of scripting engines with C++

Author : germandiago
Score  : 56 points
Date   : 2021-05-24 09:38 UTC (13 hours ago)

(HTM) web link (germandiagogomez.medium.com)
(TXT) w3m dump (germandiagogomez.medium.com)

| heinrichhartman wrote:
| > [lua] was discarded because of 3. It has unfamiliar syntax, but worse, unfamiliar semantics: no classes, use tables, start indexing at 1 and other oddities, just as being able to call functions with the wrong number of arguments and returning nil on the way. Also, use tables for both hash tables and arrays. It is powerful, do not misunderstand me, and Lua supports good concurrency. It was just not what I was looking for because of the mentioned things.
|
| I can fully understand why Lua is not a good fit for this case; however, I would like to add some color to the picture.
|
| The most powerful way to do C(++)/Lua interop is not the official C API but the LuaJIT FFI: https://luajit.org/ext_ffi.html
|
| This allows for allocation of C objects on the heap and on the stack, and FAST function calls. Doing the same for C++ is possible but requires some work, e.g. http://lua-users.org/lists/lua-l/2011-07/msg00492.html
|
| Furthermore:
|
| - unfamiliar syntax -- The syntax is tiny -- and I found nothing unexpected about it.
|
| - no classes -- There are many class libraries available for Lua. Just pick one. I have used Penlight classes quite a bit without running into major issues.
|
| - use tables for both hash tables and arrays -- Yes. This is on the API side; under the hood, hashes and arrays are used where appropriate.
| fullstop wrote:
| I grumbled about indexing starting at 1, but once you get used to it, it makes a lot of problems easier.
|
| I've spent the last twenty-ish years in C and string manipulation just sucks.
| Think about it, you declare a buffer of length 20, the indexes are from 0->19, and the 19th byte needs to be a null if you are using it as a string and are using the entire buffer. Also, the standard library is not guaranteed to null terminate in all situations.
|
| Lua's string indexing feels far more natural to me.
| tannhaeuser wrote:
| This. The insistence on using 0-based string offsets is purely a C thing (where it makes sense) inherited by languages that wanted to stay close to C or appeal to C devs (even though it does not make sense). An easy way to check is to look at awk which, as a DSL for string manipulation itself written in C, deliberately uses 1-based string offsets, and where many/most common string expressions collapse to a very compact form, which makes even more sense because empty string results are interpreted as false in conditions.
| jhvkjhk wrote:
| It's not a C thing, it's a math/utility thing.
|
| Dijkstra: Why numbering should start at zero https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...
| corysama wrote:
| Which is funny because Lua got its 1-base from FORTRAN which, I believe, adopted it to make TRANslating math FORmulas easier.
| tovej wrote:
| I agree in part that it's a C thing. More accurately it's an offset-from-the-array-base thing.
|
| The first element is at index 0 because its address is base + 0 * sizeof(element)
|
| The second element is at index 1 because its address is base + 1 * sizeof(element)
| sporedro wrote:
| The indexing starting at 1 is something that has always annoyed me. I just haven't seen any other language make that decision, and I'm not sure the reasoning for it really outweighs the fact that every other language just goes with 0 due to the origin of it.
|
| Lua is a great language for sure though.
| fullstop wrote:
| Pascal did, but I believe that was because index 0 contained an 8-bit length.
| There was no need for null termination, and strings were limited to 255 bytes.
| vlovich123 wrote:
| Small correction. The 20th byte (at index 19) needs to be null.
|
| The mismatch with the English language and how people naturally count is definitely there and annoying. And yes, string manipulation in C is especially broken, although I think the indexing is the smallest problem there.
|
| However, it's extremely natural when you think about it in terms of memory access. For example, in a 1-based indexing system, ptr[0] would point 1 character behind your pointer (weird) and ptr[-1] would point 2 back (wtf). Having the index map neatly to the offset makes a lot of sense to me. In fact, when I first started programming in VB6 20 years ago and only had a math background, the 1-based indexing was natural but I could never figure out why I had so many bugs related to array and string offsets.
|
| I'll also note that most programming languages are 0-based and interop with C is not really the goal (Java, JavaScript, Ruby, Python, etc). In fact, Python and Perl's string manipulation is some of the best out there and they are 0-indexed.
| fullstop wrote:
| I figured that someone would pipe up about the 19th byte vs 19th index, and I'm glad. This is just semantics, but does index 0 represent the first byte or the zeroth byte?
|
| I completely agree regarding memory access, but would argue that strings and memory should not be treated in the same way. Having the index represent the length - 1 has caused countless off-by-one bugs [1] that would not have been there in the first place if string indexes started with 1.
|
| Java, JavaScript, Python, (maybe Ruby, I'm not fluent there) also bite the user if you attempt to index data outside of the string's range. C will happily index whatever you want, and these bugs can often remain hidden for decades.
|
| 1. https://cwe.mitre.org/data/definitions/193.html
| Joker_vD wrote:
| Index N represents "skip N elements". That gives you very easy additive/subtractive behaviour, unlike "how many numbers are there from 7 to 17?" scenarios.
|
| Sure, 17-7 gives you 10, but that's not the final answer, you have to add 1 to get the right answer, 11. Sorry, no, you actually subtract 1 and the right answer is 9. Wait, no: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, okay, 11 in total, so you have to add 1, got it right the first time.
| quietbritishjim wrote:
| Another way to look at it, in languages like Python that support slicing, is that the indices refer to the boundaries between the elements rather than the elements themselves.
|
|      a   b   c   d
|    0   1   2   3   4
|
| From this 4 character string you can slice s[1:3] and get 'b', 'c'. s[i:j] will always have j - i characters (ignoring negative indices). s[i] is as if it were short for s[i:i+1], e.g. s[3] is s[3:4] and that gives you 'd'.
|
| Admittedly C++ doesn't have slicing so this is less relevant, but I think it's still an interesting aspect to the discussion (without getting bogged down in ideology).
| germandiago wrote:
| It does, with std::span
| CraneWorm wrote:
| > [...] There are many class libraries available for lua. Just pick one. [...]
|
| Alternatively, embrace the prototype-based programming and just... don't add classes.
| pwdisswordfish8 wrote:
| In practice though, is prototype-based inheritance useful for anything other than implementing your own class system on top of it?
| CraneWorm wrote:
| I hold an opinion that, if you use the prototype-based approach, you should avoid inheritance like the plague ;)
| pwdisswordfish0 wrote:
| You should avoid it whether prototypes are involved or not.
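
Aside on the indexing sub-thread above: a minimal Python sketch of the two counting conventions that Joker_vD and quietbritishjim describe. The helper names count_inclusive and count_half_open are made up for illustration; they do not come from any comment or library. It only shows why half-open ranges avoid the "add 1 or subtract 1?" question; it does not settle the 0-vs-1 argument.

```python
def count_inclusive(first: int, last: int) -> int:
    """Count the integers in the closed range [first, last]."""
    # Both endpoints are included, hence the easy-to-forget "+ 1".
    return last - first + 1


def count_half_open(start: int, stop: int) -> int:
    """Count the integers in the half-open range [start, stop)."""
    # The stop boundary is excluded, so the length is a plain difference.
    return stop - start


s = "abcd"
# Slice indices name the boundaries between characters:
#   0 a 1 b 2 c 3 d 4

assert s[1:3] == "bc"             # everything between boundary 1 and boundary 3
assert len(s[1:3]) == 3 - 1       # s[i:j] has j - i characters when 0 <= i <= j <= len(s)
assert s[3] == s[3:4] == "d"      # a single index behaves like a one-wide slice
assert s[:2] + s[2:] == s         # adjacent slices tile the string, no overlap or gap

# "How many numbers are there from 7 to 17?"
assert count_inclusive(7, 17) == 11
assert count_half_open(7, 18) == 11
assert len(range(7, 18)) == 11    # Python's range uses the half-open convention
```

For the C++ side mentioned by germandiago, std::span::subspan(offset, count) takes an offset and a length rather than two boundaries, so the Python slice s[i:j] would correspond to subspan(i, j - i).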
| pwdisswordfish8 wrote:
| Also,
|
| > unfamiliar semantics: no classes, use tables, start indexing at 1 and other oddities, just as being able to call functions with the wrong number of arguments and returning nil on the way
|
| Other than 1-based indexing, those semantics should be very familiar from JavaScript.
| germandiago wrote:
| It still feels weird to me to use a table as an array and as a hash table, and to do classes via metatables in very different ways. For sure it is powerful.
|
| But it does not really deliver the zero friction I was looking for when integrating. You have to change your mindset a bit when integrating and using this stuff.
|
| In Wren you have lists (which are basically arrays), dictionaries, ranges and all the things I already know how to use from Python/C++.
|
| That said, inheritance does not work smoothly and ChaiScript does not even have inheritance itself. But for my purposes, classes, concurrency and familiar data structures and patterns were enough.
| dahfizz wrote:
| Javascript is not known for its intuitive and sane design, and certainly not from the perspective of C/C++ devs
| tines wrote:
| The criterion was "familiar", which is nothing if not Javascript -- nobody said anything about intuitive or sane :)
| gigel82 wrote:
| JavaScript would definitely be my first choice if I had to integrate another scripting language into a native program today (doubly so if the eventual target is web, like the author implies with WebAssembly support).
|
| Depending on your needs (small vs. fast, low memory usage vs. full JIT, etc.) you can pick anything from JerryScript / QuickJS all the way to V8.
|
| Perhaps the only thing missing is a universal C/C++ API for embedding JavaScript engines that lets you swap them out easily to test different trade-offs.
| TonyTrapp wrote:
| Developers working with C-like languages might not like Lua because of its different looks and behaviour.
| I was in the same situation and started looking into Squirrel because of that. But eventually I went back to Lua, for one important reason: the ecosystem. Lua has a lot of adjacent tools and a huge community. If you want to do something in Squirrel, you will be much more on your own, which can be frustrating. Lots of Lua's quirks can be easily worked around, but lack of a community supporting the language can't. This is especially important if you want to open your scripting API to users of your software.
| germandiago wrote:
| I think what you say is true. Anyway, I do not need state-of-the-art technology in my case. Basically, what I really wanted was this (in order of importance, with the first two points being by far the most important):
|
| 1. Concurrency support
|
| 2. Easy to bind into C++
|
| 3. Familiarity, etc.
|
| Namely, if only Lua had existed, and given that Sol2 exists and makes binding it to C++ easy, I would have chosen that. But since Wren + WrenBind17 existed, Wren had more familiar syntax and it was a viable choice, I went for that. I was trying to find the path of least resistance (lower learning curve, easier to bind, concurrency making my code easier, since I am already familiar with most patterns).
|
| As for ChaiScript, it was the first thing I picked since it was so easy to embed. But it had its own problems: lack of concurrency, and it does not point to the file and line of errors, which is _very_ painful because it drops your productivity.
|
| And scripting... scripting is about productivity, at least that is what I was using it for.
| germandiago wrote:
| Constructive feedback for the article is welcome. Thanks!
| Rochus wrote:
| Did you have a look at e.g. https://root.cern.ch/root/html534/guides/users-guide/CINT.ht... and https://root.cern/cling/?
|
| > _Lua ... It has unfamiliar syntax ... start indexing at 1_
|
| Syntax is not unfamiliar, just more Pascal like; if you use LuaJIT you can use zero based indices and a powerful FFI for direct C code integration.
| pierrec wrote:
| > if you use LuaJIT you can use zero based indices and a powerful FFI
|
| Not entirely, Lua standard libraries still expect everything to be one-indexed, while FFI structures are zero indexed. So with LuaJIT you often end up with a mix of 0 and 1 indexed code, which in my experience was workable but definitely a pain point.
| Rochus wrote:
| You should avoid the Lua C API in LuaJIT because it is not supported by the JIT (i.e. it makes your code run in the interpreter instead of the JIT). Using zero based indices in Lua code running on LuaJIT works well.
| pierrec wrote:
| I'm not talking about the Lua C API, but things as simple as this:
|
|     > stuff = {"one", "two", "three"}
|     > stuff[1]
|     one
|
| These native Lua structures are baked in, and they're a lot more flexible than FFI structures - presumably if you're using Lua, it's because you want to take advantage of that flexibility and those affordances. FWIW I've written a lot of LuaJIT, and usually I kept the lower-level FFI stuff separate from the higher-level code using Lua data structures, so I rarely encountered that discrepancy between them, but it's still something to keep in mind.
| Rochus wrote:
| You can e.g. do
|
|     > stuff = { [0]="one", [1]="two" }
|     > print(stuff[0])
|     one
|
| Works well; I wrote e.g. https://github.com/rochus-keller/Smalltalk#a-smalltalk-80-in... that way.
|
| EDIT: even this works
|
|     > stuff = { [0]="one", "two", "three" }
|     > print(stuff[0]) -> one
|     > print(stuff[1]) -> two
| pansa2 wrote:
| > _stuff = { [0]= "one", "two", "three" }_
|
| In this case, is it possible to make iteration start with the element at index 0? Maybe by implementing a custom version of `ipairs`?
| Rochus wrote:
| When using
|
|     > stuff = { [0]="one", "two", "three" }
|     > for k,v in pairs(stuff) do print(v) end
|
| it prints all three elements in the correct order. I rarely use iterators for performance reasons anyway. Instead of ipairs one can use
|
|     > for i=0,#stuff do print(stuff[i]) end
| germandiago wrote:
| Well, by this I mean "unfamiliar to me", of course. Lol.
|
| Actually Lua is something to consider from the point of view of usage: it is an industry standard. However, all those small quirks in semantics... and classes can be done in many ways (that is what I understand, via metatables)...
|
| In ChaiScript or Wren there is one true way and you are done. You might like it or not, but it leads to less confusion, especially if most of the time you use what is in the mainstream.
|
| This is by no means a bad thing in itself, it is just about how ergonomic or time-consuming it could be for myself: I just feel more comfortable with ChaiScript, Wren or Squirrel than with Lua. Even AngelScript is more similar to what you already have. So when exposing APIs there is much less friction.
|
| Truth be told, there is also https://github.com/ThePhD/sol2 which looks great and is something to consider. It makes binding things quite a bit easier and gives you object-oriented Lua. You could rely on that.
|
| It was just my subjective choice. There is no 100% right choice. Probably, if I found people who are comfortable with Lua I would use that. But the case is that this is a project of mine as it stands now.
| jcelerier wrote:
| > Syntax is not unfamiliar, just more Pascal like;
|
| how is that not unfamiliar
| coldtea wrote:
| In that it still refers to a hugely popular family of languages...
| jcelerier wrote:
| Chinese is also a hugely popular language, does not mean that it is familiar to a large amount of humans
| coldtea wrote:
| The analogy breaks as Pascal knowledge is not confined to one geographical area or ethnicity. The same goes for languages inspired by Pascal syntax, of which there are tons.
|
| No matter how you slice it or dice it, Pascal and Pascal-like syntax are not some obscure niche languages...
| Rochus wrote:
| Well, why would you then consider Python to be familiar?
| oblio wrote:
| Because Python is 100000x more popular than Pascal in 2021?
| Rochus wrote:
| Python syntax is more similar to Pascal than to e.g. Java or JS.
|
| " _Modula-3 is the origin of the syntax and semantics used for exceptions, and some other Python features._ " (from https://docs.python.org/3/faq/general.html#why-was-python-cr...). Also the predecessor languages ABC and SETL were in the Algol tradition.
|
| > _Python is 100000x more popular than Pascal_
|
| It's about factor 8 on https://www.tiobe.com/tiobe-index/ or factor 3 (in score) on https://spectrum.ieee.org/static/interactive-the-top-program..., whatever you prefer as a reference. Delphi (which is Object Pascal) is still a widely used language.
| oblio wrote:
| I generally go by job listings. It's trivial to find Python jobs, Pascal/Delphi jobs are very rare.
| Rochus wrote:
| Maybe you can post a link to a job site where there are _100000x_ more Python than Pascal/Delphi/Ada jobs. Btw. you can filter the IEEE ranking by jobs, which seems to correspond well with what I see on monster or indeed. Anyway, the discussion was about whether the Lua or Pascal style syntax is unfamiliar or not.
| germandiago wrote:
| You make a good point. I started my Computer Science and Engineering degree (I am European, not American, so the equivalent looks like kind of a merge of both areas) with Python, C and C++ on the programming side.
|
| Pascal was discarded a few years back in my university. And yes, by familiar I mean exactly what you mean: nowadays you see Java, Python, C++, C, C#, but Pascal has disappeared.
|
| It disappeared so long ago that I do not even know the syntax myself from casual reading around.
| [deleted]
| jcelerier wrote:
| If coming from C++ I definitely wouldn't, especially since the module system and reference binding in Python are WEIRD. I'd say C, C#, maybe D and Java would fit? My criterion would be "can a new grad student who only learned C++ be productive in a couple of days"
| Rochus wrote:
| There may be a difference between your view and that of the majority of developers. My primary language is also C++, but languages of the Pascal family (to which Python is related) remain very popular.
| germandiago wrote:
| As for looking at CINT and Cling, yes I did. I prefer to use a dynamic language with coroutines out of the box. It is actually what I was looking for, besides ease of binding and "familiarity" in semantics/syntax in a broad, imprecise way I defined for myself.
| Zababa wrote:
| > Prefer dynamic to static typing, since static typing can remove the coding speed: it makes you think about types.
|
| I'm wondering what you mean exactly by this. Do you not think about types when programming in a language with dynamic typing?
| germandiago wrote:
| No, I think I expressed myself the wrong way.
|
| What I mean is that if you have to annotate all your code with types (like in AngelScript), this will slow you down for two reasons. First, you need to think about types, and second, refactoring is more rigid.
|
| If it is optional, it is ok, you can take advantage of it at will (ChaiScript supports types in parameters, but optionally).
| Zababa wrote:
| Thanks, that clarifies it and makes sense.
| pwdisswordfish8 wrote:
| I kind of expected to see QuickJS or Duktape here; if the author considers 'something like javascript-ey' to be just fine, he might as well have used JavaScript itself, with all its strengths and faults.
| germandiago wrote:
| Well. I think I was a bit inaccurate. When I said javascript-ey what I meant is also familiar. Something like Squirrel and Wren do a good job.
|
| The ones you mentioned, as far as I investigated, were not dead-easy to integrate into C++, one of the top requirements.
|
| Take into account that I have to expose my own types, not just ints and basic types.
|
| The original API was coded in a natural C++ way. APIs that wrap well in that sense could be Sol2 for Lua, ChaiScript, WrenBind17... which integrate with custom types and smart pointers.
|
| With other scripting languages and their libraries you need additional work.
| pwdisswordfish8 wrote:
| Duktape's host API is pretty much a ripoff of Lua's, and the latter wasn't rejected on those grounds, so...
|
| QuickJS's API isn't particularly well-documented, but it's not hard to find your way around it either if you dig into the source (the engine is very hackable, too; you might even fix some of the language's design flaws - obligatory wat talk reference - if you're so inclined). The host API follows the CPython model, with objects represented by pointers and explicit reference counting on the C side. There are some predefined macros to ease defining built-in classes. Some type-level hackery in C++ might ease things even further. I don't know how much deader-easier you want it.
| germandiago wrote:
| Take a look at how pybind11, wrenbind17, sol2 or ChaiScript do it. That is how easy I want it: I can expose custom types and global state easily. I do not want just ints and const char */double.
|
| These bindings range from decent to great.
| pansa2 wrote:
| > _Python [...] could be difficult to port to Web Assembly down the road_
|
| The Pyodide project has already compiled CPython to WebAssembly - why is that a worse solution than compiling one of these other scripting language interpreters to WASM?
| pansa2 wrote:
| One issue could be size - CPython's native binary is an order of magnitude larger than Lua's, and the same is probably true when using WASM.
|
| Perhaps something like MicroPython could solve that, though.
| zurn wrote:
| Indeed, Python is one of the most well behaved scripting languages for WebAssembly, and people have been running it in the browser for a good while already with WebAssembly predecessors (emscripten and asm.js).
| germandiago wrote:
| This was not my information at the time. But thanks for the info, it is helpful.
|
| With https://github.com/pybind/pybind11 there is really great integration with C++, and Python is my second home after C++ actually.
|
| Anyway, I am quite happy with Wren and it seems to be fast (not a requirement for my project, though)
| tyingq wrote:
| While it "works", Python under WASM means downloading a very large interpreter and runtime and waiting quite a long time for it to start up.
|
| On my i5 laptop, this demo downloads about 8 MB and takes a couple of seconds to load up: http://karay.me/truepyxel/demo.html
|
| Lua, by comparison, is very small and has a fast startup under WASM.
___________________________________________________________________
(page generated 2021-05-24 23:01 UTC)