[HN Gopher] Regex engine internals as a library ___________________________________________________________________ Regex engine internals as a library Author : kretaceous Score : 301 points Date : 2023-07-05 13:36 UTC (9 hours ago) (HTM) web link (blog.burntsushi.net) (TXT) w3m dump (blog.burntsushi.net) | recov wrote: | Burntsushi is a machine. | agumonkey wrote: | lets him and borkdude meet | kzrdude wrote: | Deterministic and finite such, or what? | sd9 wrote: | Non-deterministic, deterministic, it matters not | ttul wrote: | Story time. At ActiveState, my colleague and I - both fresh out | of school - were tasked with building a regular expression | debugger for the Komodo editor. We hired the legendary Perl guru | Mark Jason Dominus to hack some hooks into Perl's regex engine | and then by exposing these hooks to the UI, we allowed users to | watch the execution of their regexes step by step. | | These days, various web-based tools offer superior functionality. | But in 2001, Komodo's Rx Debugger was absolutely state of the art | and so much fun to work on. | mschaef wrote: | Any recommendations on specific web based tools for this? | imaginary_unit wrote: | not OP, but these are some I used in the past: | | https://www.debuggex.com/ | | https://regex101.com/ | Twirrim wrote: | I've found myself in the past in a situation where I needed an | offline regex debugger (dealing with air-gapped networks, so | the folks that needed to use the tool literally couldn't get at | any online site, nor would they be permitted to even think of | putting what they were working with in to an online one, no | matter how it's designed), and it seems almost all of the work | has gone in to online tools. The offline tools are scarce and | somewhat lacking in comparison to, say, https://regex101.com/. | nerdponx wrote: | RegexBuddy is great and runs fine on WINE. | aorth wrote: | Good question! I use regex101.com a few times a month and | I've thought about offline tools too. I just happened to see | this GNOME application called Wildcard recently: | | https://flathub.org/apps/com.felipekinoshita.Wildcard | | Haven't tried yet... | lesuorac wrote: | It seems like the site doesn't really need a server so people | have made offline versions. | | So should be able to burn something like [1] to a disk. | | [1]: https://github.com/ibaaj/Regex101.com-offline- | app/pull/1/fil... | jakobnissen wrote: | BioJulia has published Automa.jl, which is a pure-Julia regex | engine that can insert arbitrary Julia code at compile time. | | Not to take away from Rusts regex which are far more advanced | than Automa, but I just take issue with that library being the | first time regex internals have been exposed as a library :) | burntsushi wrote: | These sound like two different things. PCRE2, for example, has | support for "callouts" which sound similar to what you're | saying? https://www.pcre.org/current/doc/html/pcre2callout.html | --- Other things like ragel and re2c have done similar things. | | What I'm talking about in this blog is very different from | that. This is about taking the internals of the regex library | itself and turning them in a separately versioned library that | others can compose. | | This makes _somewhat_ less sense for a backtracker because most | backtrackers just have one engine: the backtracker. But an | automata based library usually has many different engines that | can be composed in different ways. Still, backtrackers have | things they _could_ expose but don 't in practice, such as a | regex parser and an AST. | nsajko wrote: | You're both correct. In the context of Julia, where | compilation and run time are arbitrarily interleaved, | something like re2c can be a library and used at run time. | alpaca128 wrote: | I experimented with the regex-automata crate in the past and it's | the only one I found which can be used for text editors because | the direct access to the underlying DFA makes it compatible with | any text data structure. The usual regex library APIs expect the | input to be one contiguous string. | yazaddaruvala wrote: | For a problem at work, I needed to create a RegexSet of over 10 | MM really long regexes. No engine out of the box could handle it. | Even Rust's RegexSet wasn't good enough by default. | | However, trying to use and read through regex-automata and regex- | syntax (even in 2018) was such a very valuable and helpful | learning resource. I ended up modeling the project at work off | Lucene APIs, but it was all only after learning the basics from | the regex crates. | | Best set of libs! Thanks sushi! | OJFord wrote: | I'm guessing the answer's 'no' because you didn't elaborate to | begin with, but on the off-chance - can you share any more | about what the problem/project was? | burntsushi wrote: | Yeah 10 million regexes is huuuuge. Aho-Corasick can barely | handle 10 million literals. | | Future work is definitely trying to make the regex engine scale | better with more patterns. Currently, it will crap out well | before 10 million regexes, and I'm not sure that goal is | actually attainable. But it can certainly be better than what | it is today. | | And of course, Hyperscan is really the gold standard here as | far as multi-pattern searching goes. I'm not sure how well it | will handle 10 million patterns though. | jjice wrote: | I've only given a quick read/skim, but this is extremely | impressive. BurntSushi has a lot of create stuff out there, but | the Rust regex crate is legendary. The fact that Rust has had | such a performant and easy to use regular expression library for | so long is a real treat to the community. | | This article as a whole is also a treasure. There's a fairly | limited number of people who have written a ton about regular | expressions, but they all do so extremely well. Russ Cox's | article series (as mentioned in this one) is really great. I used | it in college to write a regular expression engine one summer | after I had started to fall in love with the perfect cross | between theory and practice that regular expressions are. | | The changes for more in depth testing here are also very | interesting, especially for a crate that I'm sure if critical so | a lot of the ecosystem, and I appreciate the writeup on such a | deep dive topic. | | Are regular expressions hard to read? Sometimes they can, | especially if you haven't gone out of your way to take deep | dives. Are they not perfect for a lot of parsing tasks? Sure, | there are definitely things people shouldn't use regexs for that | they do, like validating emails. At their core though, regular | expressions are one of the most power dense tools we have in | pretty much every language. The insane stuff you can get done | (sometimes in a way that would be frowned upon) in so little time | if just incredible. | | I'd love for there to be more content on regular expressions as | well. Right now I'm only familiar with one book that really does | a great job (at a practical level), and that's Mastering Regular | Expressions by Jeffrey Friedl. On a theory level, a lot of | compiler books will talk about them to varying degrees. The | Dragon Book has some good content on regular expressions at an | implementation level too. | | Does anyone have any other book recommendations for regular | expressions? | | And if BurntSushi or any other regular expression engine | contributors see this, I really appreciate the effort y'all put | in to building such powerful software. | btown wrote: | https://www.cs.princeton.edu/courses/archive/fall19/cos226/l... | and https://kean.blog/post/lets-build-regex are excellent | introductions to implementing a (very) simplified regex engine: | construct a nondetermistic finite state automaton for the | regex, then perform a graph search on the resulting digraph; if | the vertex corresponding to your end state is reachable, you | have a match. | | I think this exercise is valuable for anyone writing regexes to | not only understand that there's less magic than one might | think, but also to visualize a bunch of balls bouncing along an | NFA - that bug you inevitably hit in production due to | catastrophic backtracking now takes on a physical meaning! | | Separately re: the OP, https://github.com/rust- | lang/regex/issues/822 (and specifically BurntSushi's comment at | the very end of the issue) adds really useful context to the | paragraph in the OP about niche APIs: | https://blog.burntsushi.net/regex-internals/#problem-request... | - searching with multiple regexes simultaneously against a text | is both incredibly complex and incredibly useful, and I can't | wait to see what the community comes up with for this pattern! | jcranmer wrote: | > Are regular expressions hard to read? Sometimes they can, | especially if you haven't gone out of your way to take deep | dives. Are they not perfect for a lot of parsing tasks? Sure, | there are definitely things people shouldn't use regexs for | that they do, like validating emails. At their core though, | regular expressions are one of the most power dense tools we | have in pretty much every language. The insane stuff you can | get done (sometimes in a way that would be frowned upon) in so | little time if just incredible. | | The real crowning use case for regular expressions (at least in | terms of parsing-like tasks) is when you've got formats with | varied delimiters. So one format I was parsing this weekend was | basically header:field1,field2,field3"data"hash (with a fixed | number of fields). Or another format I work with is | suite~split/test1,test2@opt1:opt2^hw1^hw2#flags1#flags2 (most | of the elements of which are optional). Regular expressions | shine here; using primitives like split don't quite cut it in | these formats. | | And I think this also is a major cause of why regular | expressions tend to become unreadable quickly. If you look at | parsing via regular expressions as a whole, there's basically | three things that are being jammed into one expression: what | are the delimiters between fields, what is valid in each field, | and which fields are optional. These are basically three | separate concerns, but most regex APIs are absolutely terrible | at letting you separate these concerns into separate steps (all | you can do is provide the string that combines all of them). | tialaramex wrote: | > header:field1,field2,field3"data"hash (with a fixed number | of fields). | | In a language where regex is just right there, like Perl, I | agree that the natural way to parse this is always a regex, | although on the other hand in Perl the natural way to do | almost everything is a regex so maybe Perl was a bad example. | | In a language with split_once (and a proper string reference | type) I actually rather like split_once here, splitting away | header, and then field1, and then field2, and then field3, | and then data, leaving just hash. | | I guess this gets to your concern, by writing it with | split_once we're clear about a lot of your answers, each | delimiter is specified with what it's splitting, none of the | fields are optional (unless we write that) and (if we write | code to check) what is valid in each field. | jcranmer wrote: | Yeah, split_once is pretty handy, although chaining | together can get a little verbose. It would be nice to | write this: let (y,m,d,h,m,s) = | break_str!(s, year-month-day hour:minute:second)?; | | instead of this: let (y, split) = | s.split_once('-')?; let (m, split) = | split.split_once('-')?; let (d, split) = | split.split_once(' ')?; let (h, split) = | split.split_once(':')?; let (m, s) = | split.split_once(':')?; | burntsushi wrote: | I'll use this as an opportunity to plug the regex crate's | new 'extract' API. :-) use | regex::Regex; fn main() { | println!("{:?}", extract("1973-01-05 09:30:00")); | } fn extract(haystack: &str) -> | Option<(&str, &str, &str, &str, &str, &str)> { | let re = Regex::new( | r"([0-9]{4})-([0-9]{2})-([0-9]{2}) | ([0-9]{2}):([0-9]{2}):([0-9]{2})", | ).unwrap(); let (_, [y, m, d, h, min, s]) = | re.captures(haystack)?.extract(); Some((y, m, | d, h, min, s)) } | | Output: Some(("1973", "01", "05", "09", | "30", "00")) | | That gets you pretty close to what you want here. | | (The regex matches more than what is a valid date/time of | course.) | Izkata wrote: | Python (even in 2): >>> import re | >>> pattern = | re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2}) | ([0-9]{2}):([0-9]{2}):([0-9]{2})") >>> results = | pattern.match('1973-01-05 09:30:00') >>> | results.groups() ('1973', '01', '05', '09', '30', | '00') >>> (y, m, d, h, min, s) = results.groups() | | (`results` will be None if the regex didn't match) | | Regexes are one of those things where, once you | understand it (and capture groups in particular) and it's | available in the language you're working in, string- | splitting usually doesn't feel right anymore. | burntsushi wrote: | I believe all of the people in this thread understand | regexes extremely well. :-) | | There is a lot of reasonable room to disagree about when | and where regexes should be used. | kbenson wrote: | The thing that I think sometimes gets lost when people are | talking about the strengths and weaknesses of regular | expression use, is that it can matter quite a bit what the | language in question is like. If you're using a language that | compiles down to a low level or is expected to have a fairly | close match of expressions to specific code run (C, C++, | Java, etc) then the choice is between using a regular | expression library which often does an _good enough_ match of | what you want to accomplish in a performance manner that | rolling your own optimized parser is hard to justify. | | When using an interpreted language like Perl or Python, it's | a slightly different story, since often there's no way to get | as good of performance out of a custom parser written in that | language as you can get from the optimized regular expression | library included (if you use it competently). The overhead in | the interpreted language can make some tasks impossible to do | as quickly as teh regular expression engine can. In a way, a | regular expression library is to parsing what Numpy is to | computation. | | I remember a task I had to parse a bunch of _very simplistic_ | XML in Perl a decade ago, where it consisted of the | equivalent of a bunch of records of dictionaries /hashes | (solr), and needed them as Perl hashes to work on them. I | surveyed every XML parsing module available to me in Perl, | and couldn't quite get the performance I needed, given the | file was a few GB and it needed to be done as quick as | possible multiple times a day. I implemented a simple outer | loop to grab a working chunk of the file, a regex to split | that chunk into individual records text chunks, and another | regex to return each key/value as an alternating list I could | assign directly to a hash. It was 5-6x faster than the faster | XML streaming solution I found that I could use as a Perl | module (which were all just wrapping X libs). | | Could I have coded a custom solution in C that parsed this | specialized format faster? Undoubtedly, but I'd still be | marshaling data through the C/Perl interface, so would have | taken a hit to get the data easily accessible to the rest of | the codebase, which is less of an issue for regular | expressions in Perl since they're a native capability of the | language and deeply integrated. Using regular expressions as | a parsing toolkit yielded very good results at a fraction of | the time and effort, and honestly in my opinion that's what | regular expressions are all about. | chubot wrote: | What kind of content are you looking for? From a user POV, I | think there is thankfully not much to learn about automata- | based regexes like RE2 and rust/regex. The exception is if | you're writing a performance-sensitive system entirely around | regexes, like an intrusion detection system or search engine. | | A regex is a very small concept with a very large syntax :) | It's "just" literals, alternation, repetition, and | concatenation. So it's mostly being able to see through the | syntax IMO. | | Nice animated diagrams in this recent post - | https://news.ycombinator.com/item?id=35503089 | | (On the other hand, from the implementation POV, there is a | wealth of material online, but you already know where to find | that :) I still think someone should "cliff notes" the Cox RE2 | articles, since the ideas are important but there is a lot of | detail. There are also dozens of references in those articles | for people to read; it was a comprehensive survey at the time.) | | --- | | BTW Friedl's book covers backtracking engines, which makes | sense because PCRE, Perl itself, Python, Java, JavaScript, etc. | all use that style | | But as far as I remember, most of the book is irrelevant for | users of automata-based engines. It's a whole bunch of detail | about optimizing backtracking, reordering your clauses, and | lots of detail about engine-specific and even version-specific | differences. | | Personally I just ignore all of that, and it doesn't cause me a | problem. I use regexes all the time in every language, and | stick to the features that an automata-based engine can handle. | | Once you go into the backtracking stuff, then you might want | the Friedl book, and that's a huge waste of time IMO. It's way | easier to do that kind of thing outside of a regex engine, with | normal parsing techniques. | | And honestly once you go outside, you'll find you don't even | need to backtrack. Perl-style regexes can be an astoundingly | inefficient way to write a pattern recognition algorithm. | | --- | | Eggex is my attempt to separate the two kinds of things, since | it's not obvious based on the syntax: | https://www.oilshell.org/release/latest/doc/eggex.html | | It makes the syntax smaller, more familiar, and more | composable. It's closer to the classic lex tool (e.g. GNU | flex). It's funny that lex has a way better syntax for regular | expressions for Perl, but it's used probably 10,000x less often | because most people want an interpreter and not a compiler to C | code. But lex has 90%-100% of what you need, and it's way | simpler, with a nicer syntax. | | Literals are quoted so you don't have the literal/metachar | confusion, and you can reuse patterns by name. The manual is | worth skimming: | | https://westes.github.io/flex/manual/Simple-Examples.html#Si... | | I use re2c though, not flex, and its manual is also worth | skimming: https://re2c.org/manual/manual_c.html | amelius wrote: | Does this RegEx library utilize a JIT, just like the ones in | most JavaScript implementations? | | If not, then perhaps this is a case where JavaScript beats | Rust. | burntsushi wrote: | Nope. Usually only backtrackers have a JIT (not strictly | true, I believe there is a JIT of a Thompson NFA and there is | also redgrep). This library is not a backtracker. | | I would be interested in experimenting with a JIT in the | future, primarily as a way to speed up capture group | extraction. But this regex library uses a lazy DFA, which has | excellent theoughput characteristics compared to the normal | backtracking interpreter (and perhaps even a backtracking | JIT). | | > If not, then perhaps this is a case where JavaScript beats | Rust. | | Haha. No. Follow the link to rebar to see performance | comparisons. A JIT isn't everything. | https://github.com/BurntSushi/rebar | BoppreH wrote: | Can this be used for non-string lists? I always found it | frustrating that we have this amazing machinery for searching and | modifying lists of _characters_ , but as soon as I have a list of | _numbers_ or _dates_ it all goes out of the window. | | Let's say I have a list of login attempt dates, and I want all | sequences of 5+ failed login attempts followed by a success. With | a regex it's trivial, but I can't use that; I have to roll my own | loop, flags, and temporary lists. | | Sure, I could convert my list to a string, process it, and (try | to) convert it back, but the downsides are obvious. | | Even if the performance is not as good as string-based regex, why | shouldn't we have regexes for arbitrary list types? | | --- | | Edit: I realized this is not the first time I had this idea, and | found an old Python prototype: | https://github.com/boppreh/listregex | | It's horribly slow, but as an API experiment I'm quite happy. It | also gives some tools not available to regexes, like inverting | and intersect patterns, and matched pairs. | saagarjha wrote: | You're going to have to define some sort of API for matching on | a sliding window of values. This isn't impossible, but most | languages don't really have great interfaces for it. | BoppreH wrote: | Parser combinators[1] seem perfectly suited for this. | # A sequence of 5 or more failed login attempts followed by a | successful one. obj_regex.search(repeat(is_failed, | min=5)+is_success, my_list) | | [1]: https://en.wikipedia.org/wiki/Parser_combinator | kccqzy wrote: | The C++ standard library std::basic_regex attempts to do this | by exposing a templated class over a custom character type. | https://en.cppreference.com/w/cpp/regex/basic_regex You can | provide your own trait class to define things for your custom | "characters". | | Unfortunately performance goes out of the window. And it's | likely to work as well as storing arbitrary non-character | things into a custom std::basic_string. | burntsushi wrote: | Nope. The regex library is tightly coupled with searching | strings. That's an intentional design decision. Making a regex | engine like this have a generic alphabet is a total non- | starter. It's impractically difficult for me to do. Especially | the API design and doing it in a way that doesn't impact the | perf of the primary use case. | | The kind of regex engine you want, especially one where you | don't care about perf, is not hard to build. You could take the | `regex-lite` crate I published, for example, and code it up to | be as generic as you want. In so doing, I expect you'll run | into a number of interesting challenges. | | Anyway, it's not like these things don't exist. They do. People | attempt to build them[1]. They just usually don't gain much | traction because I suspect you're overstating their general | utility. :-) | | [1]: | https://docs.rs/automata/latest/automata/trait.Alphabet.html | schuyler2d wrote: | That said it would be a really cool feature to have some of | the more complicated but common regexes (urls, emails, dates) | be some kind of "template" in the syntax. Eg what if I could | do '(?D<h=22>)' or something like that to search for all | dates in any format that had some version of 8pm represented. | | Anyway, super appreciate your work and your writing! | burntsushi wrote: | It's an interesting idea, but I think it would be worth | trying to actually prototype it as its own library first | and see if it catches on. My suspicion is that it probably | won't work as nicely as you think it will unfortunately. | IshKebab wrote: | I've also been looking for the same thing for something like | SystemVerilog Assertions (SVA) which is very regex-like, so | it isn't just him! | burntsushi wrote: | Indeed. I've been responding to questions like this for a | long time. :-) https://old.reddit.com/r/rust/comments/4a8pb | v/the_regex_crat... | | Someone should go out and built it. It won't be me though, | that's for sure. Believe it or not, such a library wouldn't | leverage much of my experience. The vast majority of the | complexity inside the regex crate is about optimizing for | sequences of bytes. | IshKebab wrote: | Yeah I am slowly working on it. I think even custom | alphabets wouldn't work for me. I'm trying to build | something so this works: let | is_even = |x: &i32| *x % 2 == 0; let is_odd = | |x: &i32| *x % 2 == 0; let pattern = | sequence(&[ is_even, | is_odd, repeat(or(is_even, is_odd), 2, | 4), ]); let mut matcher = | Matcher::compile(pattern); let elements = [4, | 1, 23, 5, 6]; for element in elements { | matcher.step(element); } | assert!(matcher.matches()); | | I'll get there one day. | | Amazing work on the Regex crate by the way! | nerdponx wrote: | Their utility is overstated until the one time when you | actually need/want it. | | Last time I tried to figure out if there were existing tools | to solve a problem like this, I came across Event Calculus: | https://en.wikipedia.org/wiki/Event_calculus | | I'm sure there's some interesting CS theory to be uncovered | here. | burntsushi wrote: | Oh it's definitely overstated by suggesting it should live | alongside a general purpose regex engine. :-) | | I have no doubt that its utility can be great in niche use | cases. I've never come across one in my decades of | programming that I know of, but I'm sure they exist. | BoppreH wrote: | > I've never come across one in my decades of programming | that I know of, but I'm sure they exist. | | The login attempt example was convoluted, so here are | more common scenarios: | | - Grouping a list into pairs by matching /../ | | - Finding or collapsing repeated sequences with /(.)\1+/ | | - Parsing tag+value binary formats. | | - Searching structured logs. | | - DSL's for unit tests. | | Are there better ways to do each of those? Yes, but | either because someone implemented those specific | functions, or it's a much longer solution. | | Though I'm not recommending list-regexes for production | code anytime soon. Prototypes and code golfing, sure, but | not until the community understands it better. | burntsushi wrote: | Yeah I just wouldn't make use of regexes with arbitrary | alphabets for any of those tasks personally. Doesn't | really seem like a win to me. | | But like I said, someone should go build it. | LoganDark wrote: | > They just usually don't gain much traction because I | suspect you're overstating their general utility. :-) | | I find myself desiring a DFA over LLM tokens right now, which | are 16 bits each, at least for this model. | | I, personally, am very pleased with the idea of attempting to | build a dense DFA with that vocabulary. Maybe your computer | would explode! | telotortium wrote: | Could you not encode the tokens into 8-bit bytes? Rust | regex supports byte strings: | https://docs.rs/regex/latest/regex/bytes/struct.Regex.html | otterpro wrote: | I use Ripgrep daily for searching anything in code or in text | files, and I'm grateful every time I use it (on windows, linux, | mac, vscode, vim, etc...). It's one of those software that has | changed my life and how I work. Whenever I'm forced to use | `grep`, it feels like I've travelled back in time when everything | ran on a single core CPU and data were on slow spinning hard disk | on PATA/IDE. Anyway, burntsushi definitely deserves honor amongst | other great programmers. | kzrdude wrote: | ripgrep comes out of a "lineage", before it there was ag and | before that ack, that did similar things to provide a much | better interface than just grep. | LoganDark wrote: | I was in the middle of writing some regex-automata code (using | the early 0.2.0 release) when this post dropped. Welp, time to | see if I'm going to have to start all over digging into some | brand new internals~ | | (I haven't read the post yet because I have an important call in | a few minutes, but it looks like a _very_ interesting and also | _conveniently-timed_ blog post.) | | (Edit a few minutes later: looks like the answer might be yes, | but since this is a polished release, I might be able to simplify | my code massively. Wish me luck~) | | (Edit 2 about 10 minutes later: well that was pretty painless and | the new Builder::patch method is a total upgrade. Awesome~) | | P.S. I'm still blocked from all your GitHub repositories and I | think that's kind of unfair considering how widespread a lot of | your crates are. I don't remember the original incident anymore. | I believe the regex crate itself is under the rust-lang | organization now, but there are still others I can't interact | with. | burntsushi wrote: | The regex-automata 0.2.0 docs did have a giant warning about | this, and strongly recommended going with 0.1: | https://docs.rs/regex-automata/0.2.0/regex_automata/ | | > I'm still blocked from all your GitHub repositories and I | think that's kind of unfair considering how widespread a lot of | your crates are. I don't remember the original incident | anymore. | | I don't either. I block a lot of people for a lot of reasons. I | unblocked you. | LoganDark wrote: | > The regex-automata 0.2.0 docs did have a giant warning | about this | | I didn't listen, and it paid off, because a bunch of the | knowledge I learned on it is still perfectly applicable to | 0.3.0; I already have the same NFA as before. :) | | Now my appointment got canceled so I get to read your blog | post _and_ play with the new version. Fun! | celeritascelery wrote: | My biggest wish for rust regex was support for ARM SIMD. | Currently the literal optimization is only vectorized on x86. It | seems that the author doesn't own an ARM machine, and therefore | can't test implementation and so doesn't want to support it. | AceJohnny2 wrote: | > _It seems that the author doesn't own an ARM machine, and | therefore can't test implementation and so doesn't want to | support it._ | | Tangential (since he answered already), but in this era of | cheap cloud instances, I'd be surprised if "not owning a | <blank>" is any hindrance to support. | cesarb wrote: | > but in this era of cheap cloud instances, I'd be surprised | if "not owning a <blank>" is any hindrance to support. | | When you own a device, it doesn't cost you more (other than | electricity) if you take longer while playing with it, and | there's no extra cost (other than space) to having it ready | to be used when necessary. When you rent a cloud instance, | there's a direct monetary cost for every extra minute you | take while investigating an issue, and it's less guaranteed | that it'll be available when you need it. | | Or, to put it more simply: unless he has his own personal | device to play with, he'll have to spend extra money for | every issue he investigates and every change he makes to the | ARM support. | burntsushi wrote: | Yes, exactly. There is a ton of friction. | burntsushi wrote: | I actually have an M2 mac mini in the mail from Apple for | exactly this purpose. | | My time horizon is very long. It takes me a long time to do | things these days. | | It has never been true that I don't want to support it. Merely | that it is difficult to verify and test if I can't do it | myself. There is also the problem that the port from x86 to arm | is not straight-forward, due to both my own ignorance and what | I believe are important missing vector operations such as | movemask. | | This is discussed a bit more here (including the bit about | movemask): https://github.com/BurntSushi/memchr/issues/76 | [deleted] ___________________________________________________________________ (page generated 2023-07-05 23:00 UTC)