[HN Gopher] Jd ___________________________________________________________________ Jd Author : tosh Score : 1015 points Date : 2022-04-04 12:04 UTC (10 hours ago) (HTM) web link (code.jsoftware.com) (TXT) w3m dump (code.jsoftware.com) | pastaking wrote: | Serious question: Why does this have so many upvotes? As a layman | I have never heard of J or Jd before, someone please provide some | context? | [deleted] | guidoism wrote: | J is an APL language. APL is the coolest language you've never | heard of. It's mind blowing in the same way people talk about | Lisp, but more so since the concepts are so alien to most | programmers. | upwardbound wrote: | bla3 figured out that it's because the link text "Jd" is so | short, and people are clicking the upvote button by mistake. | https://news.ycombinator.com/item?id=30906989 | jpf0 wrote: | I did some work to compare Jd to data.tables and found that it | was more performant in some instances such as on derived columns, | and approximately equally performant on aggregations and queries. | Jd is currently single-threaded, whereas multiple threads are | important on some types of queries. I tried to further compare | with Julia DB at the same time (maybe a year ago) and found that | was incorrectly benchmarked by the authors and far slower than | both; that might be different now. Jd is more equivalent to | data.tables on disk; Clickhouse is far better at being a large- | scale database. | | Rules of thumb on memory usage: Python/Pandas (not memory- | mapped): "In Pandas, the rule of thumb is needing 5x-10x the | memory for the size of your data." R (not memory-mapped): "A | rough rule of thumb is that your RAM should be three times the | size of your data set." Jd: "In general, performance will be good | if available ram is more than 2 times the space required by the | cols typically used in a query." | | Re CSV reading, Jd has a fast CSV reader whereas J itself does | not. 
I have written an Arrow integration to enable J to get to | that fast CSV reader and read Parquet. | mlochbaum wrote: | The environment around Jd has changed a bit since it was young! | Jsoftware[0] announced it in 2012, and this particular page has | been effectively the same since it was created in 2017 (I suspect | this was a page move, and the content is somewhat older). In | these early days the column-oriented database was quickly gaining | popularity but still obscure, which is why there's this | "Columnar" section that goes to so much trouble to explain the | concept. Now the idea is well known among database users and | there are lots of other options[1]. | | The history goes back further, because column-oriented is the | natural way to build a database in an array language (making a | performant row-oriented DBMS would be basically impossible). This | is because a column can be seen as a vector where every element | has the same type. A row groups values of different types, and | array languages don't have anything like C structs to handle | this. In J, Jd comes from Chris Burke's JDB proof-of-concept | (announced[2] 2008, looks like), and the linked page mentions | kdb+ (K) and Vstar (APL). KDB, first released in 1993, is | somewhat famous and gets a mention on Wikipedia's history of | column-oriented databases[3]. | | [0] Company history: https://aplwiki.com/wiki/Jsoftware | | [1] https://en.wikipedia.org/wiki/List_of_column-oriented_DBMSes | | [2] https://code.jsoftware.com/wiki/JDB/Announcement | | [3] https://en.wikipedia.org/wiki/Column-oriented_DBMS#History | moonchild wrote: | > Vstar (APL) | | Vstar was based on j, not apl. | mlochbaum wrote: | Right, I'm getting names mixed up. The Dyalog APL is vecdb, | but it's more recent than Jd and I don't think it's | progressed past being a toy. | michaelmcmillan wrote: | This section was interesting! Somehow I've never realized that | row oriented storage is orthogonal to how disks work... 
| Jd is a columnar (column oriented) RDBMS. Most RDBMS | systems are row oriented. Ages ago they fell into the trap of | thinking of tables as rows (records). You can see how this | happened. The end user wants the record that has a first name, | last name, license, make, model, color, and date. So a row was | the unit of information and rows were stored sequentially on | disk. Row orientation works for small amounts of data. But think | about what happens when there are lots of rows and the user wants | all rows where the license starts with 123 and the color is blue | or black. In a naive system the application has to read every | single byte of data from the disk. There are lots of bytes and | reading from disk is, by orders of magnitude, the slowest part of | the performance equation. To answer this simple question all the | data had to be read from disk. This is a performance disaster and | that is where decades of adding bandages and kludges started. | Jd is columnar so the data is 'fully inverted'. This means all of | the license numbers are stored together and sequentially on disk. | The same for all the other columns. Think about the earlier query | for license and color. Jd gets the license numbers from disk (a | tiny fraction of the database) and generates a boolean mask of | rows that match. It then gets the color column from disk (another | small fraction of the data) and generates a boolean mask of | matches and ANDS that with the other mask. It can now directly | read just the rows from just the columns that are required in the | result. Only a small fraction of the data is read. In J, columns | used in queries are likely already in memory and the query runs | at ram speed, not the sad and slow disk speed. Both | scenarios above are simplified, but the point is strong and | valid. The end user thinks in records, but the work to get those | records is best organized by columns. 
Row oriented | is slavishly tied to the design ideas of filing cabinets and | manila folders. Column oriented embraces computers. | A table column is a mapped file. | thethimble wrote: | If you're interested in this thought, check out Martin | Kleppmann's book DDIA where he explains storage concepts like | this and many more. One of the best architecture books out | there! | derefr wrote: | I would note that this query behavior (sorted data columns | bitmasked together) is further orthogonal to primary-data | storage representation. For example, Postgres can give you this | same behavior if you declare a multi-column GIN index across | the columns you want to be searchable. | hodgesrm wrote: | > Somehow I've never realized that row oriented storage is | orthogonal to how disks work... | | The section you posted is very misleading. Storage is arranged | in blocks. The secret to database performance is how you lay | out data in those blocks and how well your access patterns to | the blocks match the capabilities of the device. This choice is | the fundamental key to database performance. | | If your database stores shopping baskets for an eCommerce site, | you want each basket in the smallest number of blocks, ideally | 1. It makes inserting, updating, and reading single baskets | very fast on most modern storage devices. | | If your database stores data for analytic queries, it's better | (in general) to store each column as an array of values. That | makes compression far better, and also makes scanning single | columns very efficient. | | To say as the article does that "row oriented is slavishly tied | to design ideas of filing cabinets and manila folders" is | nonsense. Plus there are _many_ other choices about how to | access data that include parallelization, alignment with | processor caches, trading off memory vs. storage, whether you | have a cost-based query optimizer, etc. 
Even within column | stores there are big differences in performance because of | these. | | (Disclaimer: I work on ClickHouse and love analytic systems. | They are great but not for everything.) | akersten wrote: | Isn't that just a weirdly detailed way to say "every column is | indexed, whether you like it or not"? | ghshephard wrote: | Ironically - He didn't even mention indexes in his | description (which he admitted was simplified) - a good query | optimizer will do wonders for not only coming up with the | appropriate hints for the query plan, but will also | _dynamically adjust_ those hints based on the underlying data | patterns. | | The example he provided, | | "So a row was the unit of information and rows were stored | sequentially on disk. Row orientation works for small amounts | of data. But think about what happens when there are lots of | rows and the user wants all rows where the license starts | with 123 and the color is blue or black. In a naive system | the application has to read every single byte of data from | the disk." | | Is something no modern database would ever do. The real | challenge is not to only read the records starting with 123, | or having blue/black - that part is trivially handled by | every Database engine I'm familiar with. The query challenge | is *do you filter on license # or color first? (If there are | 1k records starting with 123 and 5mm blue/black vehicles, the | order is pretty critical for performance) - that's one of the | features that distinguishes query optimizers. | | Columnar databases are awesome when you have columnar data to | work with - I've seen 20-30x reductions in disk storage in | the wild (and you can obviously create synthetic examples | that go way north of that), but a well indexed SQL database | backed by a solid query optimizer/planner can probably stand | it's own with a columnar database in terms of lookup | performance, particularly if your data is row-oriented to | begin with. 
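The columnar query from the section quoted above (scan the license column into a boolean mask, scan the color column into a second mask, AND them, then fetch only the matching rows) can be sketched roughly as follows. This is a minimal illustration in plain Python, not Jd's implementation; the column names, sample data, and the `columnar_query` helper are all made up for the example.

```python
# Hypothetical toy table stored column-wise: one list per column.
# (Names and data are illustrative assumptions, not taken from Jd.)
license_col = ["123ABC", "987XYZ", "123QRS", "555AAA", "123ZZZ"]
color_col = ["blue", "black", "red", "blue", "black"]
make_col = ["Ford", "Audi", "Fiat", "Saab", "Opel"]


def columnar_query(license_prefix, colors):
    """Return (license, color, make) for rows matching both predicates."""
    # Touch only the license column and build a boolean mask.
    license_mask = [lic.startswith(license_prefix) for lic in license_col]
    # Touch only the color column and build a second mask.
    color_mask = [color in colors for color in color_col]
    # AND the two masks to find the positions of matching rows.
    rows = [
        i
        for i, (lic_ok, color_ok) in enumerate(zip(license_mask, color_mask))
        if lic_ok and color_ok
    ]
    # Only now read the result columns, and only at the matching positions.
    return [(license_col[i], color_col[i], make_col[i]) for i in rows]


result = columnar_query("123", {"blue", "black"})
```

Note that this sketch scans both columns in full. The ordering question raised above (filter on license or on color first) would show up here as a choice to evaluate the more selective predicate first and test the second one only at the surviving positions.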
| jve wrote: | I know nothing about J and JSoftware, but this reads like an | Aprils fools joke. Is it? | | > In a naive system the application has to read every single byte | of data from the disk. ... To answer this simple question all the | data had to be read from disk. This is a performance disaster and | that is where decades of adding bandages and kludges started. | .... Think about the earlier query for license and color. Jd gets | the license numbers from disk (a tiny fraction of the database) | | Ofcourse that data has to be read from disk. Well, for simple or | aggregate queries he may gain performance. Moreover, as other | commenter has commented, you can organize data in columns in | MSSQL too for aggregations: https://docs.microsoft.com/en- | us/sql/relational-databases/in... | | > columns used in queries are likely already in memory and the | query runs at ram speed, not the sad and slow disk speed. ... Jd | performance is affected primarily by ram. Lots of ram allows lots | of rows | | Any other RDBMS can have sensible indexes that satisfy your | queries. And, surprise, your data also lives in RAM once you read | it. | | > You can backup a database or a table with standard host shell | scripts, file copy commands, and tools such as tar/gzip/zip.... | If you understand backing up file folders, then you pretty much | understand backing up Jd databases. | | And... throw data consistency out of the window? | | I'm reading and I'm "not getting" the selling point - why is this | better? | | Okay, I read that things are files. SQLite is also a file if | physical format is a concern. | OskarS wrote: | > Ofcourse that data has to be read from disk. Well, for simple | or aggregate queries he may gain performance. | | Lets say you want to access 2 columns out of 100 in a | particular table. 
In a row-oriented database, you have to read | the full rows off the disk, which means that you have to read | 98 pieces of data off the disk that you have no use for, a | total waste of I/O. In a columnar database, you don't have to | do that, you just read off the relevant columns. This is VERY | similar to the "array of structs"/"struct of arrays" argument | in gamedev (and related high performance fields), it's the same | kind of tradeoff: slightly more complicated data layout traded | in for much more efficient reads. | | In addition: if you have a columnar database, you can employ | compression in a much more efficient manner. If you have 10 | million rows with the same (or very similar) data in a column, | you can compress that to a fraction of the size. This messes | with indexes, but it's often worth it because it VASTLY speeds | up aggregate calculations. | | Row-based and column-based databases have different tradeoffs | and advantages, and it's not quite as clear-cut as the article | makes it seem. But it's certainly no April fools joke: columnar | databases (for many tasks, particularly aggregates) can vastly | outperform row-oriented databases. This is why Google BigQuery | is columnar, for instance. Another good example is kdb+ (which | this is clearly based off of), which is widely used in places | which value quick time-series aggregates (Wall Street, being | the obvious example). | | The article is a bit over the top and one-sided, but it doesn't | say anything that is particularly controversial. You might | wanna read up on columnar database systems: | https://en.wikipedia.org/wiki/Column-oriented_DBMS | brianwawok wrote: | > slightly more complicated data layout traded in for much | more efficient reads. | | Depending on read patterns. The classic example is address. | Sure, you can store an address as column. Name here, city | there, street 1 there, street 2 there. How useful is 1/5th of | an address, and how often are you pulling it like that? 
For | something like address that you generally read all or none, | you generally are better served by a row oriented database. | | You also have FKs to kind of do this in a row oriented | database. If some part of the data is not read nearly as much | as another, it can be a foreign key sitting in another table. | OskarS wrote: | Yeah, exactly: there are tradeoffs to both models, neither | is strictly superior. You would never want to do aggregates | on addresses anyway, so that advantage is out the door. You | do, however, want to very easily index a table of | addresses, so you could quickly look them up for a | particular user, which a columnar database is (arguably) | worse at. BigQuery, in particular, does not use indexes at | all. | | (EDIT: I guess you do might want to do aggregates on | addresses, actually. "How many customers do we have in | NYC?", that kinda thing.) | vidarh wrote: | Hybrids are straightforward enough. A "simple" way of | achieving that is to support using the indexes to | directly answer queries, as quite a few databases do. Now | an index on a single column is _also_ a columnar store of | the contents of that column, yet you still have the full | row to query if you need lots of data from individual | rows. A more sophisticated option would be to reduce | duplication of column data. | | (EDIT: How well a usually row-oriented database optimises | this, is another question, and will differ by database) | michelpp wrote: | > I know nothing about J and JSoftware, but this reads like an | Aprils fools joke. Is it? | | To me it reads as being colored by a very specific tool bias. | For example: | | >> The key difference between Jd and most other database | systems is that Jd comes with a fully integrated and mature | programming language. | | Most major database systems come with a fully integrated and | mature programming language. | | >> Row orientation works for small amounts of data. | | It works for a _different data access pattern_. 
Row vs column | is a tradeoff spectrum. Data size is just one dimension of the | analysis. | | >> Row oriented is slavishly tied to the design ideas of filing | cabinets and manila folders. Column oriented embraces | computers. | | Pretty hyperbolic. | moonchild wrote: | > Most major database systems come with a fully integrated | and mature programming language. | | Like ... pl/[pg]sql? Not exactly a joy to write. | nostoc wrote: | Still is a fully integrated and mature programming language | though. | | And I do believe you wouldn't have any issue finding people | who think the same of J. | moonchild wrote: | How integrated? Most of jd is _written_ in j. It is also | expected that the app performing--or at least handling-- | the queries be written in j. | | And regarding maturity--j has libraries, debugger, etc. | kokizzu2 wrote: | so any benchmark against clickhouse? | simonpure wrote: | There's a wonderful podcast about array languages - | | https://www.arraycast.com/ | | Lots of great stories about software engineering besides talking | about the different dialects of array languages. | stefan_ wrote: | An example J file because this link doesn't say much: | | https://github.com/jsoftware/data_jd/blob/master/csv/csv.ijs | diarrhea wrote: | s=. 0 2}.each _3 _1{<;._2 (;i{ccfiles),'/' | | That is... not pretty. | bryanrasmussen wrote: | I would say it is not knowledge leaking, most languages leak | knowledge so that if you are not familiar with the language | but you do know some other programming languages you can sort | of figure out what they do. | | But some languages do not leak knowledge in this way. | | There is the concept of beauty in programming languages that | the expression of an idea should be succinct. This J code | might be beautiful, but unsure. | 0des wrote: | You're going too meta. Does it make you happy to write it? | Does it fulfill its purpose? If yes, don't worry how it | looks, it is fine. 
| vidarh wrote: | To some of us the "does it make you happy" and how it | looks are intrinsically linked. | | One of the things that makes me happy is to write | beautiful code. | jollybean wrote: | It's meant to be efficient, not pretty. | | It's catching on within the AI community because the syntax | matches well to the kinds of matrix operations common in that | field. | | I think Nvidia's next chip is going to have a compiler for | jlang. | | Also, I think they are starting to use it as an 'entry level' | language for kids, you know, like grade school. | razetime wrote: | J isn't really made to be pretty. It's made to be terse and | simple to read once given enough learning effort, and it's | made to be a consistent keyboard typable notation. | forgotpwd16 wrote: | Tbh that file says even less. It's like in discussion about | pandas giving a link to read_csv.py in pandas source. | jenny91 wrote: | It tells me everything I need to know about this language! | mlochbaum wrote: | Here's one of the more central files that ties into how a Jd | database is laid out: | | https://github.com/jsoftware/data_jd/blob/master/base/common... | | Not that I claim anyone in particular can read it of course. Jd | uses a hierarchy of folder, database, table, column that's | handled with an object system to share code between them. A | folder is just a place to put databases and hardly needs to add | anything, while the other levels have a lot of extra | functionality. As an inverted database, Jd stores each column | in a file, and accesses it using memory mapping. | | https://github.com/jsoftware/data_jd/blob/master/base/folder... | | https://github.com/jsoftware/data_jd/blob/master/base/table.... | | (I designed this system when I did some of the early work to | turn JDB into Jd as a summer intern) | hnrj95 wrote: | are there benchmarks against kdb+ and/or shakti? 
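As a rough illustration of the "each column in a file, accessed using memory mapping" layout described above, here is a small Python sketch. The on-disk format (raw little-endian 64-bit integers, one file per column) and the helper names `write_column` and `sum_column` are assumptions made for the example; they are not Jd's actual file format or API.

```python
import mmap
import os
import struct
import tempfile


def write_column(path, values):
    """Store a column as raw little-endian int64 values (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}q", *values))


def sum_column(path):
    """Aggregate one column by memory-mapping only that column's file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = len(mapped) // 8  # 8 bytes per int64 value
            return sum(struct.unpack_from(f"<{count}q", mapped, 0))


with tempfile.TemporaryDirectory() as table_dir:
    # A "table" is a folder; each column is its own file inside it.
    write_column(os.path.join(table_dir, "price"), [10, 20, 30])
    write_column(os.path.join(table_dir, "qty"), [1, 2, 3])
    # Summing price never opens the qty file.
    total = sum_column(os.path.join(table_dir, "price"))
```

Because every column is an ordinary file, this layout is also what makes the "back up with file copy commands" claim from the article plausible, with the consistency caveats raised elsewhere in the thread.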
| bla3 wrote: | Meta comment: I tried to click the link but since the title is so | short and my mousing not very precise, I accidentally clicked the | upvote arrow. I then clicked "unvote" and tried again, but the | same thing happened. The third time round, I managed to click the | link. | | Takeaway: Very short titles might get you some upvotes from | clumsy users :) | raphaelj wrote: | The same happened with me... | version_five wrote: | Yes! I came here to post the same thing - the link didn't work | for me, and I inadvertently upvoted as well. I assume the | article has merits of it's own, but I do notice a huge ratio of | votes to comments, I guess some are inadvertent. It would be | interesting to search through other very short titles and look | at the ratio of comments to votes vs others above some vote | threshold... (I'm on my phone or I'd try) - edit, is there a | regex search for HN anywhere? It looks like algolia doesn't | support them | rkalla wrote: | lol same and would have never thought to comment on it because | obviously user-error, then I saw this and thought twice. | legalcorrection wrote: | I'm intrigued but skeptical of this bit: | | _Jd is a columnar (column oriented) RDBMS. | | Most RDBMS systems are row oriented. Ages ago they fell into the | trap of thinking of tables as rows (records). You can see how | this happened. The end user wants the record that has a first | name, last name, license, make, model, color, and date. So a row | was the unit of information and rows were stored sequentially on | disk. Row orientation works for small amounts of data. But think | about what happens when there are lots of rows and the user wants | all rows where the license starts with 123 and the color is blue | or black. In a naive system the application has to read every | single byte of data from the disk. There are lots of bytes and | reading from disk is, by orders of magnitude, the slowest part of | the performance equation. 
To answer this simple question all the | data had to be read from disk. This is a performance disaster and | that is where decades of adding bandages and kludges started. | | Jd is columnar so the data is 'fully inverted'. This means all of | the license numbers are stored together and sequentially on disk. | The same for all the other columns. Think about the earlier query | for license and color. Jd gets the license numbers from disk (a | tiny fraction of the database) and generates a boolean mask of | rows that match. It then gets the color column from disk (another | small fraction of the data) and generates a boolean mask of | matches and ANDS that with the other mask. It can now directly | read just the rows from just the columns that are required in the | result. Only a small fraction of the data is read. In J, columns | used in queries are likely already in memory and the query runs | at ram speed, not the sad and slow disk speed. | | Both scenarios above are simplified, but the point is strong and | valid. The end user thinks in records, but the work to get those | records is best organized by columns. | | Row oriented is slavishly tied to the design ideas of filing | cabinets and manila folders. Column oriented embraces computers. | | A table column is a mapped file._ | | What's the other side of this argument? | drc500free wrote: | If you're using J, you are probably doing analytics and stats. | That means you are looking for patterns in a handful of | attributes across a large population - i.e. columnar. | | As others have said, row-based makes sense for most OLTP / app | databases. You're probably not writing those products in J. | lostgame wrote: | This seems... _highly_ impractical for...90% of operations I 'd | be wanting to do with a DBMS. | | It kinda seems like a 'different way for different's sake' | kinda solution? 
:/ | | I understand there must be a minority of operations that can | benefit from this, but overall I can't imagine this being | popular for most DB operations. | vidarh wrote: | It tend to include a large proportion of the large, expensive | reporting queries your business people want to do. Whether or | not those kinds of queries dominates for your system will | depend greatly on your system. | | You also need to reach a certain scale before the choice | (either way) will affect you enough to matter. | | But when you reach that scale it can be the difference | between reporting queries taking seconds vs. hours in some | cases. | | For some systems you'll end up wanting _both_ , and stream | updates from the transaction focused db (row oriented) into a | separate reporting database that uses a column store. | cmrdporcupine wrote: | Yeah, it's a simplification and one-sided. | | The general consensus as I understand it is: column-oriented | indices/storage options are good for OLAP, large scale | analytics, bulk data analysis. Row-oriented indices are suited | more for OLTP, individual "record processing." | | Both are just techniques and there's nothing stopping a single | db product from offering both. | Semaphor wrote: | > Both are just techniques and there's nothing stopping a | single db product from offering both. | | e.g. for MS SQL there are columnstore indexes. | iamwil wrote: | Wouldn't column stores be better for the cache? | cmrdporcupine wrote: | I think the answer to that is just: it depends. | | Again comes back to usage patterns. Yes, if you're doing | aggregation operations on a small number of columns then I | expect locality of reference could be better with a column- | store, rather than thrashing through row-retrievals one | after another (and then just throwing them away after | aggregating). | | But if you're frequently doing "look up this customer and | others like them" and then using the bulk of the | information there? 
I'd expect better cache behaviour out of | row oriented storage. | | But these days it's so unclear what's happening inside the | actual "black box" that is our hardware that it's hard to | make generalizations. | tormeh wrote: | It all depends on access pattern. Do you tend to select entire | rows? Use a row-oriented DB. Do you tend to select entire | columns? A column-oriented database might be for you. That's | it, really. None of the designs are superior, afaik. | hoosieree wrote: | Just to add, because J is an array-oriented language, it | makes some kinds of column-oriented access patterns easier. | | For example, it's trivial to sort one array by the values of | another array: x /: y | | To me, it's much easier to read than the equivalent in NumPy: | x[np.argsort(y)] | | Or get pairs of (unique value; count) from an array using the | key operator (/.): (~.;#)/.~ y | | Column db's make sense for array-oriented languages, because | there's much less of a mismatch compared to OOP with | relational. | legalcorrection wrote: | All of that syntax is awful. Why not just x.sortBy(y) ? Did | all of the advances in software legibility fail to make | their way to the modern scientific computing world? | moonchild wrote: | https://www.jsoftware.com/papers/tot.htm | avmich wrote: | Hyperbolically, because you don't write math with | variables in camel case. | | J traces its roots from a notation for math, used on | whiteboards. That awful syntax you see - it's the same as | in some formulas in, say, general relativity, only J is | Turing complete and not a Turing tarpit. When you work on | a formula, in case of J you have ability to execute it, | and if you see it's wrong you can update the formula and | try again. This could also be done in other languages, | but in J (I mean, APL family of languages) it's more | focused. 
| | In defense of J, I had a professional example of a | problem which wasn't clearly specified, which needed some | experimentation - that took, if I remember correctly, | some 45 minutes of attempts in J, and then the prototype | was re-written in C#, when if was already producing | desired outcomes. Rewriting took somewhat longer. | moonchild wrote: | You might be selecting entire rows, but you are probably not | selecting _all_ of the rows, and your selection criteria | probably do not depend on all of the columns. | WJW wrote: | Yeah, row-oriented is good for WHERE queries and column- | oriented is good for SUM (or other aggregation) queries. | [deleted] | rileyphone wrote: | A lot of modern, data-oriented ECS frameworks for game dev | follow a similar philosophy, wherein components are stored in | linear collections that optimize memory layout for caches and | parallelism. Given how rarely you need 'SELECT *' this makes | sense for a relational DB as well, though modern SQL DBs have a | lot of sweat put in to their performance. | giaour wrote: | In many OLTP systems, almost all work operates on multiple | attributes of a single record. E.g., when logging a user in, an | authentication system cares about multiple attributes of a | single user record, not facts about the aggregate pool of | users. | | Column oriented stores are extremely efficient for aggregate | queries, but they make writes and single-row reads more | expensive and are thus not suitable for every workload. There's | an excellent overview in Martin Kleppmann's Designing Data | Intensive Applications. | 0des wrote: | > Early adopters of Jd are assumed to have a J background and | documentation and tutorials depend on that background. | | All 12 of us are jumping up and down saying "it's our time, | finally the day has come" | [deleted] | jdshupe wrote: | Seeing J at the top was indeed a jump up and down moment. 
| 0des wrote: | We should have a secret handshake or some type of insignia to | better signal to our peers. I've tried draping a J colored | kerchief out of my back pocket but the results so far are not | great, it appears there is more anti-J sentiment than I'd | imagined, as I get harassed unduly in certain areas of town. | May have to switch to maybe a hand gesture based signaling | that can be done on the fly to signal allegiance. | recuter wrote: | https://www.atlasobscura.com/articles/hobo-code | plibither8 wrote: | 763 (and counting) votes, no. 1 on the frontpage, ...and only 83 | comments? This is one of the most skewed ratios I've seen on HN. | upwardbound wrote: | bla3 figured out that it's because the link text "Jd" is so | short, and people are clicking the upvote button by mistake. | https://news.ycombinator.com/item?id=30906989 | marcodiego wrote: | "Jd source is largely J code and that code is open and available | to licensed users." | | License? | anonu wrote: | It's sort of misleading because J is closed source | 0des wrote: | Curious how you arrived at this conclusion | moonchild wrote: | J is fully open source: https://github.com/jsoftware/jsource | | Most of jd's source is publicly available: | https://github.com/jsoftware/data_jd | SparkyMcUnicorn wrote: | Is it? | | https://github.com/jsoftware/jsource/blob/master/license.txt | moonchild wrote: | > J SOURCE can be used under a commercial license from | Jsoftware, in which case the terms and conditions of that | license apply. | | > OR | | > J Source can be used under GNU General Public License | version 3, in which case the terms and conditions of that | license apply. | | Seems pretty clear to me. | Shared404 wrote: | As a side note, I really love the choice to dual license, | and wish it were offered more often. 
| jenny91 wrote: | It's extremely common: license it under GPL/AGPL or some | other very copyleft license; get contributors to sign a | CLA, then offer the library with hefty license fees for | non-FOSS projects. | misnome wrote: | Because it's commercially available without GPL? | jollybean wrote: | I wonder when Java or Swift will finally get around to adopting | 'self effacing references'. It's 2022. | | [1] | https://code.jsoftware.com/wiki/Vocabulary/SpecialCombinatio... | anonu wrote: | Jd has been around for a while. But is it production ready? | | I'm still looking for an open source replacement for kdb that | matches kdb's speed and featureset. | kokizzu2 wrote: | clickhouse '__') | swasheck wrote: | i really dislike clickhouse for anything less than | rudimentary analysis, but appreciate that it's fast for that. | nimrody wrote: | Can you give some tips on what you mean by "less than | rudimentary analysis"? Considering adopting Clickhouse and | wondering whether we will encounter problems down the road. | swasheck wrote: | the biggie for me was that analytic window functions are | either non-existent or experimental and must be achieved | with array function hacks. | | it does have nice built-in skew and kurtosis functions, | though. | yiyus wrote: | Jd is not open source. | vmchale wrote: | Nothing matches kdb's speed except GPU-accelerated DBs: | https://tech.marksblogg.com/benchmarks.html | ZeroCool2u wrote: | Yeah, KDB is... Not super fun. I've been looking at TimeScaleDB | recently, because it's just a PostgreSQL plug-in it seems nice | and simple, but I haven't actually compared them directly yet. | LoriP wrote: | If you want some intro info - and you may have found it | already - the YouTube channel is a great place to start for | TimescaleDB youtube.com/TimescaleDB (for transparency: I work | for Timescale...) 
| mritchie712 wrote: | If you were looking at pg because you need: | | - open source | | - SQL based | | - analytics data warehouse | | Then check out Clickhouse. I've been really happy with it and | it checks all those boxes. | | ps - if you're interested in working with clickhouse and open | source data tools, I'm hiring: mike@luabase.com | nathan_compton wrote: | I've programmed in J professionally (admittedly not for all that | long) as a data scientist and, coincidentally, have just | completed a small analysis using J as part of an internal | workshop about data analysis I am planning. I typically work in R | and Python and I have to say that at this stage there is almost | no reason I would pick up J to do any work. Unless code-golf | level conciseness is your only goal, these other platforms offer | superior performance, clarity, ease of use, access to libraries | and are, as programming languages, substantially better designed. | | I say this as a great lover of function-level programming and as | a J enthusiast. I would say I am quite familiar with J's | programming paradigm and conceptual widgets and doodads (I know | the verbs, nouns, adverbs and conjunctions and can use them | appropriately). I even remembered a pretty good portion of the | Nuvoc. But doing even the simplest analysis in J was | _excruciatingly_ slow and inconvenient compared to using R and | the tidyverse (in particular, I missed dplyr and ggplot). The | tidyverse CSV readers are, for example, much faster and smarter | and more convenient and informative than anything you'll get from | the J universe. | | I love vector languages but at this point J can't compete with | the major platforms for data analysis. It's less convenient, often | _slower_, much more low level, strange, and its library situation | is anemic at best. I recommend learning J because it will expand | your mind, but I can't imagine picking it up for real work. | moonchild wrote: | The ecosystem problems are genuine.
Though I do not think they | are so great as you make them out to be. But with respect to | semantics, numpy et al are but pale imitations. With respect to | syntax, too (https://www.jsoftware.com/papers/tot.htm). | nathan_compton wrote: | I sort of agree with you, especially about numpy. Nothing in | the data science space in Python feels right to me. But you | can't beat the network effects. It's still easier to actually | do data analysis in Python than in J. | user3939382 wrote: | > I do not think they are so great as you make them out to be | | There's a dynamic with ecosystem problems I believe applies | to all languages. You only need one missing or bad library | that's critical to your project to make the whole language | useless. | | An anecdotal example: I remember many years ago trying to | give Python a go and within 15 minutes ran into a problem | parsing XML. A search revealed this was a known issue that | was being worked on with the foremost tool in Python for this | job. You couldn't have credibly argued that Python had an | ecosystem problem even at the time, but for me in that | particular scenario Python had a show-stopping ecosystem | problem. There were ways around this, but the most convenient | way around it at the time was switching back to a more | familiar language. | | My greater point is that we can definitely make | generalizations about a language's ecosystem health, but keep | in mind there is a very context-sensitive, practical | dimension to that type of language assessment. | moonchild wrote: | > You only need one missing or bad library that's critical | to your project to make the whole language useless | | ...no? If there is functionality I need, and no library | implements it, I will implement it myself. That goes for | any language. Otherwise, the job of a programmer would | simply be to string together existing libraries, not | writing anything meaningful.
| user3939382 wrote: | > I will implement it myself | | Are you saying in the scenario described, your solution | would have been to write an XML parser from scratch? | moonchild wrote: | If I need one, and I cannot find one, then yes. | mlochbaum wrote: | It's not ideal, but I've done this in BQN and it took | about 15 lines. I didn't need to handle comments or | escapes, which would add a little complexity. See | functions ParseXml and ParseAttr here: https://github.com/mlochbaum/Singeli/blob/master/data/iintri... | | XML is particularly simple though, dealing with something | like JPEG would be an entirely different experience. | RexM wrote: | Yeah. | | It can't be that hard.(tm) | recuter wrote: | The job of a programmer is to glue together existing | libraries in the most convoluted manner possible and | collect rent on maintenance. Perhaps even graduate to | consulting. Grow a pointy haircut. | | Who the hell wants to be a programmer, dismal profession. | VHRanger wrote: | The fact there are vector languages in subsets of python | (numpy, pandas, etc.) and R. | | And these already have great large columnar dataset support | (eg. Apache Arrow) | | And an open source community intent on developing and | maintaining the ecosystem. | nathan_compton wrote: | One of the nicest things about J is the notion of verb rank. | For non-J-programmers, you can apply a rank to a verb and | this affects how the verb operates on its array operands. A | rank of zero means "operate on each atom" whereas a | rank of 1 means "operate on the rank-1 cells (lists) of the operands," | and an infinite rank means "operate on the entire object." | Other ranks change the meaning of what counts as "an | element." | | However, like most things in J, support for this excellent | idea (which eliminates the need for most looping constructs | and can be very performant) is irregular: it is limited to | monadic and dyadic verbs.
Nothing about verb rank forbids | functions which accept more than two arguments, but the idea | of a function which accepts more than 2 arguments is poorly | supported in J (the idiom is to pass a boxed array to a | monad, but the boxing of the items to be passed makes | supporting rank behavior for the "arguments" impossible or | absurdly complicated). | | Other beefs with J: J doesn't have first class functions as | such. While you can represent functions as "nouns" in a few | ways, you cannot have (for example) an anonymous reference to | a function as a thing unto itself (you may denote a verb | tacitly in a context where you need a verb, however, but this | is not the same thing). If you want to pass around verbs in a | way familiar to you as a contemporary programmer you have to | use "adverbs" and "conjunctions" which are just higher order | functions which (more or less) return verbs. But adverbs and | conjunctions have their own peculiarities and restrictions | (not the least of which is that they are not themselves verbs | or nouns and thus cannot be passed around either). In | contemporary programming languages the | verb/adverb/conjunction space would just be represented by | "functions" and to great effect. As a functional programmer | and Lisp guy, I find the limitations on "verbs" very | frustrating in J. | | J's error messages are also bad, never more than a few words. | | There are some great ideas in the language, but it feels very | old-fashioned and out of touch. | | What I would like to see is an "array scheme." A lexically | scoped Scheme-like language where every object is an array | and function argument slots can be independently "ranked" to | support the elimination of loops over array arguments. I'm | too busy to put this together, but it would be great to have | if you wanted to fiddle with arrays for some reason but could | do without any library support for actually doing data | analysis.
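For readers who have not met verb rank, here is a minimal Python sketch of the idea, including the "independently ranked argument slots" wished for above. It is an illustration under stated assumptions, not how J implements rank: nested lists stand in for arrays, and `shape`, `apply_ranked` and `clamp_row` are hypothetical names invented for this example. It follows J's documented convention in which rank-0 cells are atoms, rank-1 cells are lists, and an unbounded rank takes the whole array; the agreement rule between arguments is a simplified version of J's leading-axis agreement.

```python
INF = float("inf")  # stands in for an unbounded rank: take the whole array


def shape(a):
    """Shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0] if a else None
    return tuple(dims)


def apply_ranked(f, ranks, *args):
    """Apply f to cells of args, with an independent rank per argument slot."""
    # An argument still needs unwrapping while it has more axes than its rank.
    pending = [len(shape(a)) > r for a, r in zip(args, ranks)]
    if not any(pending):
        return f(*args)
    lengths = {len(a) for a, p in zip(args, pending) if p}
    if len(lengths) != 1:
        raise ValueError("leading axes do not agree")
    n = lengths.pop()
    # Step along the leading axis; arguments already at cell size are reused.
    return [
        apply_ranked(f, ranks, *[a[i] if p else a for a, p in zip(args, pending)])
        for i in range(n)
    ]


def clamp_row(lo, row, hi):
    """Three-argument example: clamp every value of one row into [lo, hi]."""
    return [min(max(v, lo), hi) for v in row]


table = [[1, 2, 3], [4, 5, 6]]

squares = apply_ranked(lambda v: v * v, (0,), table)  # rank 0: each atom
row_sums = apply_ranked(sum, (1,), table)             # rank 1: each row
row_count = apply_ranked(len, (INF,), table)          # unbounded: whole table

# Three slots ranked (0, 1, 0): one lower bound, one row, one upper bound per step.
clamped = apply_ranked(
    clamp_row, (0, 1, 0), [0, 10], [[-5, 3, 7], [4, 12, 30]], [5, 20]
)
```

With these definitions `row_sums` is `[6, 15]` and `clamped` is `[[0, 3, 5], [10, 12, 20]]`, with no explicit loop at the call site. The three-argument call is the part that the comment above says is awkward to express in J, since J verbs take at most two arguments.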
| beagle3 wrote: | I haven't used R recently (10 years or so), but when I did, the | speed with which K/kdb+ could scan through and summarize | terabytes of data was orders of magnitude faster than R or any | other system. Once the data was summarized into (say) a | gigabyte or so, analyzing it with R or even Python was much | easier thanks to the ecosystem and reasonable time (probably | 10-100 times slower, but the time saved by using well tested | stat code is more than worth it) | 0des wrote: | > substantially better designed | | Hey Siri, please remove Nathan_Compton from the Christmas card | list. | recuter wrote: | Thank you for this. | | What do you make of BQN? https://aplwiki.com/wiki/BQN | | I get enamored with apl/k/j every time I see it and was looking | for excuses to use it despite everything. | | I understand that due to the much smaller community the tooling | and ecosystem is much weaker but there must be a reason why | some people keep reaching for it, especially the guys in | finance. I don't get the Cobol vibes from it like it is some | sort of legacy burden. While the use case is narrow there must | be an edge. | | This is HN after all. You wouldn't tell people not to mess with | lisp and just reach for python now would you? *puppy eyes | stare* | all2 wrote: | When I want to just "Get stuff done" TM, I reach for Python. | Except that I've stopped doing that because setting up | package versioning and venvs is a nightmare that gets more | frustrating every time I try to do it. | | Now I'm looking for a "better" TM way to get my scripting | needs met. I'm looking at Nim, specifically. I may also try | to lean on a Scheme or a Lisp. My problem with the latter is | lack of decent docs for getting stuff done. Maybe I'm missing | something, but being productive in those languages for me is | like a high jump when I can't even step up on a curb. | jrapdx3 wrote: | Some Scheme/Lisp implementations are capable enough to | accomplish daily work. 
Common Lisp is one option, and I've | used Chicken Scheme effectively for some projects. | | You're right though, there's a significant learning curve | with any language in a different paradigm. Forth-like | languages are an example, and yeah, J/K and cousins are | hard to grasp. I've dabbled in these but never quite got | there. | | IMO Lisp-like languages aren't quite as "foreign" since the | syntax is a variation on 'function parameters body' used in | "normal" (Algol-like) languages. I guess it comes down to | what we get used to, and really for many purposes choice of | language isn't all that critical, assuming of course it | supports the task at hand. | rscho wrote: | Racket has been rated as 'an acceptable python' by a famous | programmer. Well deserved, I think. | beagle3 wrote: | Nimpy makes it possible to move from Python to Nim | gradually. It's magical, and while it doesn't solve | python's own venv problems, it would only need the DLL from | Python - whether it was 2.5 or 3.4 or 3.8, it would just | work - they probably removed the python2 support by now, | but it was just magic. | nathan_compton wrote: | J feels a lot like Smalltalk and Lisp to me. If you got on | board early, you could do all sorts of stuff other languages | struggled to make easy and performant. Hence the set of | dedicated users. And there are some genuinely interesting | conceptual things going on in array languages which have real | appeal. But in the end I think J reflects a previous era and | hasn't caught up to really useful ideas in more contemporary | languages, probably because its user base is too | conservative. | | I wouldn't recommend people use XLisp or run Genera in a VM | to solve real problems. Recommending J feels like that to me. | recuter wrote: | I see your point. You dream crushing bastard. 
:) | | For no reason whatsoever here is a link to a guy building a | Korean style wooden house by hand without using nails: | https://www.youtube.com/watch?v=hvsvMzgiq6s | | What are "real problems" anyway? | | Sigh. You're right, I know you're right. Somehow this field | is losing appeal over time. I'm going for a walk. | mlochbaum wrote: | > Somehow this field is losing appeal over time. | | Not true, at all! Since 2010 or so, the APL family has | only improved its reputation and grown in popularity. I | listed some developments of the past two years at | https://news.ycombinator.com/item?id=28930064. Now, it's | not much relative to the huge growth of array frameworks | like TensorFlow with more mainstream language design, but | it is definitely not losing appeal. | recuter wrote: | Oh no, Marshall, I was being far more despondent and was | referring to programming as a whole. Thank you very much | for your efforts on BQN. | | Speaking of TensorFlow, I was looking at tinygrad the | other day: https://github.com/geohot/tinygrad/blob/master/tinygrad/tens... | | Very tempted to port it to BQN. I could be wrong but I | bet it would shine for that. You could print the whole | thing on a t-shirt. | mlochbaum wrote: | Oh, thanks for clarifying, since it occurred to me that | you might mean just the appeal to you, but not that you | meant the field of programming! I'm no NN expert, but | tinygrad looks very approachable in BQN. You might be | interested in some other initial work along those lines: | https://github.com/loovjo/BQN-autograd with automatic | differentiation, and the smaller | https://github.com/bddean/BQNprop using backprop. | hvs wrote: | TBF, that guy isn't doing it for the fun of it (OK, | partly for the fun of it) but because Mr. Chickadee is a | content creator. Sure, it's a lifestyle choice, but he | also makes his living doing it. I love his channel, but his | lifestyle is as much a product of our modern world as the | Java programming language is.
| moonchild wrote: | > I wouldn't recommend people use XLisp or run Genera in a | VM to solve real problems. Recommending J feels like that | to me. | | Genera and interlisp are great. I wouldn't deploy them | because: | | 1) slow | | 2) no multithreading | | 3) incompatible with modern cls | | Point 3 is being worked on (for genera at least, and | possibly also for interlisp). But none of these seems | significant wrt j. | jonahx wrote: | >I get enamored with apl/k/j every time I see it and was | looking for excuses to use it despite everything. | | You should do it. Nothing in my programming career has | changed the way I thought so much as learning J to the point | of real fluency. Though you could swap out APL, k, or BQN for | the same effect. | agumonkey wrote: | How do you feel about the J/APL syntax in live coding sessions | ? does it help iterating a bit faster than R/python ? or was it | a totally irrelevant aspect ? ___________________________________________________________________ (page generated 2022-04-04 23:00 UTC)