[HN Gopher] Jd
       ___________________________________________________________________
        
       Jd
        
       Author : tosh
       Score  : 1015 points
       Date   : 2022-04-04 12:04 UTC (10 hours ago)
        
 (HTM) web link (code.jsoftware.com)
 (TXT) w3m dump (code.jsoftware.com)
        
       | pastaking wrote:
       | Serious question: Why does this have so many upvotes? As a layman
       | I have never heard of J or Jd before, someone please provide some
       | context?
        
         | guidoism wrote:
         | J is an APL language. APL is the coolest language you've never
         | heard of. It's mind blowing in the same way people talk about
         | Lisp, but more so since the concepts are so alien to most
         | programmers.
        
         | upwardbound wrote:
         | bla3 figured out that it's because the link text "Jd" is so
         | short, and people are clicking the upvote button by mistake.
         | https://news.ycombinator.com/item?id=30906989
        
       | jpf0 wrote:
       | I did some work to compare Jd to data.table and found that it
       | was more performant in some instances such as on derived columns,
       | and approximately equally performant on aggregations and queries.
       | Jd is currently single-threaded, whereas multiple threads are
       | important on some types of queries. I tried to further compare
       | with JuliaDB at the same time (maybe a year ago) and found that
       | it was incorrectly benchmarked by the authors and far slower than
       | both; that might be different now. Jd is more equivalent to
       | data.table on disk; ClickHouse is far better at being a large-
       | scale database.
       | 
       | Rules of thumb on memory usage: Python/Pandas (not memory-
       | mapped): "In Pandas, the rule of thumb is needing 5x-10x the
       | memory for the size of your data." R (not memory-mapped): "A
       | rough rule of thumb is that your RAM should be three times the
       | size of your data set." Jd: "In general, performance will be good
       | if available ram is more than 2 times the space required by the
       | cols typically used in a query."
       | 
       | Re CSV reading, Jd has a fast CSV reader whereas J itself does
       | not. I have written an Arrow integration to enable J to get to
       | that fast CSV reader and read Parquet.
        
       | mlochbaum wrote:
       | The environment around Jd has changed a bit since it was young!
       | Jsoftware[0] announced it in 2012, and this particular page has
       | been effectively the same since it was created in 2017 (I suspect
       | this was a page move, and the content is somewhat older). In
       | these early days the column-oriented database was quickly gaining
       | popularity but still obscure, which is why there's this
       | "Columnar" section that goes to so much trouble to explain the
       | concept. Now the idea is well known among database users and
       | there are lots of other options[1].
       | 
       | The history goes back further, because column-oriented is the
       | natural way to build a database in an array language (making a
       | performant row-oriented DBMS would be basically impossible). This
       | is because a column can be seen as a vector where every element
       | has the same type. A row groups values of different types, and
       | array languages don't have anything like C structs to handle
       | this. In J, Jd comes from Chris Burke's JDB proof-of-concept
       | (announced[2] 2008, looks like), and the linked page mentions
       | kdb+ (K) and Vstar (APL). KDB, first released in 1993, is
       | somewhat famous and gets a mention on Wikipedia's history of
       | column-oriented databases[3].
       | 
       | [0] Company history: https://aplwiki.com/wiki/Jsoftware
       | 
       | [1] https://en.wikipedia.org/wiki/List_of_column-oriented_DBMSes
       | 
       | [2] https://code.jsoftware.com/wiki/JDB/Announcement
       | 
       | [3] https://en.wikipedia.org/wiki/Column-oriented_DBMS#History
        
         | moonchild wrote:
         | > Vstar (APL)
         | 
         | Vstar was based on j, not apl.
        
           | mlochbaum wrote:
           | Right, I'm getting names mixed up. The Dyalog APL one is
           | vecdb, but it's more recent than Jd and I don't think it's
           | progressed past being a toy.
        
       | michaelmcmillan wrote:
       | This section was interesting! Somehow I've never realized that
       | row oriented storage is orthogonal to how disks work...
       | 
       | Jd is a columnar (column oriented) RDBMS.
       | 
       | Most RDBMS systems are row oriented. Ages ago they fell into the
       | trap of thinking of tables as rows (records). You can see how
       | this happened. The end user wants the record that has a first
       | name, last name, license, make, model, color, and date. So a row
       | was the unit of information and rows were stored sequentially on
       | disk. Row orientation works for small amounts of data. But think
       | about what happens when there are lots of rows and the user wants
       | all rows where the license starts with 123 and the color is blue
       | or black. In a naive system the application has to read every
       | single byte of data from the disk. There are lots of bytes and
       | reading from disk is, by orders of magnitude, the slowest part of
       | the performance equation. To answer this simple question all the
       | data had to be read from disk. This is a performance disaster and
       | that is where decades of adding bandages and kludges started.
       | 
       | Jd is columnar so the data is 'fully inverted'. This means all of
       | the license numbers are stored together and sequentially on disk.
       | The same for all the other columns. Think about the earlier query
       | for license and color. Jd gets the license numbers from disk (a
       | tiny fraction of the database) and generates a boolean mask of
       | rows that match. It then gets the color column from disk (another
       | small fraction of the data) and generates a boolean mask of
       | matches and ANDS that with the other mask. It can now directly
       | read just the rows from just the columns that are required in the
       | result. Only a small fraction of the data is read. In J, columns
       | used in queries are likely already in memory and the query runs
       | at ram speed, not the sad and slow disk speed.
       | 
       | Both scenarios above are simplified, but the point is strong and
       | valid. The end user thinks in records, but the work to get those
       | records is best organized by columns.
       | 
       | Row oriented is slavishly tied to the design ideas of filing
       | cabinets and manila folders. Column oriented embraces computers.
       | 
       | A table column is a mapped file.
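       | 
       | A minimal sketch of the mask-based query described in the quoted
       | section, written in Python rather than J. The table contents and
       | the `query` helper are made up for illustration and say nothing
       | about how Jd is actually implemented: one boolean mask is built
       | per filtered column, the masks are ANDed, and only the matching
       | rows of the requested columns are fetched.

```python
# Hypothetical toy table stored column-wise: one list per column.
table = {
    "license": ["123ABC", "999XYZ", "123DEF", "456GHI", "123JKL"],
    "color":   ["blue",   "black",  "red",    "blue",   "black"],
    "make":    ["Ford",   "Audi",   "Fiat",   "Saab",   "Opel"],
}


def query(table, result_cols):
    # Pass 1: scan only the license column and build a boolean mask.
    license_mask = [v.startswith("123") for v in table["license"]]
    # Pass 2: scan only the color column and build a second mask.
    color_mask = [v in ("blue", "black") for v in table["color"]]
    # AND the two masks to get the matching row numbers.
    rows = [i for i, (a, b) in enumerate(zip(license_mask, color_mask)) if a and b]
    # Fetch only the matching rows from only the requested columns.
    return {c: [table[c][i] for i in rows] for c in result_cols}


result = query(table, ["license", "make"])
print(result)
```

       | Note that the "make" column is never scanned, only indexed at the
       | surviving row numbers, which is the point the quoted text makes
       | about reading a small fraction of the data.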
        
         | thethimble wrote:
         | If you're interested in this thought, check out Martin
         | Kleppmann's book DDIA where he explains storage concepts like
         | this and many more. One of the best architecture books out
         | there!
        
         | derefr wrote:
         | I would note that this query behavior (sorted data columns
         | bitmasked together) is further orthogonal to primary-data
         | storage representation. For example, Postgres can give you this
         | same behavior if you declare a multi-column GIN index across
         | the columns you want to be searchable.
        
         | hodgesrm wrote:
         | > Somehow I've never realized that row oriented storage is
         | orthogonal to how disks work...
         | 
         | The section you posted is very misleading. Storage is arranged
         | in blocks. The secret to database performance is how you lay
         | out data in those blocks and how well your access patterns to
         | the blocks match the capabilities of the device. This choice is
         | the fundamental key to database performance.
         | 
         | If your database stores shopping baskets for an eCommerce site,
         | you want each basket in the smallest number of blocks, ideally
         | 1. It makes inserting, updating, and reading single baskets
         | very fast on most modern storage devices.
         | 
         | If your database stores data for analytic queries, it's better
         | (in general) to store each column as an array of values. That
         | makes compression far better, and also makes scanning single
         | columns very efficient.
         | 
         | To say as the article does that "row oriented is slavishly tied
         | to design ideas of filing cabinets and manila folders" is
         | nonsense. Plus there are _many_ other choices about how to
         | access data that include parallelization, alignment with
         | processor caches, trading off memory vs. storage, whether you
         | have a cost-based query optimizer, etc. Even within column
         | stores there are big differences in performance because of
         | these.
         | 
         | (Disclaimer: I work on ClickHouse and love analytic systems.
         | They are great but not for everything.)
        
         | akersten wrote:
         | Isn't that just a weirdly detailed way to say "every column is
         | indexed, whether you like it or not"?
        
           | ghshephard wrote:
           | Ironically - He didn't even mention indexes in his
           | description (which he admitted was simplified) - a good query
           | optimizer will do wonders for not only coming up with the
           | appropriate hints for the query plan, but will also
           | _dynamically adjust_ those hints based on the underlying data
           | patterns.
           | 
           | The example he provided,
           | 
           | "So a row was the unit of information and rows were stored
           | sequentially on disk. Row orientation works for small amounts
           | of data. But think about what happens when there are lots of
           | rows and the user wants all rows where the license starts
           | with 123 and the color is blue or black. In a naive system
           | the application has to read every single byte of data from
           | the disk."
           | 
           | Is something no modern database would ever do. The real
           | challenge is not to only read the records starting with 123,
           | or having blue/black - that part is trivially handled by
           | every database engine I'm familiar with. The query challenge
           | is *do you filter on license # or color first?* (If there are
           | 1k records starting with 123 and 5mm blue/black vehicles, the
           | order is pretty critical for performance) - that's one of the
           | features that distinguishes query optimizers.
           | 
           | Columnar databases are awesome when you have columnar data to
           | work with - I've seen 20-30x reductions in disk storage in
           | the wild (and you can obviously create synthetic examples
           | that go way north of that), but a well indexed SQL database
           | backed by a solid query optimizer/planner can probably hold
           | its own against a columnar database in terms of lookup
           | performance, particularly if your data is row-oriented to
           | begin with.
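           | 
           | A rough sketch of why filter order matters, in Python. It
           | only counts predicate evaluations on made-up data and does
           | not model indexes or any real optimizer; `filter_in_order`
           | and the row layout are hypothetical.

```python
def filter_in_order(rows, predicates):
    """Apply predicates in sequence; return (survivors, predicate evaluations)."""
    evaluations = 0
    survivors = rows
    for pred in predicates:
        evaluations += len(survivors)  # each surviving row is tested once
        survivors = [r for r in survivors if pred(r)]
    return survivors, evaluations


# Hypothetical data: 10,000 vehicles; 1 in 100 licenses starts with "123",
# and half of all vehicles are blue.
rows = [
    {
        "license": ("123" if i % 100 == 0 else "999") + f"{i:05d}",
        "color": "blue" if i % 2 == 0 else "red",
    }
    for i in range(10_000)
]


def by_license(r):
    return r["license"].startswith("123")


def by_color(r):
    return r["color"] in ("blue", "black")


hits_a, cost_license_first = filter_in_order(rows, [by_license, by_color])
hits_b, cost_color_first = filter_in_order(rows, [by_color, by_license])
print(len(hits_a), cost_license_first, cost_color_first)
```

           | Both orders return the same rows, but putting the more
           | selective license filter first leaves far fewer rows for the
           | second test.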
        
       | jve wrote:
       | I know nothing about J and JSoftware, but this reads like an
       | April Fools' joke. Is it?
       | 
       | > In a naive system the application has to read every single byte
       | of data from the disk. ... To answer this simple question all the
       | data had to be read from disk. This is a performance disaster and
       | that is where decades of adding bandages and kludges started.
       | .... Think about the earlier query for license and color. Jd gets
       | the license numbers from disk (a tiny fraction of the database)
       | 
       | Of course that data has to be read from disk. Well, for simple or
       | aggregate queries he may gain performance. Moreover, as another
       | commenter has noted, you can organize data in columns in
       | MSSQL too for aggregations:
       | https://docs.microsoft.com/en-us/sql/relational-databases/in...
       | 
       | > columns used in queries are likely already in memory and the
       | query runs at ram speed, not the sad and slow disk speed. ... Jd
       | performance is affected primarily by ram. Lots of ram allows lots
       | of rows
       | 
       | Any other RDBMS can have sensible indexes that satisfy your
       | queries. And, surprise, your data also lives in RAM once you read
       | it.
       | 
       | > You can backup a database or a table with standard host shell
       | scripts, file copy commands, and tools such as tar/gzip/zip....
       | If you understand backing up file folders, then you pretty much
       | understand backing up Jd databases.
       | 
       | And... throw data consistency out of the window?
       | 
       | I'm reading and I'm "not getting" the selling point - why is this
       | better?
       | 
       | Okay, I read that things are files. SQLite is also a file if
       | physical format is a concern.
        
         | OskarS wrote:
         | > Of course that data has to be read from disk. Well, for simple
         | or aggregate queries he may gain performance.
         | 
         | Let's say you want to access 2 columns out of 100 in a
         | particular table. In a row-oriented database, you have to read
         | the full rows off the disk, which means that you have to read
         | 98 pieces of data off the disk that you have no use for, a
         | total waste of I/O. In a columnar database, you don't have to
         | do that, you just read off the relevant columns. This is VERY
         | similar to the "array of structs"/"struct of arrays" argument
         | in gamedev (and related high performance fields), it's the same
         | kind of tradeoff: slightly more complicated data layout traded
         | in for much more efficient reads.
         | 
         | In addition: if you have a columnar database, you can employ
         | compression in a much more efficient manner. If you have 10
         | million rows with the same (or very similar) data in a column,
         | you can compress that to a fraction of the size. This messes
         | with indexes, but it's often worth it because it VASTLY speeds
         | up aggregate calculations.
         | 
         | Row-based and column-based databases have different tradeoffs
         | and advantages, and it's not quite as clear-cut as the article
         | makes it seem. But it's certainly no April fools joke: columnar
         | databases (for many tasks, particularly aggregates) can vastly
         | outperform row-oriented databases. This is why Google BigQuery
         | is columnar, for instance. Another good example is kdb+ (which
         | this is clearly based off of), which is widely used in places
         | which value quick time-series aggregates (Wall Street being
         | the obvious example).
         | 
         | The article is a bit over the top and one-sided, but it doesn't
         | say anything that is particularly controversial. You might
         | wanna read up on columnar database systems:
         | https://en.wikipedia.org/wiki/Column-oriented_DBMS
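         | 
         | To illustrate the compression point, here is a small Python
         | sketch of run-length encoding a single column and aggregating
         | directly on the encoded form. Run-length encoding is just one
         | simple scheme chosen for the example; the thread does not say
         | which schemes any particular database uses, and the data is
         | made up.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def count_equal(encoded, target):
    """Count occurrences of target without decompressing the column."""
    return sum(n for value, n in encoded if value == target)


# Hypothetical column with long runs of repeated values.
city = ["NYC"] * 6 + ["LA"] * 3 + ["NYC"] * 2
encoded = rle_encode(city)
nyc_count = count_equal(encoded, "NYC")
print(encoded, nyc_count)
```

         | Eleven stored values shrink to three pairs, and the count is
         | computed by touching only those pairs. In a row layout the
         | repeated values would be interleaved with other fields, so
         | runs like this would not be adjacent on disk.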
        
           | brianwawok wrote:
           | > slightly more complicated data layout traded in for much
           | more efficient reads.
           | 
           | Depending on read patterns. The classic example is address.
           | Sure, you can store an address as columns. Name here, city
           | there, street 1 there, street 2 there. How useful is 1/5th of
           | an address, and how often are you pulling it like that? For
           | something like address that you generally read all or none,
           | you generally are better served by a row oriented database.
           | 
           | You also have FKs to kind of do this in a row oriented
           | database. If some part of the data is not read nearly as much
           | as another, it can be a foreign key sitting in another table.
        
             | OskarS wrote:
             | Yeah, exactly: there are tradeoffs to both models, neither
             | is strictly superior. You would never want to do aggregates
             | on addresses anyway, so that advantage is out the door. You
             | do, however, want to very easily index a table of
             | addresses, so you could quickly look them up for a
             | particular user, which a columnar database is (arguably)
             | worse at. BigQuery, in particular, does not use indexes at
             | all.
             | 
             | (EDIT: I guess you might want to do aggregates on
             | addresses, actually. "How many customers do we have in
             | NYC?", that kinda thing.)
        
               | vidarh wrote:
               | Hybrids are straightforward enough. A "simple" way of
               | achieving that is to support using the indexes to
               | directly answer queries, as quite a few databases do. Now
               | an index on a single column is _also_ a columnar store of
               | the contents of that column, yet you still have the full
               | row to query if you need lots of data from individual
               | rows. A more sophisticated option would be to reduce
               | duplication of column data.
               | 
               | (EDIT: How well a usually row-oriented database optimises
               | this, is another question, and will differ by database)
        
         | michelpp wrote:
         | > I know nothing about J and JSoftware, but this reads like an
         | April Fools' joke. Is it?
         | 
         | To me it reads as being colored by a very specific tool bias.
         | For example:
         | 
         | >> The key difference between Jd and most other database
         | systems is that Jd comes with a fully integrated and mature
         | programming language.
         | 
         | Most major database systems come with a fully integrated and
         | mature programming language.
         | 
         | >> Row orientation works for small amounts of data.
         | 
         | It works for a _different data access pattern_. Row vs column
         | is a tradeoff spectrum. Data size is just one dimension of the
         | analysis.
         | 
         | >> Row oriented is slavishly tied to the design ideas of filing
         | cabinets and manila folders. Column oriented embraces
         | computers.
         | 
         | Pretty hyperbolic.
        
           | moonchild wrote:
           | > Most major database systems come with a fully integrated
           | and mature programming language.
           | 
           | Like ... pl/[pg]sql? Not exactly a joy to write.
        
             | nostoc wrote:
             | Still is a fully integrated and mature programming language
             | though.
             | 
             | And I do believe you wouldn't have any issue finding people
             | who think the same of J.
        
               | moonchild wrote:
               | How integrated? Most of jd is _written_ in j. It is also
               | expected that the app performing--or at least handling--
               | the queries be written in j.
               | 
               | And regarding maturity--j has libraries, debugger, etc.
        
       | kokizzu2 wrote:
       | so any benchmark against clickhouse?
        
       | simonpure wrote:
       | There's a wonderful podcast about array languages -
       | 
       | https://www.arraycast.com/
       | 
       | Lots of great stories about software engineering besides talking
       | about the different dialects of array languages.
        
       | stefan_ wrote:
       | An example J file because this link doesn't say much:
       | 
       | https://github.com/jsoftware/data_jd/blob/master/csv/csv.ijs
        
         | diarrhea wrote:
         | s=. 0 2}.each _3 _1{<;._2 (;i{ccfiles),'/'
         | 
         | That is... not pretty.
        
           | bryanrasmussen wrote:
           | I would say it is not knowledge leaking: most languages leak
           | knowledge, so that if you are not familiar with the language
           | but you do know some other programming languages you can sort
           | of figure out what they do.
           | 
           | But some languages do not leak knowledge in this way.
           | 
           | There is the concept of beauty in programming languages that
           | the expression of an idea should be succinct. This J code
           | might be beautiful, but I'm unsure.
        
             | 0des wrote:
             | You're going too meta. Does it make you happy to write it?
             | Does it fulfill its purpose? If yes, don't worry how it
             | looks, it is fine.
        
               | vidarh wrote:
               | To some of us the "does it make you happy" and how it
               | looks are intrinsically linked.
               | 
               | One of the things that makes me happy is to write
               | beautiful code.
        
           | jollybean wrote:
           | It's meant to be efficient, not pretty.
           | 
           | It's catching on within the AI community because the syntax
           | matches well to the kinds of matrix operations common in that
           | field.
           | 
           | I think Nvidia's next chip is going to have a compiler for
           | jlang.
           | 
           | Also, I think they are starting to use it as an 'entry level'
           | language for kids, you know, like grade school.
        
           | razetime wrote:
           | J isn't really made to be pretty. It's made to be terse and
           | simple to read once given enough learning effort, and it's
           | made to be a consistent keyboard typable notation.
        
         | forgotpwd16 wrote:
         | Tbh that file says even less. It's like giving a link to
         | read_csv.py in the pandas source in a discussion about pandas.
        
           | jenny91 wrote:
           | It tells me everything I need to know about this language!
        
         | mlochbaum wrote:
         | Here's one of the more central files that ties into how a Jd
         | database is laid out:
         | 
         | https://github.com/jsoftware/data_jd/blob/master/base/common...
         | 
         | Not that I claim anyone in particular can read it of course. Jd
         | uses a hierarchy of folder, database, table, column that's
         | handled with an object system to share code between them. A
         | folder is just a place to put databases and hardly needs to add
         | anything, while the other levels have a lot of extra
         | functionality. As an inverted database, Jd stores each column
         | in a file, and accesses it using memory mapping.
         | 
         | https://github.com/jsoftware/data_jd/blob/master/base/folder...
         | 
         | https://github.com/jsoftware/data_jd/blob/master/base/table....
         | 
         | (I designed this system when I did some of the early work to
         | turn JDB into Jd as a summer intern)
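         | 
         | A small Python sketch of the "one file per column, accessed by
         | memory mapping" idea. The file names, the fixed-width int64
         | encoding, and both helper functions are assumptions made for
         | the example; this is not Jd's actual on-disk format.

```python
import mmap
import os
import struct
import tempfile


def write_column(path, values):
    """Store a column as a flat file of little-endian signed 64-bit integers."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}q", *values))


def sum_column(path):
    """Map a (non-empty) column file into memory and sum its values."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count = len(mm) // 8  # 8 bytes per int64
        return sum(struct.unpack_from(f"<{count}q", mm))


# A hypothetical "table" is just a folder holding one file per column.
with tempfile.TemporaryDirectory() as table_dir:
    price_path = os.path.join(table_dir, "price.col")
    qty_path = os.path.join(table_dir, "qty.col")
    write_column(price_path, [10, 20, 30, 40])
    write_column(qty_path, [1, 2, 3, 4])
    column_files = sorted(os.listdir(table_dir))
    total_price = sum_column(price_path)  # qty.col is never opened

print(column_files, total_price)
```

         | Summing the price column touches only its own file; the other
         | column files are left alone, and the operating system decides
         | which mapped pages actually need to come from disk.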
        
       | hnrj95 wrote:
       | are there benchmarks against kdb+ and/or shakti?
        
       | bla3 wrote:
       | Meta comment: I tried to click the link but since the title is so
       | short and my mousing not very precise, I accidentally clicked the
       | upvote arrow. I then clicked "unvote" and tried again, but the
       | same thing happened. The third time round, I managed to click the
       | link.
       | 
       | Takeaway: Very short titles might get you some upvotes from
       | clumsy users :)
        
         | raphaelj wrote:
         | The same happened with me...
        
         | version_five wrote:
         | Yes! I came here to post the same thing - the link didn't work
         | for me, and I inadvertently upvoted as well. I assume the
         | article has merits of its own, but I do notice a huge ratio of
         | votes to comments, I guess some are inadvertent. It would be
         | interesting to search through other very short titles and look
         | at the ratio of comments to votes vs others above some vote
         | threshold... (I'm on my phone or I'd try) - edit, is there a
         | regex search for HN anywhere? It looks like algolia doesn't
         | support them
        
         | rkalla wrote:
         | lol same and would have never thought to comment on it because
         | obviously user-error, then I saw this and thought twice.
        
       | legalcorrection wrote:
       | I'm intrigued but skeptical of this bit:
       | 
       |  _Jd is a columnar (column oriented) RDBMS.
       | 
       | Most RDBMS systems are row oriented. Ages ago they fell into the
       | trap of thinking of tables as rows (records). You can see how
       | this happened. The end user wants the record that has a first
       | name, last name, license, make, model, color, and date. So a row
       | was the unit of information and rows were stored sequentially on
       | disk. Row orientation works for small amounts of data. But think
       | about what happens when there are lots of rows and the user wants
       | all rows where the license starts with 123 and the color is blue
       | or black. In a naive system the application has to read every
       | single byte of data from the disk. There are lots of bytes and
       | reading from disk is, by orders of magnitude, the slowest part of
       | the performance equation. To answer this simple question all the
       | data had to be read from disk. This is a performance disaster and
       | that is where decades of adding bandages and kludges started.
       | 
       | Jd is columnar so the data is 'fully inverted'. This means all of
       | the license numbers are stored together and sequentially on disk.
       | The same for all the other columns. Think about the earlier query
       | for license and color. Jd gets the license numbers from disk (a
       | tiny fraction of the database) and generates a boolean mask of
       | rows that match. It then gets the color column from disk (another
       | small fraction of the data) and generates a boolean mask of
       | matches and ANDS that with the other mask. It can now directly
       | read just the rows from just the columns that are required in the
       | result. Only a small fraction of the data is read. In J, columns
       | used in queries are likely already in memory and the query runs
       | at ram speed, not the sad and slow disk speed.
       | 
       | Both scenarios above are simplified, but the point is strong and
       | valid. The end user thinks in records, but the work to get those
       | records is best organized by columns.
       | 
       | Row oriented is slavishly tied to the design ideas of filing
       | cabinets and manila folders. Column oriented embraces computers.
       | 
       | A table column is a mapped file._
       | 
       | What's the other side of this argument?
        
         | drc500free wrote:
         | If you're using J, you are probably doing analytics and stats.
         | That means you are looking for patterns in a handful of
         | attributes across a large population - i.e. columnar.
         | 
         | As others have said, row-based makes sense for most OLTP / app
         | databases. You're probably not writing those products in J.
        
         | lostgame wrote:
         | This seems... _highly_ impractical for...90% of operations I'd
         | be wanting to do with a DBMS.
         | 
         | It kinda seems like a 'different way for different's sake'
         | kinda solution? :/
         | 
         | I understand there must be a minority of operations that can
         | benefit from this, but overall I can't imagine this being
         | popular for most DB operations.
        
           | vidarh wrote:
           | It tends to include a large proportion of the large,
           | expensive reporting queries your business people want to do.
           | Whether or not those kinds of queries dominate for your
           | system will depend greatly on your system.
           | 
           | You also need to reach a certain scale before the choice
           | (either way) will affect you enough to matter.
           | 
           | But when you reach that scale it can be the difference
           | between reporting queries taking seconds vs. hours in some
           | cases.
           | 
           | For some systems you'll end up wanting _both_, and stream
           | updates from the transaction focused db (row oriented) into a
           | separate reporting database that uses a column store.
        
         | cmrdporcupine wrote:
         | Yeah, it's a simplification and one-sided.
         | 
         | The general consensus as I understand it is: column-oriented
         | indices/storage options are good for OLAP, large scale
         | analytics, bulk data analysis. Row-oriented indices are suited
         | more for OLTP, individual "record processing."
         | 
         | Both are just techniques and there's nothing stopping a single
         | db product from offering both.
        
           | Semaphor wrote:
           | > Both are just techniques and there's nothing stopping a
           | single db product from offering both.
           | 
           | e.g. for MS SQL there are columnstore indexes.
        
           | iamwil wrote:
           | Wouldn't column stores be better for the cache?
        
             | cmrdporcupine wrote:
             | I think the answer to that is just: it depends.
             | 
             | Again comes back to usage patterns. Yes, if you're doing
             | aggregation operations on a small number of columns then I
             | expect locality of reference could be better with a column-
             | store, rather than thrashing through row-retrievals one
             | after another (and then just throwing them away after
             | aggregating).
             | 
             | But if you're frequently doing "look up this customer and
             | others like them" and then using the bulk of the
             | information there? I'd expect better cache behaviour out of
             | row oriented storage.
             | 
             | But these days it's so unclear what's happening inside the
             | actual "black box" that is our hardware that it's hard to
             | make generalizations.
        
         | tormeh wrote:
         | It all depends on access pattern. Do you tend to select entire
         | rows? Use a row-oriented DB. Do you tend to select entire
         | columns? A column-oriented database might be for you. That's
         | it, really. None of the designs are superior, afaik.
        
           | hoosieree wrote:
           | Just to add, because J is an array-oriented language, it
           | makes some kinds of column-oriented access patterns easier.
           | 
           | For example, it's trivial to sort one array by the values of
           | another array:
           | 
           |     x /: y
           | 
           | To me, it's much easier to read than the equivalent in NumPy:
           | 
           |     x[np.argsort(y)]
           | 
           | Or get pairs of (unique value; count) from an array using the
           | key operator (/.):
           | 
           |     (~.;#)/.~ y
           | 
           | Column db's make sense for array-oriented languages, because
           | there's much less of a mismatch compared to OOP with
           | relational.
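For readers who know neither J nor NumPy, a rough plain-Python equivalent of the two J expressions above might look like this. The function names sort_by and unique_counts are made up for this sketch; it assumes an ascending, stable sort and reports the pairs in order of first appearance.

```python
def sort_by(x, y):
    """Reorder x by the ascending order of y (roughly J's  x /: y)."""
    order = sorted(range(len(y)), key=lambda i: y[i])  # stable argsort of y
    return [x[i] for i in order]


def unique_counts(y):
    """(value, count) pairs in first-appearance order (roughly (~.;#)/.~ y)."""
    counts = {}
    for v in y:
        counts[v] = counts.get(v, 0) + 1  # dicts keep insertion order
    return list(counts.items())


print(sort_by(["c", "a", "b"], [3, 1, 2]))  # ['a', 'b', 'c']
print(unique_counts([3, 1, 3, 2, 1, 3]))    # [(3, 3), (1, 2), (2, 1)]
```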
        
             | legalcorrection wrote:
             | All of that syntax is awful. Why not just x.sortBy(y) ? Did
             | all of the advances in software legibility fail to make
             | their way to the modern scientific computing world?
        
               | moonchild wrote:
               | https://www.jsoftware.com/papers/tot.htm
        
               | avmich wrote:
               | Hyperbolically, because you don't write math with
               | variables in camel case.
               | 
               | J traces its roots to a notation for math, used on
               | whiteboards. That awful syntax you see - it's the same as
               | in some formulas in, say, general relativity, only J is
               | Turing complete and not a Turing tarpit. When you work on
               | a formula, in the case of J you have the ability to
               | execute it, and if you see it's wrong you can update the
               | formula and try again. This could also be done in other
               | languages, but in J (I mean, the APL family of languages)
               | it's more focused.
               | 
               | In defense of J, I had a professional example of a
               | problem which wasn't clearly specified, which needed some
               | experimentation - that took, if I remember correctly,
               | some 45 minutes of attempts in J, and then the prototype
               | was re-written in C#, when it was already producing
               | desired outcomes. Rewriting took somewhat longer.
        
           | moonchild wrote:
           | You might be selecting entire rows, but you are probably not
           | selecting _all_ of the rows, and your selection criteria
           | probably do not depend on all of the columns.
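A small Python sketch of this point, with made-up column names and data: the predicate reads only the one column it depends on, and full rows are assembled only for the positions that match (sometimes called late materialization). This is an illustration of the idea, not how any particular engine implements it.

```python
# Hypothetical column-oriented table: one list per column, aligned by position.
table = {
    "customer": ["ann", "bob", "cy", "dee"],
    "country": ["SE", "US", "SE", "DE"],
    "spend": [120, 75, 300, 40],
}


def select_where(table, column, predicate):
    # Step 1: scan only the column the selection criterion depends on.
    hits = [i for i, value in enumerate(table[column]) if predicate(value)]
    # Step 2: build whole rows, but only for the matching positions.
    return [{name: values[i] for name, values in table.items()} for i in hits]


swedish = select_where(table, "country", lambda c: c == "SE")
print(swedish)
```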
        
             | WJW wrote:
             | Yeah, row-oriented is good for WHERE queries and column-
             | oriented is good for SUM (or other aggregation) queries.
        
         | [deleted]
        
         | rileyphone wrote:
         | A lot of modern, data-oriented ECS frameworks for game dev
         | follow a similar philosophy, wherein components are stored in
         | linear collections that optimize memory layout for caches and
         | parallelism. Given how rarely you need 'SELECT *' this makes
         | sense for a relational DB as well, though modern SQL DBs have a
         | lot of sweat put in to their performance.
        
         | giaour wrote:
         | In many OLTP systems, almost all work operates on multiple
         | attributes of a single record. E.g., when logging a user in, an
         | authentication system cares about multiple attributes of a
         | single user record, not facts about the aggregate pool of
         | users.
         | 
         | Column oriented stores are extremely efficient for aggregate
         | queries, but they make writes and single-row reads more
         | expensive and are thus not suitable for every workload. There's
         | an excellent overview in Martin Kleppmann's Designing Data
         | Intensive Applications.
        
       | 0des wrote:
       | > Early adopters of Jd are assumed to have a J background and
       | documentation and tutorials depend on that background.
       | 
       | All 12 of us are jumping up and down saying "it's our time,
       | finally the day has come"
        
         | [deleted]
        
         | jdshupe wrote:
         | Seeing J at the top was indeed a jump up and down moment.
        
           | 0des wrote:
           | We should have a secret handshake or some type of insignia to
           | better signal to our peers. I've tried draping a J colored
           | kerchief out of my back pocket but the results so far are not
           | great, it appears there is more anti-J sentiment than I'd
           | imagined, as I get harassed unduly in certain areas of town.
           | May have to switch to maybe a hand gesture based signaling
           | that can be done on the fly to signal allegiance.
        
             | recuter wrote:
             | https://www.atlasobscura.com/articles/hobo-code
        
       | plibither8 wrote:
       | 763 (and counting) votes, no. 1 on the frontpage, ...and only 83
       | comments? This is one of the most skewed ratios I've seen on HN.
        
         | upwardbound wrote:
         | bla3 figured out that it's because the link text "Jd" is so
         | short, and people are clicking the upvote button by mistake.
         | https://news.ycombinator.com/item?id=30906989
        
       | marcodiego wrote:
       | "Jd source is largely J code and that code is open and available
       | to licensed users."
       | 
       | License?
        
         | anonu wrote:
         | It's sort of misleading because J is closed source
        
           | 0des wrote:
           | Curious how you arrived at this conclusion
        
           | moonchild wrote:
           | J is fully opensource: https://github.com/jsoftware/jsource
           | 
           | Most of jd's source is publicly available:
           | https://github.com/jsoftware/data_jd
        
             | SparkyMcUnicorn wrote:
             | Is it?
             | 
             | https://github.com/jsoftware/jsource/blob/master/license.txt
        
               | moonchild wrote:
               | > J SOURCE can be used under a commercial license from
               | Jsoftware, in which case the terms and conditions of that
               | license apply.
               | 
               | > OR
               | 
               | > J Source can be used under GNU General Public License
               | version 3, in which case the terms and conditions of that
               | license apply.
               | 
               | Seems pretty clear to me.
        
               | Shared404 wrote:
               | As a side note, I really love the choice to dual license,
               | and wish it were offered more often.
        
               | jenny91 wrote:
               | It's extremely common: license it under GPL/AGPL or some
               | other very copyleft license; get contributors to sign a
               | CLA, then offer the library with hefty license fees for
               | non-FOSS projects.
        
               | misnome wrote:
               | Because it's commercially available without GPL?
        
       | jollybean wrote:
       | I wonder when Java or Swift will finally get around to adopting
       | 'self effacing references'. It's 2022.
       | 
       | [1]
       | https://code.jsoftware.com/wiki/Vocabulary/SpecialCombinatio...
        
       | anonu wrote:
       | Jd has been around for a while. But is it production ready?
       | 
       | I'm still looking for an open source replacement to kdb, that
       | matches kdb's speed and featureset.
        
         | kokizzu2 wrote:
         | clickhouse '__')
        
           | swasheck wrote:
           | i really dislike clickhouse for anything less than
           | rudimentary analysis, but appreciate that it's fast for that.
        
             | nimrody wrote:
             | Can you give some tips on what you mean by "less than
             | rudimentary analysis"? Considering adopting Clickhouse and
             | wondering whether we will encounter problems down the road.
        
               | swasheck wrote:
               | the biggie for me was that analytic window functions are
               | either non-existent or experimental and must be achieved
               | with array function hacks.
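For context, an analytic window function computes a value for each row from a set of related rows without collapsing them, for example a running total per group. A plain-Python sketch of that computation follows; the function name, column names, and data are made up, the rows are assumed to be already in the desired order, and nothing here reflects how ClickHouse does or does not support it.

```python
def running_total(rows, partition_key, value_key):
    """Return one running total per input row, restarted for each partition."""
    totals = {}  # running sum seen so far for each partition
    out = []
    for row in rows:
        key = row[partition_key]
        totals[key] = totals.get(key, 0) + row[value_key]
        out.append(totals[key])
    return out


events = [
    {"user": "a", "day": 1, "amount": 10},
    {"user": "b", "day": 1, "amount": 5},
    {"user": "a", "day": 2, "amount": 7},
    {"user": "b", "day": 3, "amount": 1},
]
print(running_total(events, "user", "amount"))  # [10, 5, 17, 6]
```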
               | 
               | it does have nice built-in skew and kurtosis functions,
               | though.
        
         | yiyus wrote:
         | Jd is not open source.
        
         | vmchale wrote:
         | Nothing matches kdb's speed except GPU-accelerated DBs:
         | https://tech.marksblogg.com/benchmarks.html
        
         | ZeroCool2u wrote:
         | Yeah, KDB is... Not super fun. I've been looking at TimeScaleDB
         | recently; because it's just a PostgreSQL plug-in, it seems nice
         | and simple, but I haven't actually compared them directly yet.
        
           | LoriP wrote:
           | If you want some intro info - and you may have found it
           | already - the YouTube channel is a great place to start for
           | TimescaleDB youtube.com/TimescaleDB (for transparency: I work
           | for Timescale...)
        
           | mritchie712 wrote:
           | If you were looking at pg because you need:
           | 
           | - open source
           | 
           | - SQL based
           | 
           | - analytics data warehouse
           | 
           | Then check out Clickhouse. I've been really happy with it and
           | it checks all those boxes.
           | 
           | ps - if you're interested in working with clickhouse and open
           | source data tools, I'm hiring: mike@luabase.com
        
       | nathan_compton wrote:
       | I've programmed in J professionally (admittedly not for all that
       | long) as a data scientist and, coincidentally, have just
       | completed a small analysis using J as part of an internal
       | workshop about data analysis I am planning. I typically work in R
       | and Python and I have to say that at this stage there is almost
       | no reason I would pick up J to do any work. Unless code-golf
       | level conciseness is your only goal, these other platforms offer
       | superior performance, clarity, ease of use, access to libraries
       | and are, as programming languages, substantially better designed.
       | 
       | I say this as a great lover of function-level programming and as
       | a J enthusiast. I would say I am quite familiar with J's
       | programming paradigm and conceptual widgets and doodads (I know
       | the verbs, nouns, adverbs and conjunctions and can use them
       | appropriately). I even remembered a pretty good portion of the
       | Nuvoc. But doing even the simplest analysis in J was
       | _excruciatingly_ slow and inconvenient compared to using R and
       | the tidyverse (in particular, I missed dplyr and ggplot). The
       | tidyverse CSV readers are, for example, much faster and smarter
       | and more convenient and informative than anything you'll get from
       | the J universe.
       | 
       | I love vector languages but at this point J can't compete with
       | the major platforms for data analysis. It's less convenient, often
       | _slower_, much more low level, strange, and its library situation
       | is anemic at best. I recommend learning J because it will expand
       | your mind, but I can't imagine picking it up for real work.
        
         | moonchild wrote:
         | The ecosystem problems are genuine. Though I do not think they
         | are so great as you make them out to be. But with respect to
         | semantics, numpy et al are but pale imitations. With respect to
         | syntax, too (https://www.jsoftware.com/papers/tot.htm).
        
           | nathan_compton wrote:
           | I sort of agree with you, especially about numpy. Nothing in
           | the data science space in Python feels right to me. But you
           | can't beat the network effects. It's still easier to actually
           | do data analysis in Python than in J.
        
           | user3939382 wrote:
           | > I do not think they are so great as you make them out to be
           | 
           | There's a dynamic with ecosystem problems I believe applies
           | to all languages. You only need one missing or bad library
           | that's critical to your project to make the whole language
           | useless.
           | 
           | An anecdotal example: I remember many years ago trying to
           | give Python a go and within 15 minutes ran into a problem
           | parsing XML. A search revealed this was a known issue that
           | was being worked on with the foremost tool in Python for this
           | job. You couldn't have credibly argued that Python had an
           | ecosystem problem even at the time, but for me in that
           | particular scenario Python had a show-stopping ecosystem
           | problem. There were ways around this, but the most convenient
           | way around it at the time was switching back to a more
           | familiar language.
           | 
           | My greater point is that we can definitely make
           | generalizations about a language's ecosystem health, but keep
           | in mind there is a very context-sensitive, practical
           | dimension to that type of language assessment.
        
             | moonchild wrote:
             | > You only need one missing or bad library that's critical
             | to your project to make the whole language useless
             | 
             | ...no? If there is functionality I need, and no library
             | implements it, I will implement it myself. That goes for
             | any language. Otherwise, the job of a programmer would
             | simply be to string together existing libraries, not
             | writing anything meaningful.
        
               | user3939382 wrote:
               | > I will implement it myself
               | 
               | Are you saying in the scenario described, your solution
               | would have been to write an XML parser from scratch?
        
               | moonchild wrote:
               | If I need one, and I cannot find one, then yes.
        
               | mlochbaum wrote:
               | It's not ideal, but I've done this in BQN and it took
               | about 15 lines. I didn't need to handle comments or
               | escapes, which would add a little complexity. See
               | functions ParseXml and ParseAttr here: https://github.com
               | /mlochbaum/Singeli/blob/master/data/iintri...
               | 
               | XML is particularly simple though, dealing with something
               | like JPEG would be an entirely different experience.
        
               | RexM wrote:
               | Yeah.
               | 
               | It can't be that hard.(tm)
        
               | recuter wrote:
               | The job of a programmer is to glue together existing
               | libraries in the most convoluted manner possible and
               | collect rent on maintenance. Perhaps even graduate to
               | consulting. Grow a pointy haircut.
               | 
               | Who the hell wants to be a programmer, dismal profession.
        
         | VHRanger wrote:
         | The fact there are vector languages in subsets of python
         | (numpy, pandas, etc.) and R.
         | 
         | And these already have great large columnar dataset support
         | (eg. Apache Arrow)
         | 
         | And an open source community intent on developing and
         | maintaining the ecosystem.
        
           | nathan_compton wrote:
           | One of the nicest things about J is the notion of verb rank.
           | For non-J-programmers, you can apply a rank to a verb and
           | this affects how the verb operates on its vector operands. A
           | rank of zero means "operate on each atom of the operands"
           | whereas a rank of 1 means "operate on each list (rank-1 cell)
           | of the operands," and infinite rank means "operate on the
           | entire object." Other ranks change the meaning of what counts
           | as "an element."
           | 
           | However, like most things in J, support for this excellent
           | idea (which eliminates the need for most looping constructs
           | and can be very performant) is irregular: it is limited to
           | monadic and dyadic verbs. Nothing about verb rank forbids
           | functions which accept more than two arguments, but the idea
           | of a function which accepts more than 2 arguments is poorly
           | supported in J (the idiom is to pass a boxed array to a
           | monad, but the boxing of the items to be passed makes
           | supporting rank behavior for the "arguments" impossible or
           | absurdly complicated).
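As a rough illustration of verb rank for non-J readers, here is a Python sketch over nested lists. The helper names depth and apply_rank are invented for this example, it assumes non-empty rectangular nested lists, and it mimics only the monadic (one-argument) case. The three calls mirror applying a verb at rank 0 (each atom), rank 1 (each row), and rank 2 (the whole table).

```python
def depth(a):
    """Number of nesting levels of a rectangular, non-empty nested list."""
    d = 0
    while isinstance(a, list):
        d += 1
        a = a[0]
    return d


def apply_rank(f, rank, a):
    """Apply f to each rank-`rank` cell of a, keeping the outer structure."""
    if depth(a) <= rank:
        return f(a)
    return [apply_rank(f, rank, item) for item in a]


def column_sums(t):
    # Sum a table down its columns.
    return [sum(col) for col in zip(*t)]


table = [[1, 2, 3], [4, 5, 6]]

print(apply_rank(lambda atom: atom * 10, 0, table))  # [[10, 20, 30], [40, 50, 60]]
print(apply_rank(sum, 1, table))                     # [6, 15]
print(apply_rank(column_sums, 2, table))             # [5, 7, 9]
```

No explicit loop appears at the call sites; the rank argument decides which cells the function sees.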
           | 
           | Other beefs with J: J doesn't have first class functions as
           | such. While you can represent functions as "nouns" in a few
           | ways, you cannot have (for example) an anonymous reference to
           | a function as a thing unto itself (you may denote a verb
           | tacitly in a context where you need a verb, however, but this
           | is not the same thing). If you want to pass around verbs in a
           | way familiar to you as a contemporary programmer you have to
           | use "adverbs" and "conjunctions" which are just higher order
           | functions which (more or less) return verbs. But adverbs and
           | conjunctions have their own peculiarities and restrictions
           | (not the least of which is that they are not themselves verbs
           | or nouns and thus cannot be passed around either). In
           | contemporary programming languages the
           | verb/adverb/conjunction space would just be represented by
           | "functions" and to great effect. As a functional programmer
           | and Lisp guy, I find the limitations on "verbs" very
           | frustrating in J.
           | 
           | J's error messages are also bad, never more than a few words.
           | 
           | There are some great ideas in the language, but it feels very
           | old-fashioned and out of touch.
           | 
           | What I would like to see is a "array scheme." A lexically
           | scoped Scheme-like language where every object is an array
           | and function argument slots can be independently "ranked" to
           | support the elimination of loops over array arguments. I'm
           | too busy to put this together, but it would be great to have
           | if you wanted to fiddle with arrays for some reason but could
           | do without any library support for actually doing data
           | analysis.
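A very rough Python sketch of the per-argument rank idea described above, not a design for the proposed language: the decorator name ranked and the helper depth are invented here, arrays are modeled as non-empty rectangular nested lists, and each argument slot gets its own rank so that the wrapper loops over whatever axes lie above that rank. Arguments that are already at their declared rank are reused across the loop.

```python
def depth(a):
    """Number of nesting levels of a rectangular, non-empty nested list."""
    d = 0
    while isinstance(a, list):
        d += 1
        a = a[0]
    return d


def ranked(*ranks):
    """Decorator: give each argument slot its own rank."""
    def decorate(f):
        def wrapper(*args):
            # Which arguments still have axes above their declared rank?
            framed = [depth(a) > r for a, r in zip(args, ranks)]
            if not any(framed):
                return f(*args)  # every argument is a cell: call f directly
            lengths = {len(a) for a, fr in zip(args, framed) if fr}
            if len(lengths) != 1:
                raise ValueError("leading axes disagree")
            (n,) = lengths
            # Loop over the leading axis; arguments at cell rank are reused.
            return [
                wrapper(*[a[i] if fr else a for a, fr in zip(args, framed)])
                for i in range(n)
            ]
        return wrapper
    return decorate


@ranked(0, 0, 0)
def clamp(lo, x, hi):
    # Three scalar slots: a function of more than two arguments.
    return max(lo, min(x, hi))


@ranked(1, 0)
def scale(vector, k):
    # First slot takes whole vectors, second slot takes scalars.
    return [k * e for e in vector]


print(clamp(0, [-5, 3, 12], 10))  # [0, 3, 10]
print(scale([1, 2], [10, 100]))   # [[10, 20], [100, 200]]
```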
        
         | beagle3 wrote:
         | I haven't used R recently (10 years or so), but when I did, the
         | speed with which K/kdb+ could scan through and summarize
         | terabytes of data was orders of magnitude faster than R or any
         | other system. Once the data was summarized into (say) a
         | gigabyte or so, analyzing it with R or even Python was much
         | easier thanks to the ecosystem and reasonable time (probably
         | 10-100 times slower, but the time saved by using well tested
         | stat code is more than worth it)
        
         | 0des wrote:
         | > substantially better designed
         | 
         | Hey Siri, please remove Nathan_Compton from the Christmas card
         | list.
        
         | recuter wrote:
         | Thank you for this.
         | 
         | What do you make of BQN? https://aplwiki.com/wiki/BQN
         | 
         | I get enamored with apl/k/j every time I see it and was looking
         | for excuses to use it despite everything.
         | 
         | I understand that due to the much smaller community the tooling
         | and ecosystem is much weaker but there must be a reason why
         | some people keep reaching for it, especially the guys in
         | finance. I don't get the Cobol vibes from it like it is some
         | sort of legacy burden. While the use case is narrow there must
         | be an edge.
         | 
         | This is HN after all. You wouldn't tell people not to mess with
         | lisp and just reach for python now would you? *puppy eyes
         | stare*
        
           | all2 wrote:
           | When I want to just "Get stuff done" TM, I reach for Python.
           | Except that I've stopped doing that because setting up
           | package versioning and venvs is a nightmare that gets more
           | frustrating every time I try to do it.
           | 
           | Now I'm looking for a "better" TM way to get my scripting
           | needs met. I'm looking at Nim, specifically. I may also try
           | to lean on a Scheme or a Lisp. My problem with the latter is
           | lack of decent docs for getting stuff done. Maybe I'm missing
           | something, but being productive in those languages for me is
           | like a high jump when I can't even step up on a curb.
        
             | jrapdx3 wrote:
             | Some Scheme/Lisp implementations are capable enough to
             | accomplish daily work. Common Lisp is one option, and I've
             | used Chicken Scheme effectively for some projects.
             | 
             | You're right though, there's a significant learning curve
             | with any language in a different paradigm. Forth-like
             | languages are an example, and yeah, J/K and cousins are
             | hard to grasp. I've dabbled in these but never quite got
             | there.
             | 
             | IMO Lisp-like languages aren't quite as "foreign" since the
             | syntax is a variation on 'function parameters body' used in
             | "normal" (Algol-like) languages. I guess it comes down to
             | what we get used to, and really for many purposes choice of
             | language isn't all that critical, assuming of course it
             | supports the task at hand.
        
             | rscho wrote:
             | Racket has been rated as 'an acceptable python' by a famous
             | programmer. Well deserved, I think.
        
             | beagle3 wrote:
             | Nimpy makes it possible to move from Python to Nim
             | gradually. It's magical, and while it doesn't solve
             | python's own venv problems, it would only need the DLL from
             | Python - whether it was 2.5 or 3.4 or 3.8, it would just
             | work - they probably removed the python2 support by now,
             | but it was just magic.
        
           | nathan_compton wrote:
           | J feels a lot like Smalltalk and Lisp to me. If you got on
           | board early, you could do all sorts of stuff other languages
           | struggled to make easy and performant. Hence the set of
           | dedicated users. And there are some genuinely interesting
           | conceptual things going on in array languages which have real
           | appeal. But in the end I think J reflects a previous era and
           | hasn't caught up to really useful ideas in more contemporary
           | languages, probably because its user base is too
           | conservative.
           | 
           | I wouldn't recommend people use XLisp or run Genera in a VM
           | to solve real problems. Recommending J feels like that to me.
        
             | recuter wrote:
             | I see your point. You dream crushing bastard. :)
             | 
             | For no reason whatsoever here is a link to a guy building a
             | Korean style wooden house by hand without using nails:
             | https://www.youtube.com/watch?v=hvsvMzgiq6s
             | 
             | What are "real problems" anyway?
             | 
             | Sigh. You're right, I know you're right. Somehow this field
             | is losing appeal over time. I'm going for a walk.
        
               | mlochbaum wrote:
               | > Somehow this field is losing appeal over time.
               | 
               | Not true, at all! Since 2010 or so, the APL family has
               | only improved its reputation and grown in popularity. I
               | listed some developments of the past two years at
               | https://news.ycombinator.com/item?id=28930064. Now, it's
               | not much relative to the huge growth of array frameworks
               | like TensorFlow with more mainstream language design, but
               | it is definitely not losing appeal.
        
               | recuter wrote:
               | Oh no, Marshall, I was being far more despondent and was
               | referring to programming as a whole. Thank you very much
               | for your efforts on BQN.
               | 
               | Speaking of TensorFlow, I was looking at tinygrad the
               | other day: https://github.com/geohot/tinygrad/blob/master
               | /tinygrad/tens...
               | 
               | Very tempted to port it to BQN. I could be wrong but I
               | bet it would shine for that. You could print the whole
               | thing on a t-shirt.
        
               | mlochbaum wrote:
               | Oh, thanks for clarifying, since it occurred to me that
               | you might mean just the appeal to you, but not that you
               | meant the field of programming! I'm no NN expert, but
               | tinygrad looks very approachable in BQN. You might be
               | interested in some other initial work along those lines:
               | https://github.com/loovjo/BQN-autograd with automatic
               | differentiation, and the smaller
               | https://github.com/bddean/BQNprop using backprop.
        
               | hvs wrote:
               | TBF, that guy isn't doing it for the fun of it (OK,
               | partly for the fun of it) but because Mr. Chickadee is a
               | content creator. Sure, it's a lifestyle choice, but he
               | also makes his living doing it. I love his channel, but his
               | lifestyle is as much a product of our modern world as the
               | Java programming language is.
        
             | moonchild wrote:
             | > I wouldn't recommend people use XLisp or run Genera in a
             | VM to solve real problems. Recommending J feels like that
             | to me.
             | 
             | Genera and interlisp are great. I wouldn't deploy them
             | because:
             | 
             | 1) slow
             | 
             | 2) no multithreading
             | 
             | 3) incompatible with modern cls
             | 
             | Point 3 is being worked on (for genera at least, and
             | possibly also for interlisp). But none of these seems
             | significant wrt j.
        
           | jonahx wrote:
           | >I get enamored with apl/k/j every time I see it and was
           | looking for excuses to use it despite everything.
           | 
           | You should do it. Nothing in my programming career has
           | changed the way I thought so much as learning J to the point
           | of real fluency. Though you could swap out APL, k, or BQN for
           | the same effect.
        
         | agumonkey wrote:
         | How do you feel about the J/APL syntax in live coding sessions?
         | Does it help you iterate a bit faster than R/python? Or was it a
         | totally irrelevant aspect?
        
       ___________________________________________________________________
       (page generated 2022-04-04 23:00 UTC)