[HN Gopher] Harlequin: DuckDB IDE for the terminal
       ___________________________________________________________________
        
       Harlequin: DuckDB IDE for the terminal
        
       Author : billowycoat
       Score  : 138 points
       Date   : 2023-09-20 19:02 UTC (3 hours ago)
        
 (HTM) web link (harlequin.sh)
 (TXT) w3m dump (harlequin.sh)
        
       | pjot wrote:
       | This looks great! I've been using MotherDuck for a while now,
       | glad to see more things being built with it in mind.
        
       | NortySpock wrote:
       | This looks super handy, I will definitely take a moment to try it
       | out.
       | 
       | For those asking what DuckDB is: columnstore databases like
       | DuckDB may be slower at data ingestion, but are very quick at
       | multi-GB sums, counts, and aggregations.
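
A toy sketch of the trade-off described above, not how DuckDB is actually implemented: with a column layout, an aggregate scans one list, while ingesting a single row has to touch every column. The sample data and the helper names `to_columns` and `append_row` are hypothetical.

```python
# Toy illustration only: real engines such as DuckDB add compression,
# vectorized execution, and much more. All names here are hypothetical.

rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 25.5},
    {"id": 3, "region": "east", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def append_row(columns, record):
    """Ingest one row: every column list has to be touched."""
    for name, value in record.items():
        columns[name].append(value)


columns = to_columns(rows)

# Row layout: the sum walks every record and picks out one field.
row_total = sum(record["amount"] for record in rows)

# Column layout: the sum scans a single list and ignores the other columns.
column_total = sum(columns["amount"])

# Appending one row writes to all three column lists.
append_row(columns, {"id": 4, "region": "west", "amount": 1.0})
```

Both totals are computed before the extra row is appended, so they agree; the difference is only in how much data each layout has to walk.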
        
       | tschellenbach wrote:
       | What are the benefits of DuckDB?
        
         | maxmcd wrote:
         | It's like a columnar-store SQLite, with better performance for
         | analytical data.
        
         | pacbard wrote:
         | It's similar to SQLite but can open multiple file formats.
        
         | hobs wrote:
         | I like it for writing easy SQL locally to transform data frames
         | without having to think too hard about it.
        
       | DesiLurker wrote:
       | I need this but for C++. I kinda miss the old Borland Turbo C++
       | IDE. Mostly what I care about is terminal access and code
       | navigation, but other than some vim + plugins monstrosity I can't
       | find any.
        
         | whobre wrote:
         | There's Motor, but I don't know if the project is still being
         | maintained.
         | 
         | https://github.com/rofl0r/motor
        
       | jamestimmins wrote:
       | Absolutely love the logo. I'd like to see more projects/startups
       | choosing logos/themes with this level of personality.
       | 
       | Project looks rad too, but I'm just here to appreciatively
       | bikeshed.
        
       | tconbeer wrote:
       | Hi everyone! I made this. Tried posting it to Show yesterday,
       | glad this thread is getting more momentum!
       | 
       | For the past four months I've been working (part-time, this is
       | OSS after all) on Harlequin, a SQL IDE for DuckDB that runs in
       | your terminal. I built this because I work in data, and I often
       | found myself reaching for the DuckDB CLI to quickly query CSV or
       | Parquet data, then hitting a wall as my queries got more complex
       | and my result sets got larger.
       | 
       | Harlequin is a drop-in replacement for the DuckDB CLI that runs
       | in any terminal (even over SSH), but adds a browsable data
       | catalog, full-powered text editor (with multiple buffer support),
       | and a scrollable results viewer that can display thousands of
       | records.
       | 
       | Harlequin is written in Python, using the Textual framework. It's
       | licensed under MIT.
       | 
       | Yesterday I released v1.0.0: you can try it out with `pip install
       | harlequin`, or visit https://harlequin.sh for docs and other
       | info.
        
         | ayhanfuat wrote:
         | Lovely tool. I'll certainly try it out. The code fragments in
         | the documentation don't seem to be selectable, though (or maybe
         | the selection just isn't being highlighted correctly).
        
           | tconbeer wrote:
           | It's an issue with the highlight color. I'll get that fixed
           | shortly.
        
       | quadrature wrote:
       | If anyone here is using DuckDB in production, I'd love to hear
       | what your stack looks like over the entire lifecycle of
       | extract->transform->load.
        
         | 0cf8612b2e1e wrote:
         | On a similar point, are people using DuckDB's native database
         | format or sticking with Parquet? I love everything about
         | DuckDB, but I feel more comfortable keeping things in an
         | existing format.
         | 
         | My only work with it to date has been load, analyze, and
         | usually delete before refreshing, so I do not require any
         | database mutability. Outside of mutability, I'm not sure
         | there are any obvious wins with the format.
        
           | tconbeer wrote:
           | It's a bit faster and easier than Parquet, but right now the
           | format is unstable, which is a huge downside and makes it
           | unsuitable for medium/long-term storage. After DuckDB v1,
           | they'll keep the format stable and then I think its
           | popularity will increase dramatically.
        
         | thenipper wrote:
         | I've been using it for taking output from our data validation
         | steps and bundling that up with the data that was validated
         | into one neat artifact we can download if there is an issue and
         | explore manually.
        
         | zlurker wrote:
         | We orchestrate our ETL pipelines with Dagster. We only use
         | DuckDB in a few of them but are slowly replacing pandas ETLs
         | with it. For some of our bigger jobs we use Spark instead.
         | 
         | Essentially it's:
         | 1. Pull data from sources such as S3, SFTP, and RDS.
         | 2. Use DuckDB to load most of these using only extensions (I
         |    don't believe there's one for SFTP, so we just have some
         |    Python code to pull the files out).
         | 3. Transform the data however we'd like with DuckDB.
         | 4. Convert the DuckDB table to PyArrow.
         | 5. Save to S3 with delta-rs.
         | 
         | FWIW, we also have this all execute externally from our
         | orchestration on an EC2 instance. This allows us to scale
         | vertically.
        
           | quadrature wrote:
           | This is very cool!
           | 
           | Last time I checked, DuckDB didn't have the concept of a
           | metastore, so do you have an internal convention for table
           | locations and folder structure?
           | 
           | What do you use for reports/visualizations? Notebooks?
        
             | zlurker wrote:
             | Yeah, Dagster has a concept of metadata and assets, so we
             | have some code that'll map Dagster's own logical
             | representation to physical S3 locations.
             | 
             | Reports and viz vary a lot: the finance department uses
             | Tableau, whereas for more 'data sciencey' stuff we normally
             | just use notebooks.
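
A minimal sketch of what a logical-to-physical mapping like the one described above could look like. This is not the poster's code: the function name `asset_location`, the bucket name, and the `partition=` folder convention are all assumptions made for illustration, and the asset key is modeled as a plain sequence of strings rather than a Dagster object.

```python
from typing import Optional, Sequence


def asset_location(
    bucket: str,
    asset_key: Sequence[str],
    partition: Optional[str] = None,
) -> str:
    """Map a logical asset key to an S3 URI under a hypothetical layout.

    The layout (one folder per key component, optional partition folder)
    is an assumed convention, not something the thread specifies.
    """
    if not asset_key:
        raise ValueError("asset_key must have at least one component")
    uri = f"s3://{bucket}/" + "/".join(asset_key)
    if partition is not None:
        uri += f"/partition={partition}"
    return uri


# Hypothetical bucket and asset names.
table_uri = asset_location("example-lake", ["finance", "invoices"])
daily_uri = asset_location("example-lake", ["finance", "invoices"], "2023-09-20")
```

Keeping the convention in one function means every pipeline step resolves table locations the same way, which is roughly the job a metastore would otherwise do.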
        
         | tconbeer wrote:
         | It's great as:
         | 1. An ephemeral processing engine. For example, I have a
         |    machine learning pipeline where I load data into a
         |    DataFrame, and then I can use DuckDB to execute SQL on my
         |    DataFrame (I prefer both the syntax and performance to
         |    Pandas).
         | 2. A data lake processing engine. DuckDB makes it very easy to
         |    interact with partitioned files.
         | 3. A lightweight datastore. I have one ETL pipeline where I
         |    need to cache the data if an API is unavailable. I just
         |    write the DataFrame to a DuckDB database that is on a
         |    mounted network filesystem, and read it back when I need it.
        
       | zokier wrote:
       | I'm not in love with this style of UI design in terminals:
       | 
       | https://harlequin.sh/_app/immutable/assets/export.a0e81d27.p...
       | 
       | Every item in the form takes 4 lines (I think?), whereas in a
       | more traditional curses UI they would be packed to one line per
       | item, so the scrollbar could easily have been avoided here. A
       | smaller nitpick: that style of toggle switch is also form over
       | function; I'd find a traditional [X] far clearer and less
       | ambiguous.
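
For illustration, the compact layout suggested above (one line per item, with a traditional `[X]` marker) might be rendered like this. It is a plain-string sketch, not Harlequin's or Textual's API; `render_form` and the option labels are hypothetical.

```python
def render_form(options):
    """Render (label, checked) pairs as one text line per form item."""
    return [
        f"[{'X' if checked else ' '}] {label}"
        for label, checked in options
    ]


# Hypothetical export options; each one occupies a single terminal line.
lines = render_form([
    ("Include header", True),
    ("Overwrite existing file", False),
])

for line in lines:
    print(line)
```

At one line per item, a form with a handful of options fits in a small terminal without needing a scrollbar.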
        
         | tconbeer wrote:
         | Fair feedback!
        
       | Scarbutt wrote:
       | Is there something like this for XML?
        
       ___________________________________________________________________
       (page generated 2023-09-20 23:00 UTC)