[HN Gopher] A Jupyter Kernel for SQLite
       ___________________________________________________________________
        
       A Jupyter Kernel for SQLite
        
       Author : Tomte
       Score  : 201 points
       Date   : 2020-06-16 14:31 UTC (8 hours ago)
        
 (HTM) web link (blog.jupyter.org)
 (TXT) w3m dump (blog.jupyter.org)
        
       | seemslegit wrote:
       | Cool hack, but probably wouldn't install a separate kernel or run
       | an entire notebook just for sqlite work.
       | 
       | What would be useful is for the kernel of your language of choice
       | to provide a magic for sqlite and return results of queries in
       | language-native data structures.
       | 
       | Something like this: https://pypi.org/project/ipython-sql/
        
         | santafen wrote:
         | I like this bit a lot. Might have to extend it for QuestDB. I
         | did a Notebook for QuestDB earlier this week, but having this
         | part would be great!
        
         | seemslegit wrote:
         | Alternatively: a new type of cell in jupyter itself alongside
         | 'code' and 'markdown' for sql work available regardless of
         | kernel choice.
        
           | pletnes wrote:
           | There's already the %%bash cell magic, which does that for
           | bash.
        
           | jedimastert wrote:
           | What's the distinction between "SQL" and "code"?
        
             | seemslegit wrote:
             | 'code' would be the code of whatever kernel is in use and
             | 'sql' would a notebook-level feature like 'markdown' is
             | today.
        
           | 3l3ktr4 wrote:
           | Yeah, I had this idea at first, but it's not trivial to run
           | more than one kernel at the same time. I know some examples
           | of people who did it, and some projects that I could plug in
           | and try to make it work with my kernel, but this was a first
           | version to see how the community responds! Let's see, might
           | do something like it in the future. I think it'd be really
           | cool and powerful to make it interact with Python for
           | example.
        
         | thom wrote:
         | org-babel in Emacs has some level of support for passing data
         | between different languages in different cells etc.
         | 
         | https://orgmode.org/worg/org-contrib/babel/intro.html#meta-p...
        
         | TallGuyShort wrote:
         | I wonder if this is closer to what you're suggesting, as it
         | already allows sharing of simple data structures between cells
         | of distinct languages: https://polynote.org/. One cell: SQLite
         | kernel to get data. Next cell: operates on that data.
        
       | gepoch wrote:
       | Excellent! I usually just type up my SQL in triple quotes and
       | execute it from python, which gives me a little bit of templating
       | too. Excited to give this a try (and start using more views.)
        
       | amasad wrote:
       | This is very cool. SQL is a great interactive language, we added
       | it to repl.it too: https://repl.it/languages/sqlite
        
         | 3l3ktr4 wrote:
         | Oh my god! That's so cool! Thanks for sharing it!
        
       | aghillo wrote:
       | I like this. Previously I've used notebooks to explain a data
       | pipeline from different perspectives. One notebook showing the
       | ETL process going from raw data to RDF triples in a store; one
       | SPARQL notebook showing the raw queries; and then a final
       | decision support notebook using a Python binding to the
       | underlying SPARQL query library. It seemed to work well.
        
       | justinclift wrote:
       | Interesting. Wonder if it'd be possible to embed it for
       | visualisation in a Go web app?
       | 
       | Was adding basic online chart capabilities to our SQLite
       | publishing website a few weeks ago (eg:
       | 
       | https://dbhub.io/vis/justinclift/Marine%20Litter%20Survey%20... )
       | 
       | But if people could do Jupyter notebooks and visualise them like
       | this too, that could be useful.
       | 
       | Hmmm, should probably set up some kind of survey on our website
       | to ask... :)
        
         | 3l3ktr4 wrote:
         | Hit me up on https://gitter.im/QuantStack/Lobby if you think I
         | can be useful. I'm @marimeireles there and on Github too! :)
        
       | Keyframe wrote:
       | How's this different from let's say BeakerX or Zeppelin?
        
       | reallymental wrote:
       | Incredible, but wasn't this kind of available though psycopg2
       | (postgres connector), SQLAlchemy or any other kind of database
       | connection library ?
       | 
       | I realise the difference between the kernels altogether, but how
       | is one better than the other?
        
         | jedimastert wrote:
         | Weird as it might sound, I can imagine a scenario where I
         | wouldn't need anything past SQL in some form. I've got plenty
         | of python scripts sitting around my hard drive that are just
         | fronts for data manipulation. Especially with the cvs import
         | mode, I could see myself using just sqlite or whatever
        
         | yellowapple wrote:
         | Well, as someone who strongly dislikes Python and strongly
         | likes SQLite, this might give me a reason to willingly use
         | Jupyter (granted, I guess I could do it through Julia, too,
         | which I also strongly like, but... eh).
         | 
         | A lot of the time when I need to work with a bunch of data I'll
         | fire up the SQLite CLI, snarf the data with the built-in CSV
         | import commands, and do whatever queries I wanna do. This looks
         | like it can provide a nicer UI around that workflow, and I'm
         | all for it.
         | 
         | Datasette looks interesting for this, too; I'll have to give
         | that a look.
        
         | mritchie712 wrote:
         | Yes and when paired with Pandas `read_sql`[0] it's easy to
         | quickly query, view and plot.
         | 
         | There's also pandasql[1] which is based on SQLite. We use this
         | behind the scenes at SeekWell[2] to power cross database /
         | cross source joins.
         | 
         | 0 - https://pandas.pydata.org/pandas-
         | docs/stable/reference/api/p...
         | 
         | 1 - https://pypi.org/project/pandasql/
         | 
         | 2 - https://seekwell.io/
        
         | tejtm wrote:
         | PSA: SQLite is embedded as a standard builtin python library.
         | 
         | no pip install anything just import sqlite
         | 
         | https://docs.python.org/3.7/library/sqlite3.html
         | 
         | But I will be trying the Jupyter kernel in the article
        
         | dataminded wrote:
         | Not really. You could use the python kernel and embed your SQL
         | code within your python code but you couldn't run cells with
         | just SQL. It made for a very poor analyst experience.
         | 
         | Microsoft got it right with Azure Data Studio.
        
           | yellowapple wrote:
           | Does ADS support SQLite? I already use it for SQL Server, and
           | I'd tried the PostgreSQL add-on but couldn't get it to work
           | for some reason (but that was when it was first released so
           | it might've stabilized since then).
        
             | dataminded wrote:
             | No. I've seen success with Microsoft SQL and PostgreSQL.
             | 
             | I've also experienced instability issues, it needs that VS
             | Code level polish.
        
       | llimllib wrote:
       | If you're interested in this, you might be interested in
       | Datasette: https://datasette.readthedocs.io/en/stable/
       | 
       | Which seems to me to be farther along in providing advanced
       | querying/faceting/visualization/sharing capabilities on top of
       | sqlite.
       | 
       | (I love jupyter, and this kernel seems neat; not trying to throw
       | stones at anybody, just to link a project in a similar domain)
        
         | seemslegit wrote:
         | Having to rerun the query and reload the page just to change
         | the sort order on a small dataset is very 1997
        
           | simonw wrote:
           | You gotta re-run the query or you're limited to sorting just
           | the visible results.
           | 
           | I guess I could let it spot when there are less than the
           | page-size of results, but then I'd need to be confident that
           | the JavaScript sorting algorithm exactly matches the
           | underlying SQLite sorting algorithm - taking into account
           | character sets and custom collations and suchlike.
           | 
           | I've been deliberately avoiding adding any JavaScript to core
           | Datasette almost to make a point: you don't need it. HTML
           | pages load faster than so-called "modern" SPA monstrosities.
           | They work really well. They don't require a bunch of extra
           | steps to avoid breaking the back button etc.
           | 
           | But... Datasette provides a JSON API for everything, and
           | supports plugins. There's nothing to stop someone who really
           | wants no-page-reload query execution from implementing it as
           | a plugin.
           | 
           | I myself have built JavaScript plugins for things like bar
           | charts and line charts ( https://github.com/simonw/datasette-
           | vega ) and map visualizations (
           | https://github.com/simonw/datasette-cluster-map )
           | 
           | So yeah, my personal bias here is that building websites like
           | we did in 1997 is deeply undervalued.
        
             | seemslegit wrote:
             | The UI needs to be such so as to make the users understand
             | that by clicking a column header they would be sorting the
             | results of a previous query rather than re-running it with
             | a different sort column and that the two are not generally
             | interchangeable, likewise with instant in-results search.
             | 
             | At least with the server being on the internet and the
             | example setup I disagree that the current way is faster or
             | even near a modern webui
        
               | simonw wrote:
               | Can you think of a way I could make that clear in the UI
               | ("this will sort all of the data including the rows you
               | can't see" v.s. "this will sort the rows that are visible
               | to you right now")?
        
               | seemslegit wrote:
               | I think this is the natural expectation of anyone
               | familiar with spreadsheets, reporting tools or data grids
               | with sortable column headers, but a message along the
               | lines of
               | 
               | "Sorting results of the previous query, click 'Run SQL'
               | button again to query with ${column_name} as the ORDER BY
               | column"
               | 
               | and a "got it" hyperlink to be stored in your session
               | state store of choice upon first click on a column header
               | would probably make it clear.
        
         | 3l3ktr4 wrote:
         | None taken! This is a much more developed project indeed.
         | 
         | I think the SQLite kernel for Jupyter has a nice foreseeable
         | future with graph visualizations for the query results and the
         | whole integration with conda, mamba and pip environments,
         | though.
        
       | crazygringo wrote:
       | Wow. It never even occurred to me that this was missing from
       | Jupyter -- but in hindsight seems _so_ obvious.
       | 
       | Congrats to the Jupyter team on this!
        
         | 3l3ktr4 wrote:
         | Thanks! <3
        
       ___________________________________________________________________
       (page generated 2020-06-16 23:00 UTC)