[HN Gopher] A Jupyter Kernel for SQLite ___________________________________________________________________ A Jupyter Kernel for SQLite Author : Tomte Score : 201 points Date : 2020-06-16 14:31 UTC (8 hours ago) (HTM) web link (blog.jupyter.org) (TXT) w3m dump (blog.jupyter.org) | seemslegit wrote: | Cool hack, but probably wouldn't install a separate kernel or run | an entire notebook just for sqlite work. | | What would be useful is for the kernel of your language of choice | to provide a magic for sqlite and return results of queries in | language-native data structures. | | Something like this: https://pypi.org/project/ipython-sql/ | santafen wrote: | I like this bit a lot. Might have to extend it for QuestDB. I | did a Notebook for QuestDB earlier this week, but having this | part would be great! | seemslegit wrote: | Alternatively: a new type of cell in jupyter itself alongside | 'code' and 'markdown' for sql work available regardless of | kernel choice. | pletnes wrote: | There's already the %%bash cell magic, which does that for | bash. | jedimastert wrote: | What's the distinction between "SQL" and "code"? | seemslegit wrote: | 'code' would be the code of whatever kernel is in use and | 'sql' would a notebook-level feature like 'markdown' is | today. | 3l3ktr4 wrote: | Yeah, I had this idea at first, but it's not trivial to run | more than one kernel at the same time. I know some examples | of people who did it, and some projects that I could plug in | and try to make it work with my kernel, but this was a first | version to see how the community responds! Let's see, might | do something like it in the future. I think it'd be really | cool and powerful to make it interact with Python for | example. | thom wrote: | org-babel in Emacs has some level of support for passing data | between different languages in different cells etc. | | https://orgmode.org/worg/org-contrib/babel/intro.html#meta-p... | TallGuyShort wrote: | I wonder if this is closer to what you're suggesting, as it | already allows sharing of simple data structures between cells | of distinct languages: https://polynote.org/. One cell: SQLite | kernel to get data. Next cell: operates on that data. | gepoch wrote: | Excellent! I usually just type up my SQL in triple quotes and | execute it from python, which gives me a little bit of templating | too. Excited to give this a try (and start using more views.) | amasad wrote: | This is very cool. SQL is a great interactive language, we added | it to repl.it too: https://repl.it/languages/sqlite | 3l3ktr4 wrote: | Oh my god! That's so cool! Thanks for sharing it! | aghillo wrote: | I like this. Previously I've used notebooks to explain a data | pipeline from different perspectives. One notebook showing the | ETL process going from raw data to RDF triples in a store; one | SPARQL notebook showing the raw queries; and then a final | decision support notebook using a Python binding to the | underlying SPARQL query library. It seemed to work well. | justinclift wrote: | Interesting. Wonder if it'd be possible to embed it for | visualisation in a Go web app? | | Was adding basic online chart capabilities to our SQLite | publishing website a few weeks ago (eg: | | https://dbhub.io/vis/justinclift/Marine%20Litter%20Survey%20... ) | | But if people could do Jupyter notebooks and visualise them like | this too, that could be useful. | | Hmmm, should probably set up some kind of survey on our website | to ask... :) | 3l3ktr4 wrote: | Hit me up on https://gitter.im/QuantStack/Lobby if you think I | can be useful. I'm @marimeireles there and on Github too! :) | Keyframe wrote: | How's this different from let's say BeakerX or Zeppelin? | reallymental wrote: | Incredible, but wasn't this kind of available though psycopg2 | (postgres connector), SQLAlchemy or any other kind of database | connection library ? | | I realise the difference between the kernels altogether, but how | is one better than the other? | jedimastert wrote: | Weird as it might sound, I can imagine a scenario where I | wouldn't need anything past SQL in some form. I've got plenty | of python scripts sitting around my hard drive that are just | fronts for data manipulation. Especially with the cvs import | mode, I could see myself using just sqlite or whatever | yellowapple wrote: | Well, as someone who strongly dislikes Python and strongly | likes SQLite, this might give me a reason to willingly use | Jupyter (granted, I guess I could do it through Julia, too, | which I also strongly like, but... eh). | | A lot of the time when I need to work with a bunch of data I'll | fire up the SQLite CLI, snarf the data with the built-in CSV | import commands, and do whatever queries I wanna do. This looks | like it can provide a nicer UI around that workflow, and I'm | all for it. | | Datasette looks interesting for this, too; I'll have to give | that a look. | mritchie712 wrote: | Yes and when paired with Pandas `read_sql`[0] it's easy to | quickly query, view and plot. | | There's also pandasql[1] which is based on SQLite. We use this | behind the scenes at SeekWell[2] to power cross database / | cross source joins. | | 0 - https://pandas.pydata.org/pandas- | docs/stable/reference/api/p... | | 1 - https://pypi.org/project/pandasql/ | | 2 - https://seekwell.io/ | tejtm wrote: | PSA: SQLite is embedded as a standard builtin python library. | | no pip install anything just import sqlite | | https://docs.python.org/3.7/library/sqlite3.html | | But I will be trying the Jupyter kernel in the article | dataminded wrote: | Not really. You could use the python kernel and embed your SQL | code within your python code but you couldn't run cells with | just SQL. It made for a very poor analyst experience. | | Microsoft got it right with Azure Data Studio. | yellowapple wrote: | Does ADS support SQLite? I already use it for SQL Server, and | I'd tried the PostgreSQL add-on but couldn't get it to work | for some reason (but that was when it was first released so | it might've stabilized since then). | dataminded wrote: | No. I've seen success with Microsoft SQL and PostgreSQL. | | I've also experienced instability issues, it needs that VS | Code level polish. | llimllib wrote: | If you're interested in this, you might be interested in | Datasette: https://datasette.readthedocs.io/en/stable/ | | Which seems to me to be farther along in providing advanced | querying/faceting/visualization/sharing capabilities on top of | sqlite. | | (I love jupyter, and this kernel seems neat; not trying to throw | stones at anybody, just to link a project in a similar domain) | seemslegit wrote: | Having to rerun the query and reload the page just to change | the sort order on a small dataset is very 1997 | simonw wrote: | You gotta re-run the query or you're limited to sorting just | the visible results. | | I guess I could let it spot when there are less than the | page-size of results, but then I'd need to be confident that | the JavaScript sorting algorithm exactly matches the | underlying SQLite sorting algorithm - taking into account | character sets and custom collations and suchlike. | | I've been deliberately avoiding adding any JavaScript to core | Datasette almost to make a point: you don't need it. HTML | pages load faster than so-called "modern" SPA monstrosities. | They work really well. They don't require a bunch of extra | steps to avoid breaking the back button etc. | | But... Datasette provides a JSON API for everything, and | supports plugins. There's nothing to stop someone who really | wants no-page-reload query execution from implementing it as | a plugin. | | I myself have built JavaScript plugins for things like bar | charts and line charts ( https://github.com/simonw/datasette- | vega ) and map visualizations ( | https://github.com/simonw/datasette-cluster-map ) | | So yeah, my personal bias here is that building websites like | we did in 1997 is deeply undervalued. | seemslegit wrote: | The UI needs to be such so as to make the users understand | that by clicking a column header they would be sorting the | results of a previous query rather than re-running it with | a different sort column and that the two are not generally | interchangeable, likewise with instant in-results search. | | At least with the server being on the internet and the | example setup I disagree that the current way is faster or | even near a modern webui | simonw wrote: | Can you think of a way I could make that clear in the UI | ("this will sort all of the data including the rows you | can't see" v.s. "this will sort the rows that are visible | to you right now")? | seemslegit wrote: | I think this is the natural expectation of anyone | familiar with spreadsheets, reporting tools or data grids | with sortable column headers, but a message along the | lines of | | "Sorting results of the previous query, click 'Run SQL' | button again to query with ${column_name} as the ORDER BY | column" | | and a "got it" hyperlink to be stored in your session | state store of choice upon first click on a column header | would probably make it clear. | 3l3ktr4 wrote: | None taken! This is a much more developed project indeed. | | I think the SQLite kernel for Jupyter has a nice foreseeable | future with graph visualizations for the query results and the | whole integration with conda, mamba and pip environments, | though. | crazygringo wrote: | Wow. It never even occurred to me that this was missing from | Jupyter -- but in hindsight seems _so_ obvious. | | Congrats to the Jupyter team on this! | 3l3ktr4 wrote: | Thanks! <3 ___________________________________________________________________ (page generated 2020-06-16 23:00 UTC)