[HN Gopher] Show HN: DataStation - App to easily query, script, ...
       ___________________________________________________________________
        
       Show HN: DataStation - App to easily query, script, and visualize
       data
        
       Author : eatonphil
       Score  : 45 points
       Date   : 2022-05-31 20:10 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | canMarsHaveLife wrote:
       | How does it compare to Redash (now Databricks SQL):
       | https://github.com/getredash/redash?
        
         | [deleted]
        
         | eatonphil wrote:
         | I haven't used it, but just from looking at the GitHub page it
         | looks like Redash has more advanced dashboarding features today
         | (I'd like to catch up here). In contrast, Redash doesn't really
         | allow you to manipulate data very much if it doesn't come in a
         | form you want or if you can't get it into the right form with
         | SQL alone.
         | 
         | DataStation allows you to script results of database queries
         | (or loaded Parquet, Excel, CSV, etc. files or HTTP API
         | responses) in Python, Node, R, Julia, etc.
         | 
         | Also, DataStation is first and foremost a desktop app today, so
         | it's very easy to install and use -- especially in a corporate
         | environment. Data never leaves your laptop. In the future I
         | think more people will use the server version of DataStation so
         | you can get server features like recurring exports and hosted
         | dashboards but desktop will always be supported too.
        
       | programmarchy wrote:
       | Looks very useful! In terms of feedback, I think if you brought
       | in a designer you'd have a much bigger "wow" factor. There's a
       | lot of low hanging fruit like consistent button styles, fonts,
       | whitespace, larger text inputs, that'd go a long way. And I'm
       | sure you've thought of this already, but seems like a node-based
       | paradigm could be an improvement over the panel-based paradigm
       | e.g. more akin to something like Blender nodes, or Tableau.
        
         | eatonphil wrote:
         | > In terms of feedback, I think if you brought in a designer
         | you'd have a much bigger "wow" factor. There's a lot of low
         | hanging fruit like consistent button styles, fonts, whitespace,
         | larger text inputs, that'd go a long way.
         | 
         | Yes this would be nice to have! If there's a version of this
         | that gets funded or bootstrapped then I'd definitely like to
         | bring someone on to help.
         | 
         | > And I'm sure you've thought of this already, but seems like a
         | node-based paradigm could be an improvement over the panel-
         | based paradigm e.g. more akin to something like Blender nodes,
         | or Tableau.
         | 
         | Actually no I'm not familiar with this concept. But I have seen
         | what natto.dev does and I'm concerned that that is too free
         | form compared to how DataStation works. A little structure is
         | useful IMO. I'm not sure how similar Blender nodes or Tableau
         | are to natto.dev.
         | 
         | That said, DataStation panels show up in an order but the order
         | of evaluation is not set. You can import the results of a panel
         | defined below the current panel; it just matters that the panel
         | you refer to has been _run_. So it may be closer to a node-based
         | design in that case. But again, I'm not sure if that's what you
         | mean.
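
The run-before-reference rule described in the comment above can be
sketched roughly as follows. This is an illustrative model only: the
Panel and Project classes and the get_panel name are hypothetical and
are not taken from DataStation's code. It assumes a panel may read any
other panel's result, whatever the display order, as long as that
panel has already been run.

```python
# Hypothetical model of panels; none of these names come from DataStation.
class Panel:
    def __init__(self, name, compute):
        self.name = name
        self.compute = compute  # function that receives a result-lookup callable
        self.result = None
        self.has_run = False


class Project:
    def __init__(self, panels):
        # The list order is only the on-screen order, not an evaluation order.
        self.panels = {p.name: p for p in panels}

    def get_panel(self, name):
        """Return another panel's result; fail if that panel has not run."""
        panel = self.panels[name]
        if not panel.has_run:
            raise RuntimeError(f"panel {name!r} has not been run yet")
        return panel.result

    def run(self, name):
        """Evaluate one panel, letting it look up results of other panels."""
        panel = self.panels[name]
        panel.result = panel.compute(self.get_panel)
        panel.has_run = True
        return panel.result


project = Project([
    # Displayed first, but reads from a panel displayed below it.
    Panel("totals", lambda get: sum(row["amount"] for row in get("orders"))),
    Panel("orders", lambda get: [{"amount": 5}, {"amount": 7}]),
])

project.run("orders")          # run the lower panel first
total = project.run("totals")  # the upper panel can now import its result
```

Running "totals" before "orders" raises an error in this sketch, which
mirrors the "it just matters that the panel you refer to has been run"
behaviour.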
        
           | programmarchy wrote:
           | Hadn't seen natto before, but I agree that's pretty far out
           | there! If you search images of Tableau Prep, that's more
           | along the lines of what I had in mind. Although Tableau
           | supports Python and R, it's not nearly as well integrated as
           | what you've done with DataStation. In general, it's more
           | geared towards Excel power user types, rather than
           | programmers.
        
             | eatonphil wrote:
             | > If you search images of Tableau Prep, that's more along
             | the lines of what I had in mind.
             | 
             | Ah! I think this is a visualization of what does happen
             | with DataStation panels too. Eventually I'd like to have
             | better support for understanding the dependency graph like
             | this but for now that's just been a nice idea to have
             | sometime in the future.
             | 
             | > Although Tableau supports Python and R, it's not nearly
             | as well integrated as what you've done with DataStation. In
             | general, it's more geared towards Excel power user types,
             | rather than programmers.
             | 
             | Yeah it was definitely my impression it was not geared
             | toward programmers as much (though I know many programmers
             | or data scientists use it).
        
       | bamazizi wrote:
       | The UX reminded me of [PipeDream](https://pipedream.com/)
       | 
       | The industry around abstraction tools/UI on top of DBs is
       | growing. We use Retool very heavily and it does get pricey.
       | 
       | This is a very neat execution and has potential for SAAS or Cloud
       | offering. Like "Bring your own DB" and build your own
       | abstractions.
        
         | [deleted]
        
         | eatonphil wrote:
         | > This is a very neat execution and has potential for SAAS or
         | Cloud offering. Like "Bring your own DB" and build your own
         | abstractions.
         | 
         | Definitely my goal for the future is SaaS/Cloud where you can
         | work on projects as a team and configure hosted dashboards,
         | recurring exports and alerts out of panels you set up in a
         | DataStation project.
        
       | eatonphil wrote:
       | Hey folks! I quit my job at Oracle almost a year ago now to build
       | DataStation. It's an app I've wanted as an engineering manager
       | for years. It's entirely open-source and while I've had a few
       | awesome contributors I'm mostly the only person on it. It has
       | been funded out of contract development and savings.
       | 
       | DataStation helps you query a variety of data sources
       | (conventional SQL like PostgreSQL and MySQL, non-SQL like
       | Prometheus or Elasticsearch), files and HTTP APIs. It is not a
       | SQL layer on top of these various APIs like FDW in Postgres or
       | Apache Calcite.
       | 
       | DataStation just tries to abstract away glue code. So in
       | DataStation for Prometheus you query with PromQL. For
       | Elasticsearch you query with Lucene. And for SQL databases you
       | query with their SQL dialect. But you don't need to remember how
       | to use the appropriate library for your language. You just need
       | your own credentials.
       | 
       | DataStation is made of panels (other apps might call them cells)
       | that each produce a result. Panels can refer to other panels.
       | These allow you to build workflows that cross the boundary of a
       | particular datasource. For example you might have some data in a
       | CSV a product manager gave you and the bulk of your data is in
       | PostgreSQL. In DataStation you could pull in the CSV with a File
       | panel and pull in the Postgres data with a Database panel. Then
       | you can join both panel results in a Code panel using your
       | favorite language like Python, Ruby, R, Node, Julia, etc. You can
       | even script Code panels in a SQLite dialect with a bunch of rich
       | addons (url parsing, best-effort date parsing, statistics
       | aggregation, etc.):
       | https://github.com/multiprocessio/go-sqlite3-stdlib.
       | 
       | You can watch a simple introductory video:
       | https://www.youtube.com/watch?v=q_jRBvbwIzU. Or if you want to
       | see that cross-datasource interaction taken to an extreme, check
       | out this video using Postgres metadata to filter log data in
       | Elasticsearch to do historic request analysis on a subset of
       | customers: https://www.youtube.com/watch?v=tIh99YVHoRE.
       | 
       | DataStation is mainly a desktop app today where the end result is
       | that you export graph SVGs or HTML tables or markdown tables or
       | just a CSV file. All this data stays on your laptop so it's as
       | easy to use in a corporate environment as any existing SQL IDE or
       | Jupyter Notebook.
       | 
       | In the last year it's reached 1.5k stars on GitHub and over 1000
       | unique users, and it currently averages about 40 fairly active
       | users per month (defined as having opened the app more than a few
       | times).
       | 
       | Since it's only just now 12 months old it's been going through a
       | lot of maturing during this time. If you've tried it before and
       | it was buggy or too slow it's probably worth another try now if
       | you're still interested.
       | 
       | DataStation is primarily an Electron app but the code that
       | evaluates panels is written in Go. The Go evaluation code forms
       | the backbone of another app you may have seen around HN, dsq:
       | https://github.com/multiprocessio/dsq, which is a limited version
       | of DataStation as a CLI for querying files with SQL.
       | 
       | In the future I'd like to see more people using it as a server
       | app where my goal is to support read-only dashboards and
       | recurring exports. That part is still work-in-progress.
       | 
       | You can find a ton of tutorials on how to interact with supported
       | databases on the DataStation website:
       | https://datastation.multiprocess.io/docs/.
       | 
       | Looking forward to your feedback!
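
For readers who want a picture of the glue code the post above says
DataStation abstracts away, here is a minimal hand-written sketch of
the CSV-plus-database join. Everything in it is an assumption made for
illustration: the data is made up, the function names are hypothetical,
and an in-memory SQLite database stands in for PostgreSQL so the
example is self-contained. It is not how DataStation itself is
implemented.

```python
import csv
import io
import sqlite3

# Stand-in for the CSV a product manager hands over (made-up data).
CSV_TEXT = "customer_id,segment\n1,enterprise\n2,startup\n"


def load_csv_rows(text):
    """Rough equivalent of a File panel: parse CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def query_rows(conn, sql):
    """Rough equivalent of a Database panel: run SQL, return a list of dicts."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def join_rows(left, right, key):
    """Rough equivalent of a Code panel: inner-join two result sets on key."""
    index = {}
    for row in right:
        # CSV values are strings and database values may be ints,
        # so keys are compared as strings.
        index.setdefault(str(row[key]), []).append(row)
    joined = []
    for row in left:
        for match in index.get(str(row[key]), []):
            # On a column name clash, the value from `left` wins.
            joined.append({**match, **row})
    return joined


# In-memory SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 250), (1, 100), (3, 40)]
)

segments = load_csv_rows(CSV_TEXT)
orders = query_rows(
    conn, "SELECT customer_id, amount FROM orders ORDER BY amount"
)
result = join_rows(orders, segments, "customer_id")
```

Here `result` holds only the orders whose customer also appears in the
CSV, each tagged with that customer's segment.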
        
         | lopatin wrote:
         | This is really cool. Maybe in the future you can make a paid
         | version with a bunch of BI features.
         | 
         | In your opinion, how does it compare to PyCharm (Enterprise
         | version) when it's all blinged out with big data tools and
         | integrations? I recently realized that PyCharm is my Data IDE
         | and not just my Python editor. I only use limited features
         | though, so hard for me to compare the extent of functionalities
         | between the two.
         | 
         | Edit: Well, PyCharm won't let you join two different data
         | sources, so that's one big difference!
        
           | eatonphil wrote:
           | > Edit: Well, PyCharm won't let you join two different data
           | sources, so that's one big difference!
           | 
           | Right!
           | 
           | On the other hand, any real code IDE will have high-quality
           | autocomplete, jump-to-definition, all that code IDE stuff. In
           | the future DataStation may be able to hook into tree-sitter
           | or LSP but for now it's more like a textarea with syntax
           | highlighting (although the SQL code panel autocomplete is
           | relatively complete).
           | 
           | Similarly, SQL IDEs have better exploration of your database.
           | DataStation can't tell you about which tables or schemas
           | exist yet (although I want it to in the future).
           | 
           | DataStation competes more directly with _Python scripts_ than
           | with SQL IDEs and code IDEs (although there is of course
           | overlap).
        
             | tyingq wrote:
             | It does look a bit like parts of Tableau's desktop
             | product.
        
               | eatonphil wrote:
               | I haven't used Tableau but I have had some people show up
               | in Discord to ask about using DataStation as an
               | alternative. So maybe it is similar, but I don't know.
        
         | alashow wrote:
         | Any reason for not having a web client?
        
           | eatonphil wrote:
           | You can run it as a web server! It's just not as commonly
           | done right now since I haven't put much time into integration
           | with cloud providers (stuff like CloudFormation templates I
           | mean) and I don't yet have a public Docker image that is up
           | to date.
           | 
           | https://datastation.multiprocess.io/docs/0.11.0/DataStation_.
           | ..
        
       | moltar wrote:
       | Looks amazing.
       | 
       | Will try tomorrow. Athena alone is a superior offer in my mind.
       | Even TablePlus, my favourite SQL client, doesn't do that :)
       | 
       | If you can add dbt integration it will be a killer product!
       | 
       | Thank you!
        
         | eatonphil wrote:
         | Thanks for the kind words!
         | 
         | The only caveat I'll say is that it's definitely not as mature
         | in general as SQL clients (stuff like table and column discovery
         | and autocomplete do not exist yet). But it is pretty
         | convenient to use DataStation if you like being able to easily
         | switch into Python/JavaScript/whatever without needing to look
         | up the docs for how to connect to and run a query against every
         | database.
         | 
         | > If you can add dbt integration it will be a killer product!
         | 
         | I haven't used dbt and my impression was that it was a glue
         | system for copying data from one place to another. But maybe
         | that's not correct. Is it possible to query dbt data directly?
         | Or how would you imagine it fitting into a DataStation flow?
         | Thank you!
        
       ___________________________________________________________________
       (page generated 2022-05-31 23:00 UTC)