[HN Gopher] Generating JSON Directly from Postgres ___________________________________________________________________ Generating JSON Directly from Postgres Author : izend Score : 124 points Date : 2021-07-15 18:15 UTC (4 hours ago) (HTM) web link (blog.crunchydata.com) (TXT) w3m dump (blog.crunchydata.com) | DoomHotel wrote: | Postgres (>11) also implements the SQL/JSON path language that | lets you extract table data from JSON. | tehlike wrote: | Once you build V8 into postgres, things get real interesting | (thin javascript projections, and a bunch of other stuff).. | | https://github.com/plv8/plv8 | gigatexal wrote: | I wish RedShift got these functions. :( | pjungwir wrote: | If you are using Rails and ActiveModelSerializers, I wrote a gem | to take your serializer definition and build the whole JSON | result in Postgres: [0] It hasn't gotten much use, but I know at | least one company that has used it to get 100x performance boosts | on #index endpoints. | | It generates jsonapi right now, since that is standard for Ember | projects, but it'd be pretty easy to add an adapter for some | other format. | | [0] https://github.com/pjungwir/active_model_serializers_pg | davidwparker wrote: | How does it compare in terms of performance to | https://github.com/jsonapi-serializer/jsonapi-serializer (a | fork of Fast JSON API out of Netflix, which wasn't being | maintained)? | jrochkind1 wrote: | wow, that's really neat, this could solve a problem I have, | bookmarking! | blacktriangle wrote: | Please just no, do not do this. Even the dumbest most simple | system there is will evolve beyond simple CRUD access to tables. | The second you put your schema on the wire, that schema can now | never change. That's just such an insane commitment to make. | chousuke wrote: | the database schema is an API like everything else, and you can | solve API evolution problems the same way as you would with a | web API, using eg. views, stored procedures and versioning. | | There's no reason not to expose parts of your database schema | directly to users as long as you treat it like you would any | other API you provide. | LukeEF wrote: | this is a really interesting way to think about it and | something that I've given a fair amount of thought to as we | embark on a data mesh journey - how does a domain team serve | up a schema like an API, how does it evolve, and how do we | think about versions. | onefuncman wrote: | that's what a Schema Registry is for... protobuf is a | little more polished around managing forwards and backwards | compatibility than avro but they both work. | chousuke wrote: | I find that there are very few things related to | programming that don't constitute an interface in some way, | so it feels natural to me to think about everything in | terms of the interface it presents to users, both | explicitly in what you can express via whatever language | you're using, and implicitly via conditions that can't be | enforced but are required for correctness. | | And when you have an interface, you really ought to put | some thought into its design regardless of how "private" it | is. | | An API is an interface used by a program in some way, and | any user-exposed database objects definitely qualify under | that definition. | iammisc wrote: | > The second you put your schema on the wire, that schema can | now never change. That's just such an insane commitment to | make. | | If you change your schema, you can just write a sql expression | to convert between schemas 'at runtime'. Like if you have a | table with first name and last name and then you decide you | really just need the name on the frontend, just change your | SELECT statement to concat the two columns, or vice versa. | tengbretson wrote: | The application layer is typically just a view that maps the | data layer to the expectations of the view layer. If you are | set on eliminating the application layer surely you could just | implement that view in the data layer using... a view. | joevandyk wrote: | you can use views or functions to easily make changes to the | schema | jkingsbery wrote: | "Your scientists were so preoccupied with whether or not they | could, they didn't stop to think if they should." | | While interesting, the advice offered in this post is generally | bad, or at least in complete. OK, so, you "cut out the middle | tier..." what now? Are web clients connecting to PostGRES | directly? Will PostGRES handle connection authentication and | request authorization? Logging that a particular user looked at | particular data? Can the client cache this data? If so, for how | long? Even taking the author's premise, this is not a good | pattern - a simple JSON get request still comes with a bunch of | other stuff that this doesn't bother to address. | | But the premise is wrong - few applications just spit out JSON | like this, they also have to handle updates, and there's business | logic involved in that. Data needs to be structured in a way | that's reasonable for clients to consume, which isn't always a | row of data (maybe it should be collapsing all values in a | column, so that in the JSON the client just gets a JSON array of | the values). | simonw wrote: | I worked on a Django project recently where we were generating | GeoJSON from data in a PostgreSQL table. | | We investigated using tricks like this to generate the JSON | directly from a SQL query - my notes here: | https://til.simonwillison.net/postgresql/constructing-geojso... | | The performance was a big enough improvement that, had we been | bottlenecked on this particular operation, we would have | shipped this (I'm afraid I don't have notes on benchmarks). | | We only decided not to ship it because of concerns about the | ongoing maintenance overhead, since we had other Python code | generating GeoJSON that we would have still used - so we would | have ended up maintaining multiple implementations. | nicholasjarnold wrote: | If one wishes to "cut out the middle tier" they might be better | served by a investigating a project like PostgREST [0] | | [0] https://postgrest.org/en/stable/ | herdrick wrote: | Or PostGraphile: https://www.graphile.org/postgraphile/ | joshgel wrote: | I've been trying to decide whether to move a Django-DRF API | to one of these. | zbjornson wrote: | Shameless plug: I made bson-to-json for more or less the same | purpose with MongoDB: converting the raw BSON response from the | DB directly to JSON strings. Saves lots of CPU cycles and GC | pressure. | | https://github.com/zbjornson/bson-to-json | rkwasny wrote: | Don't do this. | | It will create various problems when next engineer adds some | business logic, another one adds some stored procedures.... | Thaxll wrote: | It's a bad idea and I'd like to see how do people actually do | test using this model. Now you have to run PG locally everytime | you need to unit test something? Also PG returns some JSON but if | the object is not exactly what you want to send back you create | another object and merge the two? Just don't do it, it's a | terrible design. It "works" for simple API but for anything | serious this is wrong. | gigatexal wrote: | Docker run postgres... boom testing solved ;) | Thaxll wrote: | For DB that you can't run locally how do you do? or the | CI/CD, there are a millions of example where people don't run | the DB locally. | kristiandupont wrote: | Why can you not run PG locally? | kerreks wrote: | It's pretty common to run db locally in docker for | integration testing these days | tofuahdude wrote: | Did you read the article at all before criticizing it as | "terrible design"? All its talking about are native postgres | methods for pre-formatting data. | limaoscarjuliet wrote: | It is pretty easy to get any result set straight from PostgreSQL | as json object: | | "WITH _ AS (" real query goes here ") SELECT | COALESCE(array_to_json(array_agg(_.*)), '[]') AS xyz FROM _"; | | From there it behaves as standard JSON, wich can be easily loaded | into any language. | recursivedoubts wrote: | _> Too often, web tiers are full of boilerplate that does nothing | except convert a result set into JSON._ | | Yeah. This is the problem: we've abandoned the hypermedia | architecture of the web for a dumb-data, RPC model. I suppose if | you are going to go that direction and things are simple enough, | you can jam it in the DB and get smoking perf. | | But as things get more complicated, where does the business logic | live? Maybe in the database as stored procedures? It's less crazy | than it sounds. | | An alternative is to cut out the _other_ middle man: JSON, ditch | heavy front-end javascript and return HTML instead, going back to | the hypermedia approach. The big problem with that is a step back | in UX functionality, but there are tools[1][2][3] for addressing | that. | | [1] - https://htmx.org (my library) | | [2] - https://unpoly.com | | [3] - https://hotwired.dev | munk-a wrote: | This is just my experience but I've found that business logic | usually ends up living _before_ the query - most pages (whether | read or update) follow a sort of pattern like... | | 1. Receive, route and validate (for form) the request | | 2. Validate the request for function (business rule validation | - can such a user do such on a Tuesday and the like) | | 3. Compute business attributes around the form of the response | or update | | 4. Execute the query against the database | | 5. Send the results to the user | | I strongly agree that step 5 there doesn't really need to | involve very much stuff happening outside of the DB - in our | particular platform we have a very smooth rowset -> JSON | translator in the application framework that includes support | for mapping functions over the rows as they come out of the DB | - the result is that we pretty much stop executing application | code as soon as we send a query to the DB. While we do still | delay the actual JSON encoding until the application layer it's | thin, dumb and done in a streamed manner so that we don't have | to hold all that junk in memory - and it comes with mode | switches to either redirect the big DB blob to an HTML | template, echo it as a CSV or even run a PDF template to render | it into what the user wants to see. | princesse wrote: | > But as things get more complicated, where does the business | logic live? Maybe in the database as stored procedures? It's | less crazy than it sounds. | | I've worked with a bunch of systems like this before. One was | just a collection of PHP scripts that would trigger SQL | queries. Another one was all lambdas and Cron jobs on top of | mssql stored procedures. | | If you have a decent team and if you use version control as | intensely as you would with code, I have no reason to believe | this cannot work. To be fair, those condition are true | regarding of tech and architecture. | | What you cannot do easily though is pivot, hire, and scale. | This is what bit those teams I worked with and why those | specific systems are no longer around. | _jal wrote: | > One was just a collection of PHP scripts that would trigger | SQL queries. | | Yes, I've had to build a system like that (Java, but same | diff.) | | This model is popular in particular with banks, because they | can split sensitive responsibilities in a way that makes | sense to them. | | It works fine if you have good communications between | developers and DBAs. If you don't... well, I won't have to | suggest finding another gig, you already want to. | simonw wrote: | I used to avoid things like stored procedures like the | plague, because of my fear that crucial business logic would | end up outside of version control. | | These days I'm comfortable enough driving EVERY database | change through a version controlled migration system (usually | Django migrations) that I'm not concerned about this any | more. It's not hard to enforce version controlled migrations | for this kind of thing, provided you have a good migration | system in place. | MuffinFlavored wrote: | > Maybe in the database as stored procedures? | | I always wondered if I could make something like this work: | https://github.com/plv8/plv8 | | Maybe couple it with this: https://postgrest.org/ | | Just not sure if it was worth the effort upfront to learn | something other than simple Express (node.js) | servers/middleware functions/router controllers with database | client access. That paradigm just feels infinitely more "in | control" and "extensible" to me. | inopinatus wrote: | Once the client is the MVC execution environment with the | client-server interaction used mostly for data replication, | plus some extra invokable server-side behaviours written in | isomorphic code, we can congratulate ourselves on having | more-or-less reinvented Lotus Notes. | rhacker wrote: | This all feels backwards for me. I mean I already did YEARS of | JSF with facelets/richfaces/icefaces/primefaces/JSF2.0 | sureglymop wrote: | I am doing this at work, I have an application with all the | business logic in the database. It is a huge hassle though... I | am versioning a bunch of scripts with git but it really isn't | convenient. | | The thing is that a database already stores data but all the | schematics and ddl stuff would be much more conveniently kept | in descriptive plaintext. | programmarchy wrote: | Curious how far you've gone with this. You could have your | CI/CD sync your stored procedures. But you would probably | want to go further than that and manage them with a tool like | db-migrate so you can keep your stored procedures tracking | with your schema changes rather than just versioning them. At | the end of the day, can you really separate your stored | procedures from the data they operate on? I would say no you | can't. | formerly_proven wrote: | One thing I particularly dislike about SQL is DDL and all the | system-specific extensions (e.g. for access control). I don't | really get why there is a separate _thing_ like "ALTER TABLE | foo DROP COLUMN bar;" if you could have "DELETE FROM | schema.foo WHERE name = 'bar';". 90 % of DDL is just | duplicating basic DML functionality, Except For That Special | Kind Of Object Which Totally Smells Like It's Relational But | It's Not. Especially for access control - _why_ is that just | not a table? I 'm sure it literally is one. | | This makes SQL needlessly complex and tools to work with | schemas far more complicated than they actually have to be. | | In Lisp, most _things_ are a list /pair. In SQL, most things | should have been a table. | rhacker wrote: | That would be pretty nice, insert into | schema.users (type, name, columns) values ('btree_index', | 'index_dob', 'dob DESC, name ASC') | inopinatus wrote: | In many RDBMS this is in fact the case. However, the system | schema is necessarily close to the database's architecture | and capabilities, making it implementation specific. DDL is | also implementation specific but there's a much higher | degree of abstracted regularity. | mistersys wrote: | Okay, yeah we can serve UI over the wire. It's possible to make | it work. But the experience will never be the same as a a | native application that's designed to handle offline | functionality, interactivity and optimistic updates. | | The Hey email client is great example, hotwired.dev was built | for Hey. | | Guess what? It kind of sucks. It's buggy and slow. Randomly it | stops working. When the internet goes down, random things work, | random things don't work. If it weren't for Hey's unique | features like the screener, I would much rather use a native | app. | | There's a ton we can do to make the the developer experience of | rendering on the client side better, but there's only so much | we can do to make the user experience of serving UI over the | wire better. When the wire breaks or slows down, the UI | renderer stops working. | | We built an internal tool for our team we call "restless", and | it lets us write server side code in our project, and import it | from the client side like a normal functional call, with full | typescript inference on both ends. It's a dream. No thinking | about JSON, just return your data from the function and use the | fully typed result in your code. | | We combine that with some tools using React.Suspense for data | loading and our code looks like classic synchronous PHP | rendering on the server, but we're rendering on the client so | we can build much more interactive UIs by default. We don't | need to worry about designing APIs, GraphQL, etc. | | Of course, we still need to make sure that the data we return | to the client is safe, so we can't use the active record | approach of fetching all data for a row and sending that to the | client. We built a SQL client that generates queries like the | ones in the OP for us. As a result, our endpoints are | shockingly snappy because we don't need to make round trip | requests to the db server (solving the n+1 ORM problem) | | We write some code like: select( | Project, "id", "name", | "expectedCompletion", "client", "status" | ) .with((project) => { return { | client: subselect(project.client, "id", "name"), | teamAssignments: select(ProjectAssignment) | .with((assignment) => { return { | user: subselect(assignment.user, "id", "firstName", | "lastName"), }; }) | .where({ project: project.id, | }), }; }) .where({ | id: anyOf(params.ids), }) | | And our tool takes this code and generates a single query to | fetch exactly the data we need in one shot. These queries are | easy to write because it's all fully typed as well. | JohnTHaller wrote: | > Maybe in the database as stored procedures? | | I did this for the very first external web app built for | Bankers Trust Company back in the day. SQL Server back end with | (classic) ASP on the front end. Even if someone had gained | access to the web server, they wouldn't have had access to | anything they shouldn't have. I built web apps, an online | store, and a learning/video website using the same tech around | the same time and they worked without issue for well over a | decade. So, there is something to be said for this approach, | though I wouldn't build something like this today. | zozbot234 wrote: | You could return linked data via JSON-LD and get the best of | both worlds - standardization plus smart UX. | nsriv wrote: | This has been the appeal behind Phoenix LiveView for me. | exdsq wrote: | I worked on a stored-procedure driven finance engine once. | There was an internal language to build procedures quicker. | Swear to god I had errors running them through the SQL IDE | because they were too large. Once had to debug a 1.6m line SQL | file. These things can really get out of hand if people are | left unchecked! | kerreks wrote: | Does it scale well? I feel like doing all business logic in | procedures would require company to use one huge database in | order to intersect various domains | jansommer wrote: | The obligatory comment to these posts is always "check out | PostgREST". We use it for all frontend api's. Combined with Node | or any other middleware, we add additional code if everything | can't be done in plpgsql, but requests are usually just passed | straight through Node. | ptrwis wrote: | I did the approach with nesting jsonb_agg queries, it worked for | me but was a bit hacky and sometimes complex. I hoped for Graph | Query Language (the one supposed to be part of SQL standard) to | make it easier to retrieve structred json from flat tables, but | it doesn't look ready yet. | xaduha wrote: | The more things change, the more they stay the same. | | If we lived in XHTML 2.0 universe instead of HTML5 universe, then | that would be the way you do things. Not with SQL and JSON mind | you, but with XML everywhere. | | I had a glimpse into that world and it looked pretty good at the | time. XML databases + XQuery can produce whatever you want as | long as it's XML. And XML could be many things. Many horrible | things too, made by people with arms growing out of places they | shouldn't. | tofuahdude wrote: | A lot of commenters jumping on the "cut out the middle tier" | title. | | The bulk of the content is simply about formatting JSON in the DB | instead of manually mapping rows in the application layer. | | It doesn't say "eliminate your API layer" or "have no application | logic between client and db" as most are jumping to. | | I find the actual methods described as helpful in that I can | convert to the data structures I ultimately want in one pass | instead of two. | | Doesn't mean I don't have validation or a traditional API layer. | Just easier to use. | felixge wrote: | As with everything, it depends. | | I've implemented a large system managing billions of records | using exactly this approach of cutting out a lot of boilerplate | in the application layer. | | The most important thing is to be pragmatic. This approach works | great for CRUD as well as many types of complex analytical | queries. However, for cases where the the complexity or | performance of the SQL was unacceptable, we also decoded data | into application data structures for further processing. | | When done well, you can get the best of both worlds as needed. | maxk42 wrote: | It is a really, really bad idea to put business logic in your | database and there are people in this thread downvoting everyone | who points it out. Please do not do this in production - I've had | to maintain systems that married business logic and database and | it was a nightmare to maintain and cost a fortune to fix. | tshaddox wrote: | What do you mean? Isn't your database precisely where you ought | to put your business logic? | throw1234651234 wrote: | Is this a joke? SPs and SQL Functions are far less | maintainable than .NET/Java/Node/Python. | mdavidn wrote: | That's perhaps true of stored procedures, but SQL views, | constraints, and triggers written in a declarative manner | are far more maintainable and far less buggy than | imperative application code, in my experience. | ako wrote: | Why are they less maintainable? It's just code you can | write in a code editor, version control in a versioning | system, have automatic tests for, have automatic deployment | scripts for, etc. In the end the database is just an | application server executing your code close to the data. | Only thing you have to ask yourself is how much your code | needs to scale. But it can be much more efficient to run | business logic in the database as stored procedures. | Thaxll wrote: | "But it can be much more efficient to run business logic | in the database as stored procedures." | | But this is not code, this SQL or equivalent, if SQL was | known to be better than code to do busines logic everyone | would do that. | mousepilot wrote: | There's all sorts of programming styles in use and | calling stored procedures from dot net programs for | instance is at least a thing. | | Plus, lots of those ORM things get plenty of hate from | DBAs, I mean I like them but then again I'm not a very | good programmer. | ako wrote: | How is this not code: https://oracle.su/docs/11g/appdev.1 | 12/e10472/packages.htm#in... ? | | What is your definition of code? | mdavidn wrote: | Using a migration utility like ActiveRecord, Flyway, or | go-pg, SQL definitions can also live in version control, | have automatic tests, and deploy automatically. Who has a | database in production without these things? | munk-a wrote: | I personally have found it's much more difficult to | partially mock out concepts when writing logic in the DB | layer though my tooling may have been the weakpoint. In a | nice application language I can easily isolate specific | units of code for direct testing - while every test I | write over complex database logic feels like composing a | symphony where I need to make sure the horns are coming | in on the right pitch and the wind section isn't too | early to properly simulate a relatively easy error | scenario. | | Since the application layer is much more naturally | aligned with small well isolated chunks of logic it is | easier to minimize the volume of code and logic in scope | when testing a particular attribute - when that goes into | the DB I've always seen things get more complex rapidly. | [deleted] | pbreit wrote: | Wut? You put data in a database. Logic goes in the app. | tyingq wrote: | In some cases, things like constraints and the model itself | represent business logic. | jayd16 wrote: | Like everything, it depends. Some business logic like "this | field should never be negative" is probably fine. | | More complex logic is a lot harder to maintain. You don't | want to stress your DB's CPU if you can avoid it. You still | need some kind of connection pooler anyway. There's a lot of | reasons it doesn't make sense to put all your business logic | in the DB. SQL is not the easiest language to maintain (not | enough abstraction). | | IMO the line to draw is just enough to keep data sanity. | jasonpeacock wrote: | Business logic goes into the code, data goes into the | database. | | Sometimes business logic is implemented as rules, in which | case the rules (configuration) can go into either | configuration files or a database. But that doesn't make it | data... | fiddlerwoaroof wrote: | Data is just crystallized business logic. Anemic databases | aren't a big deal when the database is just a persistence | layer for a single app but, when you're rewriting the | application code or allowing BI queries directly against | the database, designing a database as a service whose | interface is stored procedures starts making data integrity | much easier. | ahallock wrote: | I think this is slightly reductive. The majority of your | business logic, I agree, should reside in application code, | but lightweight business logic in the form of database | constraints are powerful at disallowing bad data. For | example, you may have a rental booking application that | shouldn't have overlapping date ranges. The database can | give you this guarantee out of the box and will work for | ALL clients and applications. | oauea wrote: | Thanks for sharing now wisdom, now if you could also explain | _why_? | mistersys wrote: | This isn't really business logic... unless you're in the | business of simply converting data formats. | | As a data fetching approach and especially as tooling primitive | to eliminate the n+1 ORM problem this is a terrific tool to be | aware of. | jrochkind1 wrote: | how old were the systems? And have you ever had to maintain | systems that were that many years old that _weren 't_ a | nightmare to maintain and cost a fortune to fix? | tofuahdude wrote: | The posted article doesn't advocate for putting business logic | in the DB. | | It only describes postgres native functions for formatting | data. | green7ea wrote: | I made a side project to experiment with this approach (and rust) | a few years ago. It worked pretty well but it made simple queries | more complex[1]. I was a little bit worried that complex queries | would become a lot more complex so I'm not sure how well this | approach would scale. | | It removed a crazy amount of boilerplate code [2] though and made | the project much more focused on the actual data which is always | a good thing. | | Ever since, I've been very curious to try edgedb [3] since it | promises to do similar things without making the queries more | complex. | | [1]https://github.com/green7ea/newsy/blob/master/src/feed_overv.. | . | | [2]https://github.com/green7ea/newsy/blob/master/src/main.rs#L4.. | . | | [3]https://www.edgedb.com/ | jexp wrote: | I'm not impressed. Same in Neo4j's | | MATCH (e:Employee) RETURN e { .* } as employee; | | MATCH (e:Employee) RETURN collect(e { .* }) as employees; | | MATCH (e:Employee)-[:IN]->(d:Department) WITH d, collect(e {.*}) | as employees ___________________________________________________________________ (page generated 2021-07-15 23:00 UTC)