Call PostgreSQL functions via HTTP JSON-RPC

kbanman · on June 3, 2012

Maybe coming from a web development background has left me paranoid, but this just screams scary to me.

gaius · on June 3, 2012

Nah, it's safe enough. See Oracle's take on this, http://apex.oracle.com/

einhverfr · on June 3, 2012

It scares me because it is letting your front-end pick your stored procedures with very little restriction.

einhverfr · on June 3, 2012

We've been doing something kinda similar in LedgerSMB with versions of PostgreSQL going back to 8.1 when it was current....

Our approach was different though. We have object methods explicitly mapped to stored procedures by name (this is done to limit the stored procedures exposed by the application). If the number of arguments changes, the middleware does not need to be updated.

The queries are then called through an interface that, in essence, hands a hashref (in Perl) and a stored procedure name to a mapper function which then looks up the argument names and assembles the query.

This is then called, with an optional order by clause, and the results returned as a list of hashrefs.

While this isn't the exact technologies listed here, the approach is something I have found wonderful and so it's probably worth sharing.

JoelJacobson · on June 3, 2012

It this part of the LedgerSMB project? If so, what part of the source code handles things? Would be interesting to see how you have implemented it.

When you create a new function, do you have to do anything in addition to do CREATE FUNCTION, before you can use the function in your front-end/backoffice/whatever external system component?

einhverfr · on June 3, 2012

Quick guide to the code (see https://ledger-smb.svn.sourceforge.net/svnroot/ledger-smb/br... and https://ledger-smb.svn.sourceforge.net/svnroot/ledger-smb/br...).

The typical call goes something like this:

$object->execute_method({funcname => 'my_stored_procedure'});

This calls execute_method in DBObject.pm

The primary magic happens between lines 112 and 170 which is where we look up the argument names and process them. Then this gets handed off to LedgerSMB.pm's "call_procedure" interface which builds the query and executes it.

It isn't great code and there is room for improvement. It will be improved over coming versions. The 1.4 codebase is moving to use Moose for newer code (all code by 2.0), and the combination of Moose and this approach to method mapping is proving quite useful.

I am also working on a more ambitious library for doing this query sort of thing which will add the possibility to add windows and windowed columns onto stored procedure calls at mapping time, so you can ask for a GL running balance while specifying ordering.... That probably won't be ready for 1.4 though.

einhverfr · on June 3, 2012

Yes it is. The file that has the mapping routine is LedgerSMB/DBObject.pm.

I won't say it's great code. It was one of the first things I wrote when trying to break out of the inherited codebase from SQL-Ledger (shudders).

We also prefix all argument names in the sql function definition with in_ to prevent collision with column names.

Here's a post I wrote about the mechanics:

http://ledgersmbdev.blogspot.com/2011/10/introduction-to-sod...

JoelJacobson · on June 3, 2012

I looked at the code. Reminds a bit of what we are currently doing. We also have a Perl function for each SP. But that's what I want to get away from, the middle-layer should be fully capable of automatically mapping the request to the appropriate SP, like in this proof-of-concept. Can you think of one scenario when it isn't possible to do so? I've tried but haven't came up with anything.

Also, in execute_method, what if two functions matches the same name? Looks like you just pick the first one it finds. Shouldn't you throw an error if unsure which one to pick?

einhverfr · on June 4, 2012

Answering in inverse order:

Since the name is the discovery criteria, function overloading is not supported in this interface. It's possible to define a different one, but the idea here is that the stored procedures should be discoverable based on name. So we have test cases that raise errors when functions are overloaded that we don't know should be.

So in answer to your last question, behavior is undefined, and such a condition will trigger failures in our database unit tests.

On the other side, the reason to enforce an explicit mapping is because it isn't clear how much you really want to trust the client to execute any function in the database at all. An earlier version used AUTOLOAD to map these methods in directly so no middleware changes would be necessary. We abandoned that approach in part because of concerns it could be abused. So the issue isn't what can you do with my framework that you can't do with your framework but actually the other way around. Do I want to whitelist or blacklist stored procedures? Whitelisting seems safer.

A future version will probably use Module::Compile to build mappings from the database, but we haven't defined interfaces sufficiently to make this happen. However one idea is to tie Perl classes to PostgreSQL complex data types, and grab functions which return those complex types. Since all procedures are supposed to return useful data (even delete operations might return the deleted row), then this might work.

Again that's in the design phase.

Edit: It occurs to me that if the return type and the function name become the discovery criteria, then two functions could have the same name but different args as long as they have different return types.

EugeneOZ · on June 3, 2012

It's horrible. You don't need API, just give to clients of your API credentials to your database and they will use DB directly. And it still will be horrible (because only good way to use application by another application is API, not direct access to data). Also, stored procedures is a bad thing - business logic should be written in code, not in data and not in the database.

mryan · on June 3, 2012

"business logic should be written in code, not in data and not in the database"

This makes no sense. Stored procedures are code - just code that executes within your DB, instead of on your app servers. I agree that it makes sense to keep your business logic all in one codebase, but there are times when stored procs are a very efficient solution to business-logic problems.

Overall I agree that this "DB API" solution is not entirely elegant, and might provide as many security holes as it does interesting debugging challenges.

However, there are some situations in which I could see this being incredibly useful. Combining this Nginx's JSON/API modules means some parts of your API could entirely bypass your app servers.

Let's say you have an API that shows the ten items most recently liked by a particular user. Typically, this request would hit Nginx, your app server, then your cache and/or DB. By using the "shared procs API" and some Nginx modules, you could provide the same data by letting Nginx query your cache/DB directly.

EugeneOZ · on June 3, 2012

No, stored procedures is not a part of business logic. Read books of Robert Martin and Martin Fowler, they will explain it for you.

fdr · on June 3, 2012

Procedures are procedures regardless of where they are written. In many aspects it can be mechanically more difficult to deal with SQL UDFs but that is a usability weakness that just needs some more treatment rather than anything particularly fundamental, I feel.

There are three common reasons I write UDFs:

* The SQL type system and dependency tracking are deemed helpful. This applies only to 'LANGUAGE SQL' functions, which are just regular queries. Unlike dynamically generated SQL from the application, type checks and dependency tracking on other SQL objects can be employed. This is hard to address in any tooling without having a complete copy of the catalogs (and many databases are not as self-hosted complete as Postgres' catalogs, so sometimes that may not be enough...) and semantic analyzer.

* Diverse language and/or library access. If there is a series of obscure rules to hold simple invariants locked into exactly one application's libraries, it makes it pretty much impossible to use any other library. This can be addressed, but it's often expensive to do so.

* Performance, in rare cases where network round trip dominates execution time and I need more than one plain SQL statement; this is only the case for the integrated 'procedural' languages. One could accomplish something similar by running the application's code closer to the machine, and I think that'd be better, actually, but requires some orchestration.

mryan · on June 4, 2012

I think the flood of replies disagreeing with you suggest you might want to rethink that assumption.

Another reply to your post said you are making a 'religious argument' - I think there is some truth in that.

If you had said "in my experience, shared procs are not a good place to store business logic", that would have been your opinion, and no-one could argue with you. But you stated, as fact, that "business logic should be written in code, not in data and not in the database". This is not objectively true for all companies, and can be easily disproved. And was disproved, by all the people who said this assumption is incorrect.

Throwing around the names of authors does nothing to help your case (useless "appeal to authority"). I have read some of Robert Martin's work (but none of Martin Fowler's) and still think storing business logic in database rules can be a very good idea.

EugeneOZ · on June 5, 2012

"And was disproved" - lol. It was just disputed, but not disproved. I don't care about flood of some anonymous on HN, I more care about articles and books, written by really experienced authors.

gaius · on June 3, 2012

This is silly. If you have many applications accessing the database - and not every app will go via your API, unless you plan to rewrite everything - then you want your business rules enforced in one place, that will be effective for any client, be it a command line, a web app, client-server, Excel, third party products, anything.

The web world has produced people for whom there is a 1:1 correlation between app and DB. In the enterprise world, most apps are written to run against existing DBs, some of which have been around for decades. This is scalable development processes. And the funny thing is, web kids will blithely repeat "DRY" without ever understanding the truth of it.

JoelJacobson · on June 3, 2012

What's wrong with a 1:1 correlation, when it makes sense? If what you need to do is to access or modify data, then a stored procedure language such as PL/pgSQL is perfect for the job. But if you need to render some graphics using GD in PHP, that's obviously not a job for the database. The total API of a system is probably not only stored procedures, but quite a lot of API methods, stored procedures can take care of the entire task, without the need to involve PHP or any other middle-layer language.

gaius · on June 3, 2012

Because it's a tiny corner of the database world, and isn't terribly representative of the majority of database work. The kids who do all their logic in the app, are storing up trouble for themselves in future, when they want to use the DB for more than just the website.

ocharles · on June 3, 2012

> Also, stored procedures is a bad thing - business logic should be written in code, not in data and not in the database.

Why?

Your post generally makes a lot of criticisms, but fails to back up any of the points made.

EugeneOZ · on June 3, 2012

Lot of reasons. Refactoring, unit-tests, readability... If you don't understand my points, it doesn't means they falls :)

mryan · on June 4, 2012

You did not mention any of these things in your original post. Why would you assume that he doesn't understand them, if you didn't even bother to type them?

But yes, completely failing to mention a point when attempting to make an argument generally means they fail :)

EugeneOZ · on June 5, 2012

You talk like teenager.

mryan · on June 8, 2012

Dude, that's like, totally SO not true.

You might want to read the HN posting guidelines - your comment history suggests you have not done so yet. I have no interest in continuing a conversation with you, based on the level of discourse evident in your previous comments. Have a good day.

NickNameNick · on June 3, 2012

Primarily, because the stored procedures are not effectively under version control.

JoelJacobson · on June 3, 2012

Maybe yours aren't, but mine are. I've been using this simple tool for three years now.

https://github.com/gluefinance/PgDeploy

It gives you an extra safety net, in case the version in the database would differ compared to the version in the VCS, even though they shouldn't, someone might have been evil behind your back. :)

PgDeploy lets you preview the change and show you a diff. If the diff is expected, you can proceed and deploy.

icebraining · on June 3, 2012

Only if your workflow is bad and you're editing stuff directly on the DB. Stored procedures work fine with VCS if you write them on files and then run those. Where I work, creating a new database with all the procedures is just a matter of loading the right .sql file.

JoelJacobson · on June 3, 2012

That's a religious argument. Many applications are a perfect fit for a database centric design with business logic written in stored procedures. Not all, but some.

You might find find old article interesting, http://ora-00001.blogspot.com/2011/07/mythbusters-stored-pro...

einhverfr · on June 3, 2012

I agree with you about API's (and would further note that stored procedures are a way to build API's around your database, as are views).

However, I disagree that stored procedures are a bad thing. There are several ways that stored procedures can be used.

1) Named queries (An API around your db!)

2) Set-intensive operations are really best done in SQL. You can do a heck of a lot (of even multi-stage logic!) in a single query when you throw in some in-line views, common table expressions. Writing in stored procedures gives you a single, testable reference for functionality.

3) Exporting application functionality to other programs that may want to use the db (like Excel for example).

4) Unit tests in the db are very, very helpful and highly underrated. Gotta love the fact that rollback means "none of this ever happened."