Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

should someone literally pull the plug on your Postgres database server, your data is 100% safe, except for unlogged tables, in which your data is 100% gone - by design!

This omits a crucial fact: your data is not gone until Postgresql has gone through its startup recovery phase. If you really need to recover unlogged table data (to which every proper database administrator will rightly say "then WhyTH did you make it unlogged"), you should capture the table files before starting the database server again. And then pay through the nose for a data recovery specialist.

However, a crash will truncate it.

So this isn't exactly true. A crash recovery will truncate it.



Unlogged table data commonly won't even have been written to disk. And what is on disk is completely inconsistent after a crash. So really, the data is gone once postgres crashed (itself, due to hw issues, kill -9, ...)


Interesting, I thought unlogged tables meant they dont get get WAL-ed and stored on FS at all.

Any further reading you can suggest?


They do have backing files - after all, unlogged tables can end up many times the size of memory. We need to reserve space for them, so there's a very high likelihood we could evict the buffers / shut down. But they're just written out due to buffer pressure or shutdown checkpoints (+things like slowing the tablespace).


s/slowing/altering/

Damn you autocorrect.


A quick scan of the postgresql manual turns up nothing. I would have expected the steps performed during startup recovery to be documented there. Neither is the manual explicit about data file usage for an unlogged table, there's just these two snippets:

from https://www.postgresql.org/docs/15/sql-createtable.html#SQL-...:

> Data written to unlogged tables is not written to the write-ahead log (see Chapter 30), which makes them considerably faster than ordinary tables

and from https://www.postgresql.org/docs/15/glossary.html#GLOSSARY-UN...:

> The property of certain relations that the changes to them are not reflected in the WAL. This disables replication and crash recovery for these relations.

Which both say nothing about the normal data files underlying unlogged tables, so none of what I wrote can be found in the official docs (or maybe it can, just not by me ;)

However, there is also this page from the postgres developer wiki: https://wiki.postgresql.org/wiki/Future_of_storage which does say that pure in-memory tables are not supported by Postgres:

> it would be nice to have in-memory tables if they would perform faster. Somebody could object that PostgreSQL is on-disk database which shouldn't utilize in-memory storage. But users would be interested in such storage engine if it would give serious performance advantages.


The article says it is stored.

The documentation on feature says it is only truncated on crash.

Where did you read it is not saved on disk?


At a high level, there are some things that coordinate when transactions happen in terms of WAL logging:

1. The Buffer Pool (memory for pages in the database)

2. The Log Manager

3. The Transaction Manager

4. The Storage Manager (read/write to disk)

5. Accessor Methods (interpret page bytes as e.g. Heap or BTree)

This is abstract, not particular to Postgres, which doesn't have exact such names for all above things.

Normally, when the Transaction Manager creates transactions that modify records/tuples (using the Accessor Methods) these actions need to be persisted via the Log Manager to the WAL.

The pages of memory backing these records come from the Buffer Pool, and the Buffer Pool must also log certain actions.

Before the Buffer Pool can flush any modified page to disk, the changes up to that page must have been persisted by the WAL via the Storage Manager as well.

When you create unlogged tables, none of this happens, and when you modify records there's no trail.

There's an attempt at tl;dr'ing WAL

I am not an expert (Anarazel is)


I thought the article was pretty clear.


'Safe' here obviously means including transactional integrity, so I feel you're criticising the wrong thing.

> A crash recovery will truncate it.

That's so nitpicky it has lesser nitpicks living upon it.


> 'Safe' here obviously means including transactional integrity, so I feel you're criticising the wrong thing.

Good news, they're not criticizing the word "safe", they're criticizing the word "gone".

> That's so nitpicky it has lesser nitpicks living upon it.

It says "100% gone" when you pull the plug.

Pointing out that is merely queued for deletion is a big deal, not a nitpick.


> Pointing out that is merely queued for deletion...

ok, so before deletion occurs, 1) how are you going to get it back and 2) how do you know how much you have/haven't lost (that transactional integrity I mentioned)?


"capture the table files before starting the database server again. And then pay through the nose for a data recovery specialist."

How much? Probably most of it. The data is not safe. But it's also not gone.


I understand what you're saying and I think we're both right, except that we're looking from different angles: you're thinking of it as bits on a disc and I'm looking at it as if I were running my business on that database.

Edit: put another way, you: "Yep, I think we can get something back". Me: "aaaaaaaaaaaargh! mah payroll!"


Suggesting you pay a lot of money for data recovery is also treating it as important business data.

In this scenario you already screwed up badly but you can salvage a lot. Very different from there being no hope.


Is it likely that the data is corrupted?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: