Finding its origin as a fix for [pg_rewind] (https://www.postgresql.org/docs/devel/static/app-pgrewind.html) where rewind process could fail if a file marked as listed in the source server was removed before fetching its data (see more details here, this could impact temporary files or relation file for example), here is a small feature of PostgreSQL 9.5 that may be useful for application developers:
commit: cb2acb1081e13b4b27a76c6b5311115528e49c59
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Sun, 28 Jun 2015 21:35:46 +0300
Add missing_ok option to the SQL functions for reading files.
This makes it possible to use the functions without getting errors, if there
is a chance that the file might be removed or renamed concurrently.
pg_rewind needs to do just that, although this could be useful for other
purposes too. (The changes to pg_rewind to use these functions will come in
a separate commit.)
The read_binary_file() function isn't very well-suited for extensions.c's
purposes anymore, if it ever was. So bite the bullet and make a copy of it
in extension.c, tailored for that use case. This seems better than the
accidental code reuse, even if it's a some more lines of code.
Michael Paquier, with plenty of kibitzing by me.
Postgres has a set of superuser functions allowing to have a look at the files of PGDATA from the [SQL interface] (https://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-ADMIN-GENFILE-TABLE):
- pg_ls_dir, to list the files of a given path.
- pg_read_file, to read a given file and return its data as text.
- pg_read_binary_file, to read a given file and return its data as bytea.
- pg_stat_file, to get statistics about a file similarly to the system function stat().
With the above commit, all those functions have gained a missing_ok option allowing to not fail should a file (or path for pg_ls_dir) selected by the user not exist, returning a NULL result instead (or more or less an empty record for pg_ls_dir). In each case, the default is false.
In the case of pg_read_file and pg_read_binary_file, this new boolean flag is available with a new 4-argument version of those functions (for backward-compatibility purposes).
=# SELECT pg_read_file('postgresql.auto.conf', 0, 32, true) AS data;
data
----------------------------------
# Do not edit this file manually
(1 row)
=# SELECT pg_read_file('file_not_exists', 0, 1, true) AS data;
data
------
null
(1 row)
For pg_stat_file, a second argument is added.
=# SELECT * FROM pg_stat_file('postgresql.auto.conf', true);
-[ RECORD 1 ]+-----------------------
size | 88
access | 2015-07-05 21:15:52+09
modification | 2015-07-03 21:42:12+09
change | 2015-07-03 21:42:12+09
creation | null
isdir | f
=# SELECT * FROM pg_stat_file('file_not_exists', true);
-[ RECORD 1 ]+-----
size | null
access | null
modification | null
change | null
creation | null
isdir | null
pg_ls_dir is a bit more special, actually two options have been added:
- missing_ok to avoid an error in case of a missing path.
- include_dot_dirs to list ‘.’ and ‘..’ in the content list of this path.
The problem with including only missing_ok would be that as it returns an empty content list should the path specified by user be missing, it would not be possible to make the difference between an empty folder that exists, and one which is missing. include_dot_dirs is here to make the difference between both cases: if the content listed just contains ‘.’ and ‘..’ and missing_ok is true, it means that the directory defined by user is empty but exists. missing_ok is the second argument of the function and include_dot_dirs its third argument.
-- This one does not exist.
=# SELECT pg_ls_dir('not_exists', true, true);
pg_ls_dir
-----------
(0 rows)
-- No 2PC transactions are running.
=# SELECT pg_ls_dir('pg_twophase', true, true);
pg_ls_dir
-----------
.
..
(2 rows)
Combined together, this set of functions can be quite powerful, here is for example a WITH RECURSIVE query able to list all the files in PGDATA (query more or less a copycat of the one used by pg_rewind in libpq_fetch.c), informing as well the size of each entry or if it is a directory or not.
WITH RECURSIVE files (path, filename, size, isdir) AS (
SELECT '' AS path, filename, size, isdir FROM
(SELECT pg_ls_dir('.', true, false) AS filename) AS fn,
pg_stat_file(fn.filename, true) AS this
UNION ALL
SELECT parent.path || parent.filename || '/' AS path,
fn, this.size, this.isdir
FROM files AS parent,
pg_ls_dir(parent.path || parent.filename, true, false) AS fn,
pg_stat_file(parent.path || parent.filename || '/' || fn, true) AS this
WHERE parent.isdir = 't' AND
this.size IS NOT NULL)
SELECT path || filename AS file_path, size, isdir
FROM files;
Then use for example pg_read_file to grab the data of a given file and it is possible to fetch a complete copy of PGDATA via SQL (superuser rights needed for that of course).