加载中

PostgreSQL contains a slew of great features. Many of them are very well known. Others can be incredibly useful but are not as widely appreciated. Here are our top picks for great PostgreSQL features you may not have taken a look at but really should, since they can help you get code into production faster, make ops easier and generally get things done with less code and sweat.

Pub/Sub Notifications

PostgreSQL comes with a simple non-durable topic-based publish-subscribe notification system. It’s no Kafka, but the features do support common use cases.

Messages on a specific topic can be broadcast to all connected subscribers who are listening for that topic. The messages are pushed by the Postgres server to the listening clients. Polling is not required, but your database driver should support delivery of notifications to the application asynchronously.

The notification consists of a topic name and a payload (upto about 8000 characters). The payload would typically be a JSON string, but of course it can be anything. You can send a notification using the NOTIFYcommand:

NOTIFY 'foo_events', '{"userid":42,"action":"grok"}'

or the pg_notify() function:

SELECT pg_notify('foo_events', '{"userid":42,"action":"grok"}');

The subscription happens with the LISTEN command, but typically you’ve to use driver-specific APIs. Here’s the Go version for example.

PostgreSQL包含许多重要的功能。他们中的许多人都非常知名。其他人可以是非常有用的,但没有广泛赞赏。以下是我们首选的PostgreSQL功能,您可能没有仔细看过,但实际上应该这样做,因为它们可以帮助您更快地将代码投入生产,使操作更轻松,并且通常可以使用更少的代码和劳动来完成任务。

发布/订阅通知

PostgreSQL带有一个简单的非持久基于主题的发布 - 订阅通知系统。它不是Kafka,但功能确实支持常见用例。

关于特定主题的消息可以广播给正在监听该主题的所有连接的订阅者。这些消息被   Postgres服务器送给侦听客户端。轮询不是必需的,但您的数据库驱动程序应支持异步向应用程序传递通知。

通知由主题名称和有效负载组成(最多约8000个字符)。有效载荷通常是一个JSON字符串,但它当然可以是任何东西。您可以使用NOTIFY命令发送通知  :

NOTIFY'foo_events','{“userid”:42,“action”:“grok”}'

或者  pg_notify()  函数:

SELECT pg_notify('foo_events','{“userid”:42,“action”:“grok”}');

订阅发生在  LISTEN  命令中,但通常您必须使用驱动程序特定的API。这里的  围棋版本  的例子。

Table Inheritance

Assume you have a table called “invoices”. You’ve to now support “government invoices”, which are invoices but have a few additional fields. How would you model this? Rather than adding nullable columns to invoices, or a single nullable JSON column to invoices, try the inheritance feature:

CREATE TABLE invoices (
    invoice_number   int  NOT NULL PRIMARY KEY,
    issued_on        date NOT NULL DEFAULT now()
);

CREATE TABLE government_invoices (
    department_id    text NOT NULL
) INHERITS (invoices);

This reflects the situation that all government invoices are invoices, but have an extra attribute. The “government_invoices” table above has a total of 3 columns:

test=# \d invoices
                  Table "public.invoices"
     Column     |  Type   | Collation | Nullable | Default
----------------+---------+-----------+----------+---------
 invoice_number | integer |           | not null |
 issued_on      | date    |           | not null | now()
Indexes:
    "invoices_pkey" PRIMARY KEY, btree (invoice_number)
Number of child tables: 1 (Use \d+ to list them.)

test=# \d government_invoices
            Table "public.government_invoices"
     Column     |  Type   | Collation | Nullable | Default
----------------+---------+-----------+----------+---------
 invoice_number | integer |           | not null |
 issued_on      | date    |           | not null | now()
 department_id  | text    |           | not null |
Inherits: invoices

Adding rows work as though the tables are independent:

INSERT INTO invoices (invoice_number) VALUES (100);

INSERT INTO government_invoices
    (invoice_number, department_id) VALUES (101, 'DOD');

But watch happens when you SELECT:

test=# SELECT * FROM government_invoices;
 invoice_number | issued_on  | department_id
----------------+------------+---------------
            101 | 2018-06-19 | DOD
(1 row)

test=# SELECT * FROM invoices;
 invoice_number | issued_on
----------------+------------
            100 | 2018-06-19
            101 | 2018-06-19
(2 rows)

Invoice number 101 from the child table is also listed in the parent table. This facilitates algorithms that know only about invoices to work on the parent invoices table while ignoring any child tables.

You can learn more about table inheritance in PostgreSQL in the docs.

表继承

假如有一张叫 “invoices(发票)” 的表。你现在想支持 “government invoices(政府发票)”,这种发票在原来的发票之上添加了一些字段。该如何建模?是在 invoices 表中添加若干可空字段,还是增加一个可空的 JSON 字段?不妨试试继承功能:

CREATE TABLE invoices (
    invoice_number   int  NOT NULL PRIMARY KEY,
    issued_on        date NOT NULL DEFAULT now()
);

CREATE TABLE government_invoices (
    department_id    text NOT NULL
) INHERITS (invoices);

上述模型反映出了政府发票就是发票,但比发票多一些属性的情况。上面的 “government_invoices” 表总共有 3 列:

test=# \d invoices
                  Table "public.invoices"
     Column     |  Type   | Collation | Nullable | Default
----------------+---------+-----------+----------+---------
 invoice_number | integer |           | not null |
 issued_on      | date    |           | not null | now()
Indexes:
    "invoices_pkey" PRIMARY KEY, btree (invoice_number)
Number of child tables: 1 (Use \d+ to list them.)

test=# \d government_invoices
            Table "public.government_invoices"
     Column     |  Type   | Collation | Nullable | Default
----------------+---------+-----------+----------+---------
 invoice_number | integer |           | not null |
 issued_on      | date    |           | not null | now()
 department_id  | text    |           | not null |
Inherits: invoices

为它添加数据行就跟独立表一样:

INSERT INTO invoices (invoice_number) VALUES (100);

INSERT INTO government_invoices
    (invoice_number, department_id) VALUES (101, 'DOD');

不过观察一下 SELECT 时的情况:

test=# SELECT * FROM government_invoices;
 invoice_number | issued_on  | department_id
----------------+------------+---------------
            101 | 2018-06-19 | DOD
(1 row)

test=# SELECT * FROM invoices;
 invoice_number | issued_on
----------------+------------
            100 | 2018-06-19
            101 | 2018-06-19
(2 rows)

子表添加的编号为 101 的发票,也父表中也列出来了。这样做的好处是在父表中进行的各种算法In完全以忽略子表的存在。

这个文档可以了解到更多关于 PostgreSQL 继承方面的内容。

Foreign Data Wrappers

Did you know you can have virtual tables that actually serve data from another PostgreSQL instance? Or even SQLite files, MongoDB, Redis, and more? This feature is called Foreign Data Wrappers, which provides a standardized way to access and manipulate data stored externally to the Postgres server you’re connecting to. There are various FDW implementations available that let you connect to various different data sources. These are typically packaged as extensions.

The standard Postgres distribution comes with the postgres_fdw extension, which lets you connect to other Postgres servers. For example, you can move a big table to another server, and setup a virtual table (the proper term is a “foreign table”) locally:

-- install the extension (required only once)
CREATE EXTENSION postgres_fdw;

-- big_server is our big, remote server with the migrated table
CREATE SERVER big_server FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '10.1.2.3', dbname 'big_db');

-- create a user mapping for the app user
CREATE USER MAPPING FOR app_user SERVER big_server
  OPTIONS (user 'remote_user', password 'remote_pass');

-- and finally create the virtual table invoices -> big_db.public.invoices
CREATE FOREIGN TABLE invoices (
  invoice_num int NOT NULL PRIMARY KEY
  -- other columns omitted for brevity
) SERVER big_server;

This Wiki page has a good list of the many FDW implementations available.

Apart from accessing data from other servers, FDW has also been used to implement alternate storage layouts, like cstore_fdw.

There is also dblink, which is yet another method of accessing remote PostgreSQL (only) data.

外部数据包装器

你知道你可以有一张虚表用来指向另一个PostgreSQL实例吗?或者另一个SQLite、MongoDB、Redis甚至其它的数据库?这个功能叫做外部数据包装器(FDW),它提供一个标准化的方法来存取和操作连接到Postgres服务器的外部数据源。有各种各样的FDW实现让你可以连接到不同的数据源,它们通常被打包为扩展插件。

标准Postgres分发包中有一个postgres_fdw扩展,它可以让你连接到其它Postgres服务器。例如,你可以移动一张大表到其它服务器,同时在本地建立一张虚表(正确的术语叫做"外部表"):

-- install the extension (required only once)
CREATE EXTENSION postgres_fdw;

-- big_server is our big, remote server with the migrated table
CREATE SERVER big_server FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '10.1.2.3', dbname 'big_db');

-- create a user mapping for the app user
CREATE USER MAPPING FOR app_user SERVER big_server
  OPTIONS (user 'remote_user', password 'remote_pass');

-- and finally create the virtual table invoices -> big_db.public.invoices
CREATE FOREIGN TABLE invoices (
  invoice_num int NOT NULL PRIMARY KEY
  -- other columns omitted for brevity
) SERVER big_server;

这个Wiki有一个很好的列表列出了许多FDW的有效实现。

除了可以从其它服务器存取数据,FDW也被用作实现交互存储层,比如 cstore_fdw.

还有一个dblink扩展,它是另一种用来存取远程PostgreSQL数据的实现.

Partitioned Tables

Starting with version 10, PostgreSQL natively supports sharding of a table into multiple child tables, based on a partition key value computed from one or more column values. This allows a single, large table to be physically stored as separate tables, for better DML performance and storage management.

Here’s how to create a partitioned table, with the intention of adding a new table each month:

-- the parent table
CREATE TABLE invoices (
    invoice_number   int  NOT NULL,
    issued_on        date NOT NULL DEFAULT now()
) PARTITION BY RANGE (issued_on);
  
-- table for the month of May 2018
CREATE TABLE invoices_2018_05 PARTITION OF invoices
  FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');

-- table for the month of June 2018
CREATE TABLE invoices_2018_06 PARTITION OF invoices
  FOR VALUES FROM ('2018-06-01') TO ('2018-07-01');

The child tables have to be manually or programmatically created, it does not happen automatically.

You can query and insert at the parent level, and PostgreSQL will route it to the appropriate child table, have a look:

First, we insert two rows:

test=# INSERT INTO invoices VALUES (10042, '2018-05-15');
INSERT 0 1
test=# INSERT INTO invoices VALUES (43029, '2018-06-15');
INSERT 0 1

We can see that the rows are actually inserted into the child tables:

test=# SELECT * FROM invoices_2018_05;
 invoice_number | issued_on
----------------+------------
          10042 | 2018-05-15
(1 row)

test=# SELECT * FROM invoices_2018_06;
 invoice_number | issued_on
----------------+------------
          43029 | 2018-06-15
(1 row)

But queries can be done on the parent, returning combined result:

test=# SELECT * FROM invoices;
 invoice_number | issued_on
----------------+------------
          10042 | 2018-05-15
          43029 | 2018-06-15
(2 rows)

Partitioning is in some ways similar to inheritance (query at parent level), but there are differences too (partition parent contains no data, for example). You can read more about partitioning in the docs.

PostgreSQL 11, currently in beta, improves upon this feature. This article has an overview of the improvements.

拆分表

从版本 10 开始,PostgreSQL 原生支持将一个表拆分成多个子表,其拆分基于对一列或多列数据的计算来进行。这一功能可以让一个巨大的表在物理上存储于多个表中,改善DML性能和存储管理。

下面演示了如何创建拆分表,该演示会为每个月的数据增加一张表:

-- the parent table
CREATE TABLE invoices (
    invoice_number   int  NOT NULL,
    issued_on        date NOT NULL DEFAULT now()
) PARTITION BY RANGE (issued_on);
  
-- table for the month of May 2018
CREATE TABLE invoices_2018_05 PARTITION OF invoices
  FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');

-- table for the month of June 2018
CREATE TABLE invoices_2018_06 PARTITION OF invoices
  FOR VALUES FROM ('2018-06-01') TO ('2018-07-01');

子表必须由人工或通过程序创建,这个创建过程不会自动发生。

你可以在父级表中查询或插入数据,PostgreSQL 会自动到子表中去进行操作,来看一下:

先插入两行数据:

test=# INSERT INTO invoices VALUES (10042, '2018-05-15');
INSERT 0 1
test=# INSERT INTO invoices VALUES (43029, '2018-06-15');
INSERT 0 1

可以看到数据实际被插入到了子表中:

test=# SELECT * FROM invoices_2018_05;
 invoice_number | issued_on
----------------+------------
          10042 | 2018-05-15
(1 row)

test=# SELECT * FROM invoices_2018_06;
 invoice_number | issued_on
----------------+------------
          43029 | 2018-06-15
(1 row)

但在父表中也可以完成查询,返回合并的结果:

test=# SELECT * FROM invoices;
 invoice_number | issued_on
----------------+------------
          10042 | 2018-05-15
          43029 | 2018-06-15
(2 rows)

拆分方法与继承相似(在父表级别查询),但也存在一些区别(比如在拆分父表中没有保存数据)。你可以在这个文档中阅读到更多相关内容。

已经进入 Beta 阶段的 PostgreSQL 11 对这一功能会有所改进,这篇文章对此进行了叙述。

Range Types

Ever worked with temperature ranges, calendar scheduling, price ranges and the like? Working with intervals are one of those deceptively simple things that gently lead you into hair-pulling and late night debugging.

Here’s a table with a range column, and some values:

CREATE TABLE prices (
    item  text,
    price int4range -- int4range is a range of regular integers
);

INSERT INTO prices VALUES ('mouse',    '[10,16)');
INSERT INTO prices VALUES ('keyboard', '[20,31)');
INSERT INTO prices VALUES ('joystick', '[35,56)');

The numbers in the mismatched brackets represent a half-open interval. Here is the query to find all items that are in the price range $15 to $30, using the && operator (range overlap):

test=# SELECT * FROM prices WHERE price && int4range(15,30);
   item   |  price
----------+---------
 mouse    | [10,16)
 keyboard | [20,31)
(2 rows)

If you are not impressed, try writing the query without ranges. (No really, just try it.)

Range types are quite powerful – there are more operatorsfunctions, you can define your own range types, and even index them.

To learn more about ranges, try this presentation, and the docs.

区间类型

你以前与温度范围、日程表、价格区间或类似的数值范围打过交道吗?如果是,那你就会有这样的经验:看似简单的问题总会导致你抓耳挠腮并且经常深夜调试bug。以下是一个包含区间列的表和一些数值:

CREATE TABLE prices (
    item  text,
    price int4range -- int4range is a range of regular integers
);

INSERT INTO prices VALUES ('mouse',    '[10,16)');
INSERT INTO prices VALUES ('keyboard', '[20,31)');
INSERT INTO prices VALUES ('joystick', '[35,56)');

在错配方括号中的数值代表半开区间。以下是一个查询语句,它可以找出在价格区间15$~30$中的所有项,使用了&&操作符(区间交错):

test=# SELECT * FROM prices WHERE price && int4range(15,30);
   item   |  price
----------+---------
 mouse    | [10,16)
 keyboard | [20,31)
(2 rows)

为了让你印象深刻,你可以尝试一下使用无区间类型的查询语句有多难(试试就好)。

区间类型非常强大 --- 这里还有操作符函数,你也可以定义你自己的区间类型,甚至还可以索引它们。

为了学习更多关于区间的知识,你可以看看这篇文章,还有这篇

Array Types

PostgreSQL has supported arrays for a long time. Array types can reduce the amount of boilerplate application code and simplify queries. Here is a table that uses an array column:

CREATE TABLE posts (
    title text NOT NULL PRIMARY KEY,
    tags  text[]
);

Assuming each row represents a blog post, each having a set of tags, here is how we can list all the posts that have both “postgres” and “go” tags:

test=# SELECT title, tags FROM posts WHERE '{"postgres", "go"}' <@ tags;
               title               |          tags
-----------------------------------+------------------------
 Writing PostgreSQL Triggers in Go | {postgres,triggers,go}
(1 row)

The usage of the array type here makes for concise data modelling and simpler queries. Postgres arrays come with operators and functions, including aggregate functions. You can also create indexes on array expressions. Here is an article on using arrays with Go.

数组类型

PostgreSQL很久以前就已经支持数组类型了。数组类型可以精简应用代码并可以简化查询操作。以下是一个在表中使用数组列的例子:

CREATE TABLE posts (
    title text NOT NULL PRIMARY KEY,
    tags  text[]
);

假设每一行代表一篇博客,每篇博客又都有一个标签集,下面是我们如何列出所有带“postgres”和"go"标签的博客的代码:

test=# SELECT title, tags FROM posts WHERE '{"postgres", "go"}' <@ tags;
               title               |          tags
-----------------------------------+------------------------
 Writing PostgreSQL Triggers in Go | {postgres,triggers,go}
(1 row)

这里数组类型的使用使我们的数据模型更精确,同时也简化了查询操作。Postgres数组总是与操作符和函数一起出现,其中也包括集合函数。你也可以基于数组表达式创建索引。这里有一篇关于如何在Go语言中使用数组的文章。

Triggers

You can ask PostgreSQL to execute a specific function when rows are inserted, updated or deleted from a table. The function can even alter the values during an insert. You can read more about triggers here. Just to whet your appetite, here is a trigger that sends out a notification and writes to an audit log when a user is created:

-- a table of users
CREATE TABLE users (
  username text NOT NULL PRIMARY KEY
);

-- an audit log
CREATE TABLE audit_log (
  at          timestamptz NOT NULL DEFAULT now(),
  description text NOT NULL
);

-- the actual function that is executed per insert
CREATE FUNCTION on_user_added() RETURNS TRIGGER AS $$
BEGIN
  IF (TG_OP = 'INSERT') THEN
    -- add an entry into the audit log
    INSERT INTO audit_log (description)
        VALUES ('new user created, username is ' || NEW.username);
    -- send a notification
    PERFORM pg_notify('usercreated', NEW.username);
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- set the function as an insert trigger
CREATE TRIGGER on_user_added AFTER INSERT ON users
  FOR EACH ROW EXECUTE PROCEDURE on_user_added();

Now if we try to add a new user, an audit log entry is automatically added:

test=# INSERT INTO users VALUES ('alice');
INSERT 0 1
test=# SELECT * FROM audit_log;
                at             |             description
-------------------------------+-------------------------------------
 2018-06-19 04:00:30.672947+00 | new user created, username is alice
(1 row)

触发器

当对表中的行进行插入、更新或删除操作时,你能请求PostgreSQL执行一个特殊的函数,这个函数甚至可以在插入过程中修改值。你可以点击这里了解更多关于触发器的信息。以下是一个例子:当创建用户时,触发器发出通知并写入稽核日志。

-- a table of users
CREATE TABLE users (
  username text NOT NULL PRIMARY KEY
);

-- an audit log
CREATE TABLE audit_log (
  at          timestamptz NOT NULL DEFAULT now(),
  description text NOT NULL
);

-- the actual function that is executed per insert
CREATE FUNCTION on_user_added() RETURNS TRIGGER AS $$
BEGIN
  IF (TG_OP = 'INSERT') THEN
    -- add an entry into the audit log
    INSERT INTO audit_log (description)
        VALUES ('new user created, username is ' || NEW.username);
    -- send a notification
    PERFORM pg_notify('usercreated', NEW.username);
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- set the function as an insert trigger
CREATE TRIGGER on_user_added AFTER INSERT ON users
  FOR EACH ROW EXECUTE PROCEDURE on_user_added();

现在,如果你尝试增加一个新用户,一个稽核日志记录将会被自动添加。

test=# INSERT INTO users VALUES ('alice');
INSERT 0 1
test=# SELECT * FROM audit_log;
                at             |             description
-------------------------------+-------------------------------------
 2018-06-19 04:00:30.672947+00 | new user created, username is alice
(1 row)

pg_stat_statements

pg_stat_statements is an extension that is present by default in your PostgreSQL distribution, but is not enabled. This extension records a wealth of information about each statement executed, including the time taken, the memory used and disk I/Os initiated. It’s indispensible for understanding and debugging query performance issues.

The overhead of installing and enabling this extension is small, it’s very simple to use, and there is no reason NOT to run it on your production servers! Read the docs for more info.

pg_stat_statements

pg_stat_statements是一个扩展插件,默认在PostgreSQL分发包中就已经包含了,只是默认没有启用。这个扩展记录了每条执行语句的健康信息,包括执行时长、内存使用、磁盘IO初始化等。对于需要了解和调试查询性能的场景它是不可或缺的一个扩展。

安装和启用这个扩展的开销非常小,它也非常易于使用,因此没有理由不在你的生产server中使用这个扩展。

Hash, GIN and BRIN Indexes

The default index type in PostgreSQL is the B-Tree, but there are also other types, that are documented here. Other index types are very helpful in cases that aren’t actually uncommon. In particular, setting up indexes of the hash, GIN and BRIN type might just be the solution to your performance issues:

  • Hash: Unlike B-Tree indexes which have inherent ordering, hash indexes are unordered and can only do equality matches (lookup). However, hash indexes occupy much lesser space and are faster than B-Trees for equality matches. (Also, note that prior to PostgreSQL 10 it was not possible to replicate hash indexes; they were unlogged.)

  • GIN: GIN is an inverted index, which essentially allows for multiple values for a single key. GIN indexes are useful for indexing arrays, JSON, ranges, full text search etc.

  • BRIN: If your data has a specific, natural order – for example, time series data – and your queries typically work only with a small range of it, then BRIN indexes can speed up your queries with very little overhead. BRIN indexes maintain ranges per block of data, allowing the optimizer to skip over blocks that contain rows that won’t be selected by the query.

Start reading about PostgreSQL index types here.

哈希GIN还有BRIN索引

PostgreSQL中默认的索引类型是B-Tree,有记录表示也有其他类型。其他索引类型在非常不常见的情况下非常有用。特别是设置散列,GIN和BRIN类型的索引可能只是解决您的性能问题:

  • 散列:与具有固有排序的B树索引不同,散列索引是无序的,只能执行相等匹配(查找)。然而,散列索引占用更小的空间并且比平等匹配的B树更快。 (另外,请注意,在PostgreSQL 10之前,不可能复制散列索引;它们未被记录。)

  • GIN:GIN是一个倒排索引,它基本上允许单个键的多个值。 GIN索引对索引数组,JSON,范围,全文搜索等非常有用。

  • BRIN:如果您的数据具有特定的自然顺序(例如时间序列数据),并且您的查询通常只适用于其中的一小部分范围,那么BRIN索引可以以很小的开销加快查询速度。 BRIN索引维护每个数据块的范围,允许优化器跳过包含不会被查询选中的行的块。
    在这里开始阅读关于PostgreSQL索引类型。

Full Text Search

PostgreSQL can do full text search very well, including in languages other than English. Here’s an article that walks you through the steps of creating a full text query search app over Postgres in Go.

全文本搜索

PostgreSQL也很好地支持全文本搜索,甚至支持除英语之外的语言。这里有一篇文章教你如何基于PostgreSQL用Go语言一步步创建一个全文本搜索查询。

返回顶部
顶部