🛀 🍢 🛒 使用PostgreSQL中索引的所有功能 🆗 📼 👧🏽

在Postgres世界中，索引对于有效导航数据库存储库（称为堆，堆）至关重要。 Postgres不支持集群，并且MVCC体系结构使您可以累积同一元组的多个版本。因此，能够创建和维护有效的索引以支持应用程序非常重要。

这里是一些优化和改进索引使用的技巧。

注意：下面显示的查询适用于未修改的pagila示例数据库。

使用覆盖索引

让我们回顾一下为非活动用户检索电子邮件地址的请求。 customer表具有active列，并且请求很简单：

 pagila=# EXPLAIN SELECT email FROM customer WHERE active=0; QUERY PLAN ----------------------------------------------------------- Seq Scan on customer (cost=0.00..16.49 rows=15 width=32) Filter: (active = 0) (2 rows)

该查询将调用扫描customer表的完整顺序。让我们为active列创建索引：

 pagila=# CREATE INDEX idx_cust1 ON customer(active); CREATE INDEX pagila=# EXPLAIN SELECT email FROM customer WHERE active=0; QUERY PLAN ----------------------------------------------------------------------------- Index Scan using idx_cust1 on customer (cost=0.28..12.29 rows=15 width=32) Index Cond: (active = 0) (2 rows)

它有所帮助，随后的扫描变成了“ index scan ”。这意味着Postgres将扫描idx_cust1索引，然后继续搜索表的堆以读取查询所需的其他列（在本例中为email列）的值。

PostgreSQL 11引入了覆盖索引。它们允许您在索引本身中包括一个或多个其他列-它们的值存储在索引数据存储中。

如果我们使用此功能并在索引内添加了电子邮件值，则Postgres无需在表堆中查找email值。让我们看看这是否有效：

 pagila=# CREATE INDEX idx_cust2 ON customer(active) INCLUDE (email); CREATE INDEX pagila=# EXPLAIN SELECT email FROM customer WHERE active=0; QUERY PLAN ---------------------------------------------------------------------------------- Index Only Scan using idx_cust2 on customer (cost=0.28..12.29 rows=15 width=32) Index Cond: (active = 0) (2 rows)

“ Index Only Scan ”告诉我们查询现在仅需要一个索引，这有助于避免所有磁盘I / O读取表堆。

如今，覆盖索引仅适用于B树。但是，在这种情况下，陪同人员的努力会更高。

使用部分索引

部分索引仅索引表中行的子集。这样可以节省索引的大小和更快的扫描速度。

假设我们需要从加利福尼亚客户那里获得电子邮件地址列表。该请求将如下所示：

 SELECT c.email FROM customer c JOIN address a ON c.address_id = a.address_id WHERE a.district = 'California'; which has a query plan that involves scanning both the tables that are joined: pagila=# EXPLAIN SELECT c.email FROM customer c pagila-# JOIN address a ON c.address_id = a.address_id pagila-# WHERE a.district = 'California'; QUERY PLAN ---------------------------------------------------------------------- Hash Join (cost=15.65..32.22 rows=9 width=32) Hash Cond: (c.address_id = a.address_id) -> Seq Scan on customer c (cost=0.00..14.99 rows=599 width=34) -> Hash (cost=15.54..15.54 rows=9 width=4) -> Seq Scan on address a (cost=0.00..15.54 rows=9 width=4) Filter: (district = 'California'::text) (6 rows)

普通索引会给我们带来什么：

 pagila=# CREATE INDEX idx_address1 ON address(district); CREATE INDEX pagila=# EXPLAIN SELECT c.email FROM customer c pagila-# JOIN address a ON c.address_id = a.address_id pagila-# WHERE a.district = 'California'; QUERY PLAN --------------------------------------------------------------------------------------- Hash Join (cost=12.98..29.55 rows=9 width=32) Hash Cond: (c.address_id = a.address_id) -> Seq Scan on customer c (cost=0.00..14.99 rows=599 width=34) -> Hash (cost=12.87..12.87 rows=9 width=4) -> Bitmap Heap Scan on address a (cost=4.34..12.87 rows=9 width=4) Recheck Cond: (district = 'California'::text) -> Bitmap Index Scan on idx_address1 (cost=0.00..4.34 rows=9 width=0) Index Cond: (district = 'California'::text) (8 rows)

address扫描由idx_address1索引idx_address1代替，然后idx_address1了address堆。

由于这是一个频繁查询并且需要优化，因此我们可以使用部分索引，该索引仅索引那些地址为'California' ：

 pagila=# CREATE INDEX idx_address2 ON address(address_id) WHERE district='California'; CREATE INDEX pagila=# EXPLAIN SELECT c.email FROM customer c pagila-# JOIN address a ON c.address_id = a.address_id pagila-# WHERE a.district = 'California'; QUERY PLAN ------------------------------------------------------------------------------------------------ Hash Join (cost=12.38..28.96 rows=9 width=32) Hash Cond: (c.address_id = a.address_id) -> Seq Scan on customer c (cost=0.00..14.99 rows=599 width=34) -> Hash (cost=12.27..12.27 rows=9 width=4) -> Index Only Scan using idx_address2 on address a (cost=0.14..12.27 rows=9 width=4) (5 rows)

现在，该请求仅读取idx_address2 ，而不接触address表。

使用多值索引

一些需要索引的列可能不包含标量数据类型。诸如jsonb ， arrays和tsvector类的jsonb类型包含多个值。如果需要索引此类列，通常必须搜索这些列中的所有单个值。

让我们尝试查找所有包含未成功剪裁的电影的名称。 film表中有一个名为special_features的文本列。如果电影具有此“特殊属性”，则该列包含Behind The Scenes文本数组形式的元素。要搜索所有此类电影，我们需要选择“幕后花絮”的所有行作为special_features数组的任何值：

 SELECT title FROM film WHERE special_features @> '{"Behind The Scenes"}';

遏制运算符@>检查右侧是否是左侧的子集。

申请方案：

 pagila=# EXPLAIN SELECT title FROM film pagila-# WHERE special_features @> '{"Behind The Scenes"}'; QUERY PLAN ----------------------------------------------------------------- Seq Scan on film (cost=0.00..67.50 rows=5 width=15) Filter: (special_features @> '{"Behind The Scenes"}'::text[]) (2 rows)

它要求以67的成本进行全堆扫描。

让我们看看常规的B树索引是否对我们有帮助：

 pagila=# CREATE INDEX idx_film1 ON film(special_features); CREATE INDEX pagila=# EXPLAIN SELECT title FROM film pagila-# WHERE special_features @> '{"Behind The Scenes"}'; QUERY PLAN ----------------------------------------------------------------- Seq Scan on film (cost=0.00..67.50 rows=5 width=15) Filter: (special_features @> '{"Behind The Scenes"}'::text[]) (2 rows)

该指数甚至没有被考虑。 B树索引不知道索引值中是否存在单个元素。

我们需要一个GIN索引。

 pagila=# CREATE INDEX idx_film2 ON film USING GIN(special_features); CREATE INDEX pagila=# EXPLAIN SELECT title FROM film pagila-# WHERE special_features @> '{"Behind The Scenes"}'; QUERY PLAN --------------------------------------------------------------------------- Bitmap Heap Scan on film (cost=8.04..23.58 rows=5 width=15) Recheck Cond: (special_features @> '{"Behind The Scenes"}'::text[]) -> Bitmap Index Scan on idx_film2 (cost=0.00..8.04 rows=5 width=0) Index Cond: (special_features @> '{"Behind The Scenes"}'::text[]) (4 rows)

GIN-index支持将单个值与索引的复合值进行比较，因此，查询计划的成本降低了一半以上。

摆脱重复索引

索引随时间累积，有时新索引可能包含与先前索引相同的定义。要获得人类可读的索引SQL定义，可以使用目录视图pg_indexes 。您还可以轻松找到相同的定义：

  SELECT array_agg(indexname) AS indexes, replace(indexdef, indexname, '') AS defn FROM pg_indexes GROUP BY defn HAVING count(*) > 1; And here's the result when run on the stock pagila database: pagila=# SELECT array_agg(indexname) AS indexes, replace(indexdef, indexname, '') AS defn pagila-# FROM pg_indexes pagila-# GROUP BY defn pagila-# HAVING count(*) > 1; indexes | defn ------------------------------------------------------------------------+------------------------------------------------------------------ {payment_p2017_01_customer_id_idx,idx_fk_payment_p2017_01_customer_id} | CREATE INDEX ON public.payment_p2017_01 USING btree (customer_id {payment_p2017_02_customer_id_idx,idx_fk_payment_p2017_02_customer_id} | CREATE INDEX ON public.payment_p2017_02 USING btree (customer_id {payment_p2017_03_customer_id_idx,idx_fk_payment_p2017_03_customer_id} | CREATE INDEX ON public.payment_p2017_03 USING btree (customer_id {idx_fk_payment_p2017_04_customer_id,payment_p2017_04_customer_id_idx} | CREATE INDEX ON public.payment_p2017_04 USING btree (customer_id {payment_p2017_05_customer_id_idx,idx_fk_payment_p2017_05_customer_id} | CREATE INDEX ON public.payment_p2017_05 USING btree (customer_id {idx_fk_payment_p2017_06_customer_id,payment_p2017_06_customer_id_idx} | CREATE INDEX ON public.payment_p2017_06 USING btree (customer_id (6 rows)

超集索引

您可能会积累许多索引，其中一个索引建立了索引其他索引的列的子集。这可能是合乎需要的，也可能不是-超集只能导致通过索引进行扫描，这很好，但是会占用太多空间，或者不再使用要为其优化的查询。

如果需要自动化此类索引的定义，则可以从pg_catalog表中的pg_index开始。

未使用的索引

随着使用数据库的应用程序的发展，它们使用的查询也在发展。先前添加的索引可能不再被任何查询使用。每次扫描索引时，都会由统计信息管理器对其进行标记，并且在系统目录pg_stat_user_indexes您可以看到idx_scan值，它是一个累积计数器。在一段时间（例如一个月）内跟踪此值可以很好地了解哪些索引未使用且可以删除。

这是获取'public'模式中所有索引的当前扫描计数的请求：

 SELECT relname, indexrelname, idx_scan FROM pg_catalog.pg_stat_user_indexes WHERE schemaname = 'public'; with output like this: pagila=# SELECT relname, indexrelname, idx_scan pagila-# FROM pg_catalog.pg_stat_user_indexes pagila-# WHERE schemaname = 'public' pagila-# LIMIT 10; relname | indexrelname | idx_scan ---------------+--------------------+---------- customer | customer_pkey | 32093 actor | actor_pkey | 5462 address | address_pkey | 660 category | category_pkey | 1000 city | city_pkey | 609 country | country_pkey | 604 film_actor | film_actor_pkey | 0 film_category | film_category_pkey | 0 film | film_pkey | 11043 inventory | inventory_pkey | 16048 (10 rows)

用更少的锁重新创建索引

例如，当索引大小过大时，通常必须重新创建索引，并且重新创建可以加快扫描速度。另外，索引可能已损坏。更改索引参数可能还需要重新创建它。

启用并行索引创建

在PostgreSQL 11中，创建B树索引是有竞争力的。为了加快创建过程，可以使用几个并行工作器。但是，请确保正确设置以下配置参数：

 SET max_parallel_workers = 32; SET max_parallel_maintenance_workers = 16;

默认值太小。理想情况下，这些数量应与处理器内核数量一起增加。阅读文档以获取更多详细信息。

背景索引创建

您可以使用CREATE INDEX命令的CONCURRENTLY参数在后台CREATE INDEX ：

 pagila=# CREATE INDEX CONCURRENTLY idx_address1 ON address(district); CREATE INDEX

此索引创建过程不同于通常的索引创建过程，因为它不需要表锁定，因此不会阻止写操作。另一方面，它花费更多的时间并消耗更多的资源。

Postgres提供了许多灵活的选项来创建索引和解决任何特殊情况的方法，并提供了在应用程序爆炸性增长的情况下管理数据库的方法。我们希望这些技巧将帮助您快速进行查询，并准备好扩展数据库。

使用PostgreSQL中索引的所有功能