
Viewing Query Execution Plans in PostgreSQL

Hello, this is Toyoizumi.

I usually use PostgreSQL as my database, but I realized I haven't looked at execution plans much lately, so in this post I'll run a few queries against PostgreSQL and take a look at their execution plans.

Test environment

PostgreSQL runs in Docker.

I use pgAdmin to run queries and check the data.

// compose.yml

services:
  postgres:
    image: postgres:16.3
    ports:
      - ${DB_PORT}:${DB_PORT}
    volumes:
      - db-store:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_PORT: ${DB_PORT}
      POSTGRES_DB: ${DB_NAME}
      TZ: Asia/Tokyo
    restart: always
  pgadmin:
    image: dpage/pgadmin4:8.8
    ports:
      - 8888:80
    volumes:
      - pgadmin-data:/var/lib/pgadmin
    environment:
      PGADMIN_DEFAULT_EMAIL: root@example.com
      PGADMIN_DEFAULT_PASSWORD: ${DB_PASSWORD}
      TZ: Asia/Tokyo
    depends_on:
      - postgres
    restart: always
volumes:
  db-store:
  pgadmin-data:

Test data

I used Prisma and Faker to prepare the test data.

// package.json

{
  "name": "postgresql-explain",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@faker-js/faker": "^8.4.1",
    "@prisma/client": "^5.15.0",
    "prisma": "^5.15.0",
    "ts-node": "^10.9.2"
  },
  "prisma": {
    "seed": "ts-node prisma/seed.ts"
  }
}

Create the Account and User tables.

// schema.prisma

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

enum AccountType {
  FREE
  STANDARD
  PREMIUM
  SPECIAL
}

model Account {
  id         Int          @id @default(autoincrement())
  name       String       @db.VarChar(255)
  type       AccountType  @default(STANDARD)
  createdAt  DateTime     @default(now()) @map("created_at")
  updatedAt  DateTime     @default(now()) @updatedAt @map("updated_at")

  users      User[]

  @@map("accounts")
}

enum UserRoleType {
  OWNER
  ADMIN
  STANDARD
}

model User {
  id         Int           @id @default(autoincrement())
  accountId  Int           @map("account_id")
  name       String        @db.VarChar(255)
  email      String        @db.VarChar(255) @unique
  role       UserRoleType  @default(STANDARD)
  createdAt  DateTime      @default(now()) @map("created_at")
  updatedAt  DateTime      @default(now()) @updatedAt @map("updated_at")

  account    Account       @relation(fields: [accountId], references: [id])

  @@map("users")
}

Create the data.

// seed.ts

import { PrismaClient } from '@prisma/client';
import { faker } from '@faker-js/faker';

const prisma = new PrismaClient();

const ACCOUNT_COUNT = 10;
const USER_COUNT = 10000;

enum AccountType {
  FREE = 'FREE',
  STANDARD = 'STANDARD',
  PREMIUM = 'PREMIUM',
  SPECIAL = 'SPECIAL',
}

enum UserRoleType {
  OWNER = 'OWNER',
  ADMIN = 'ADMIN',
  STANDARD = 'STANDARD',
}

async function main() {
  const accountTypes = Object.values(AccountType);
  const userRoleTypes = Object.values(UserRoleType);
  for (let i = 0; i < ACCOUNT_COUNT; i++) {
    const type = accountTypes[Math.floor(Math.random() * accountTypes.length)];
    const account = await prisma.account.create({
      data: {
        name: faker.company.name(),
        type: type,
      }
    });

    const users = [...Array(USER_COUNT)].map((_, index) => {
      const role = userRoleTypes[Math.floor(Math.random() * userRoleTypes.length)];
      return {
        name: faker.person.fullName(),
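        // Prefix with the loop indices so emails stay unique across all accounts (email is @unique).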
        email: `${i}${index}${faker.internet.email()}`,
        role: role,
        accountId: account.id,
      };
    });

    await prisma.user.createMany({
      data: users
    });
  }
}

main()
  .then(async () => {
    await prisma.$disconnect();
  })
  .catch(async (e) => {
    console.error(e);
    await prisma.$disconnect();
    process.exit(1);
  });

Trying out some queries

  • Use the EXPLAIN command to display the query execution plan.
  • I'll also add the following options:
    • ANALYZE: shows actual execution times and other statistics
    • BUFFERS: shows buffer usage (memory vs. disk I/O)
  • Run VACUUM ANALYZE; before executing the queries (see the note right after this list).
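
One caveat about these options: with ANALYZE the statement is actually executed, not just planned, so for a data-modifying statement it's safer to wrap the EXPLAIN in a transaction and roll it back. A minimal sketch of the measurement routine (the UPDATE is just a throwaway example):

-- Refresh planner statistics (and clean up dead tuples) before measuring.
VACUUM ANALYZE;

-- EXPLAIN ANALYZE really runs the statement; roll back if you only want the plan of a write.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS) UPDATE public.accounts SET name = name;
ROLLBACK;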

Preparing 10, 100, 1,000, 10,000, and 100,000 rows in the Account table and fetching them all

// Query to run

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.accounts;
// Result

// 10 rows
"Seq Scan on accounts  (cost=0.00..1.10 rows=10 width=42) (actual time=0.012..0.014 rows=10 loops=1)"
"  Buffers: shared hit=1"
"Planning Time: 0.271 ms"
"Execution Time: 0.031 ms"

// 100 rows
"Seq Scan on accounts  (cost=0.00..2.00 rows=100 width=43) (actual time=0.010..0.018 rows=100 loops=1)"
"  Buffers: shared hit=1"
"Planning Time: 0.058 ms"
"Execution Time: 0.038 ms"

// 1000 rows
"Seq Scan on accounts  (cost=0.00..20.00 rows=1000 width=43) (actual time=0.010..0.104 rows=1000 loops=1)"
"  Buffers: shared hit=10"
"Planning Time: 0.057 ms"
"Execution Time: 0.173 ms"

// 10000 rows
"Seq Scan on accounts  (cost=0.00..193.00 rows=10000 width=43) (actual time=0.017..1.218 rows=10000 loops=1)"
"  Buffers: shared hit=93"
"Planning Time: 0.074 ms"
"Execution Time: 1.820 ms"

// 100000 rows
"Seq Scan on accounts  (cost=0.00..1921.00 rows=100000 width=43) (actual time=0.011..8.659 rows=100000 loops=1)"
"  Buffers: shared hit=921"
"Planning Time: 2.020 ms"
"Execution Time: 17.374 ms"
  • The results were as shown above (a quick check of the cost numbers follows this list).
  • When I tried adding a varchar(255) column, the width value got a bit larger.
  • After running the ANALYZE command to refresh the statistics, the estimated row counts (rows=) matched the actual row counts.
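
As a rough sanity check, a Seq Scan's cost is approximately pages × seq_page_cost + rows × cpu_tuple_cost; assuming the default planner parameters, this matches the 100,000-row plan above:

-- Default planner cost parameters (check your own values):
SHOW seq_page_cost;   -- 1
SHOW cpu_tuple_cost;  -- 0.01

-- 100,000-row plan above: 921 pages and 100,000 rows were scanned, so
-- 921 * 1.0 + 100000 * 0.01 = 1921, matching cost=0.00..1921.00.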

Adding a WHERE condition and trying it with and without an index
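
The index creation itself isn't shown here; assuming an index on accounts.type named accounts_type_idx (the name that appears in the plans below, and what Prisma would generate for @@index([type]) on the Account model), it would look something like this:

CREATE INDEX accounts_type_idx ON public.accounts (type);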

// Query to run

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.accounts
WHERE type = 'STANDARD';
// Result

// Without index
// 10 rows
"Seq Scan on accounts  (cost=0.00..1.12 rows=1 width=41) (actual time=0.017..0.017 rows=0 loops=1)"
"  Filter: (type = 'STANDARD'::""AccountType"")"
"  Rows Removed by Filter: 10"
"  Buffers: shared hit=1"
"Planning Time: 0.076 ms"
"Execution Time: 0.044 ms"

// 100 rows
"Seq Scan on accounts  (cost=0.00..2.25 rows=24 width=44) (actual time=0.018..0.032 rows=24 loops=1)"
"  Filter: (type = 'STANDARD'::""AccountType"")"
"  Rows Removed by Filter: 76"
"  Buffers: shared hit=1"
"Planning Time: 0.090 ms"
"Execution Time: 0.054 ms"

// 1000 rows
"Seq Scan on accounts  (cost=0.00..22.50 rows=252 width=43) (actual time=0.015..0.130 rows=252 loops=1)"
"  Filter: (type = 'STANDARD'::""AccountType"")"
"  Rows Removed by Filter: 748"
"  Buffers: shared hit=10"
"Planning Time: 0.082 ms"
"Execution Time: 0.161 ms"

// 10000 rows
"Seq Scan on accounts  (cost=0.00..218.00 rows=2532 width=43) (actual time=0.012..1.162 rows=2532 loops=1)"
"  Filter: (type = 'STANDARD'::""AccountType"")"
"  Rows Removed by Filter: 7468"
"  Buffers: shared hit=93"
"Planning Time: 0.076 ms"
"Execution Time: 1.320 ms"

// 100000 rows
"Seq Scan on accounts  (cost=0.00..2171.00 rows=25000 width=43) (actual time=0.016..11.027 rows=24963 loops=1)"
"  Filter: (type = 'STANDARD'::""AccountType"")"
"  Rows Removed by Filter: 75037"
"  Buffers: shared hit=921"
"Planning Time: 0.080 ms"
"Execution Time: 12.270 ms"

// With index
// 10 rows
"Seq Scan on accounts  (cost=0.00..1.12 rows=3 width=47) (actual time=0.018..0.021 rows=3 loops=1)"
"  Filter: (type = 'STANDARD'::""AccountType"")"
"  Rows Removed by Filter: 7"
"  Buffers: shared hit=1"
"Planning Time: 0.127 ms"
"Execution Time: 0.040 ms"

// 100 rows
"Seq Scan on accounts  (cost=0.00..2.25 rows=23 width=44) (actual time=0.015..0.030 rows=23 loops=1)"
"  Filter: (type = 'STANDARD'::""AccountType"")"
"  Rows Removed by Filter: 77"
"  Buffers: shared hit=1"
"Planning Time: 0.102 ms"
"Execution Time: 0.053 ms"

// 1000 rows
"Bitmap Heap Scan on accounts  (cost=6.06..19.13 rows=246 width=43) (actual time=0.042..0.099 rows=246 loops=1)"
"  Recheck Cond: (type = 'STANDARD'::""AccountType"")"
"  Heap Blocks: exact=10"
"  Buffers: shared hit=11"
"  ->  Bitmap Index Scan on accounts_type_idx  (cost=0.00..6.00 rows=246 width=0) (actual time=0.029..0.029 rows=246 loops=1)"
"        Index Cond: (type = 'STANDARD'::""AccountType"")"
"        Buffers: shared hit=1"
"Planning Time: 0.123 ms"
"Execution Time: 0.141 ms"

// 10000 rows
"Bitmap Heap Scan on accounts  (cost=31.74..156.11 rows=2510 width=43) (actual time=0.200..1.066 rows=2510 loops=1)"
"  Recheck Cond: (type = 'STANDARD'::""AccountType"")"
"  Heap Blocks: exact=93"
"  Buffers: shared hit=96"
"  ->  Bitmap Index Scan on accounts_type_idx  (cost=0.00..31.11 rows=2510 width=0) (actual time=0.170..0.171 rows=2510 loops=1)"
"        Index Cond: (type = 'STANDARD'::""AccountType"")"
"        Buffers: shared hit=3"
"Planning Time: 0.106 ms"
"Execution Time: 1.544 ms"

// 100000 rows
"Bitmap Heap Scan on accounts  (cost=280.18..1510.68 rows=24760 width=43) (actual time=1.177..6.132 rows=24963 loops=1)"
"  Recheck Cond: (type = 'STANDARD'::""AccountType"")"
"  Heap Blocks: exact=921"
"  Buffers: shared hit=944"
"  ->  Bitmap Index Scan on accounts_type_idx  (cost=0.00..273.99 rows=24760 width=0) (actual time=1.005..1.005 rows=24963 loops=1)"
"        Index Cond: (type = 'STANDARD'::""AccountType"")"
"        Buffers: shared hit=23"
"Planning Time: 0.097 ms"
"Execution Time: 7.361 ms"
  • The results were as shown above.
  • With an index in place, the plan switched to a Bitmap Heap Scan from around 1,000 rows (a way to peek at the index plan on the smaller tables is sketched right after this list).
  • At around 100,000 rows, a difference in execution speed also started to show.
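
On the small tables the planner sticks with a plain Seq Scan because reading a single page is cheaper than going through the index. To peek at the index-based plan anyway, sequential scans can be temporarily discouraged for the current session:

SET enable_seqscan = off;  -- session-local; makes Seq Scan look very expensive to the planner

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.accounts
WHERE type = 'STANDARD';

RESET enable_seqscan;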

Trying JOIN, ORDER BY, COUNT, and LIMIT/OFFSET

JOIN

// Query to run

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.accounts
INNER JOIN users ON accounts.id = users.account_id;
// Result

"Hash Join  (cost=1.23..2657.97 rows=100000 width=114) (actual time=0.025..33.274 rows=100000 loops=1)"
"  Hash Cond: (users.account_id = accounts.id)"
"  Buffers: shared hit=1284"
"  ->  Seq Scan on users  (cost=0.00..2283.00 rows=100000 width=72) (actual time=0.006..7.511 rows=100000 loops=1)"
"        Buffers: shared hit=1283"
"  ->  Hash  (cost=1.10..1.10 rows=10 width=42) (actual time=0.009..0.010 rows=10 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"        Buffers: shared hit=1"
"        ->  Seq Scan on accounts  (cost=0.00..1.10 rows=10 width=42) (actual time=0.002..0.003 rows=10 loops=1)"
"              Buffers: shared hit=1"
"Planning:"
"  Buffers: shared hit=4"
"Planning Time: 0.243 ms"
"Execution Time: 36.996 ms"

ORDER BY

// Query to run

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.users
ORDER BY created_at DESC;
// Result

"Sort  (cost=14689.82..14939.82 rows=100000 width=72) (actual time=23.787..31.736 rows=100000 loops=1)"
"  Sort Key: created_at DESC"
"  Sort Method: external merge  Disk: 8456kB"
"  Buffers: shared hit=1283, temp read=1057 written=1060"
"  ->  Seq Scan on users  (cost=0.00..2283.00 rows=100000 width=72) (actual time=0.009..9.380 rows=100000 loops=1)"
"        Buffers: shared hit=1283"
"Planning Time: 0.324 ms"
"Execution Time: 35.338 ms"

COUNT

// Query to run

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(id) FROM public.accounts;
// Result

"Aggregate  (cost=1.12..1.14 rows=1 width=8) (actual time=0.021..0.022 rows=1 loops=1)"
"  Buffers: shared hit=1"
"  ->  Seq Scan on accounts  (cost=0.00..1.10 rows=10 width=4) (actual time=0.011..0.013 rows=10 loops=1)"
"        Buffers: shared hit=1"
"Planning Time: 0.079 ms"
"Execution Time: 0.046 ms"

LIMIT/OFFSET

// Query to run

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.accounts
LIMIT 10
OFFSET 10;
// Result

"Limit  (cost=1.10..1.21 rows=1 width=42) (actual time=0.015..0.016 rows=0 loops=1)"
"  Buffers: shared hit=1"
"  ->  Seq Scan on accounts  (cost=0.00..1.10 rows=10 width=42) (actual time=0.011..0.013 rows=10 loops=1)"
"        Buffers: shared hit=1"
"Planning Time: 0.220 ms"
"Execution Time: 0.031 ms"

Looking at query execution times

It seems that by installing an extension called pg_stat_statements, you can view query statistics like the following.

// Query to run

SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
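
For this to work, pg_stat_statements has to be preloaded at server start (shared_preload_libraries = 'pg_stat_statements' in postgresql.conf, or passed as a -c option on the postgres command), and the extension created once per database:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Clear the counters before a measurement run if needed.
SELECT pg_stat_statements_reset();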

Execution plans can also be output as JSON, YAML, or XML

// Query to run

EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) -- for JSON; use FORMAT YAML or FORMAT XML for the others
SELECT * FROM public.accounts;
// json

[
  {
    "Plan": {
      "Node Type": "Seq Scan",
      "Parallel Aware": false,
      "Async Capable": false,
      "Relation Name": "accounts",
      "Alias": "accounts",
      "Startup Cost": 0.00,
      "Total Cost": 2279.00,
      "Plan Rows": 100000,
      "Plan Width": 71,
      "Actual Startup Time": 0.079,
      "Actual Total Time": 36.894,
      "Actual Rows": 100000,
      "Actual Loops": 1,
      "Shared Hit Blocks": 1279,
      "Shared Read Blocks": 0,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0
    },
    "Planning": {
      "Shared Hit Blocks": 0,
      "Shared Read Blocks": 0,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0
    },
    "Planning Time": 4.952,
    "Triggers": [
    ],
    "Execution Time": 43.987
  }
]
// yaml

- Plan: 
    Node Type: "Seq Scan"
    Parallel Aware: false
    Async Capable: false
    Relation Name: "accounts"
    Alias: "accounts"
    Startup Cost: 0.00
    Total Cost: 2279.00
    Plan Rows: 100000
    Plan Width: 71
    Actual Startup Time: 0.080
    Actual Total Time: 13.224
    Actual Rows: 100000
    Actual Loops: 1
    Shared Hit Blocks: 1279
    Shared Read Blocks: 0
    Shared Dirtied Blocks: 0
    Shared Written Blocks: 0
    Local Hit Blocks: 0
    Local Read Blocks: 0
    Local Dirtied Blocks: 0
    Local Written Blocks: 0
    Temp Read Blocks: 0
    Temp Written Blocks: 0
  Planning: 
    Shared Hit Blocks: 0
    Shared Read Blocks: 0
    Shared Dirtied Blocks: 0
    Shared Written Blocks: 0
    Local Hit Blocks: 0
    Local Read Blocks: 0
    Local Dirtied Blocks: 0
    Local Written Blocks: 0
    Temp Read Blocks: 0
    Temp Written Blocks: 0
  Planning Time: 0.411
  Triggers: 
  Execution Time: 19.508
// xml

<explain xmlns="http://www.postgresql.org/2009/explain">
  <Query>
    <Plan>
      <Node-Type>Seq Scan</Node-Type>
      <Parallel-Aware>false</Parallel-Aware>
      <Async-Capable>false</Async-Capable>
      <Relation-Name>accounts</Relation-Name>
      <Alias>accounts</Alias>
      <Startup-Cost>0.00</Startup-Cost>
      <Total-Cost>2279.00</Total-Cost>
      <Plan-Rows>100000</Plan-Rows>
      <Plan-Width>71</Plan-Width>
      <Actual-Startup-Time>0.041</Actual-Startup-Time>
      <Actual-Total-Time>12.188</Actual-Total-Time>
      <Actual-Rows>100000</Actual-Rows>
      <Actual-Loops>1</Actual-Loops>
      <Shared-Hit-Blocks>1279</Shared-Hit-Blocks>
      <Shared-Read-Blocks>0</Shared-Read-Blocks>
      <Shared-Dirtied-Blocks>0</Shared-Dirtied-Blocks>
      <Shared-Written-Blocks>0</Shared-Written-Blocks>
      <Local-Hit-Blocks>0</Local-Hit-Blocks>
      <Local-Read-Blocks>0</Local-Read-Blocks>
      <Local-Dirtied-Blocks>0</Local-Dirtied-Blocks>
      <Local-Written-Blocks>0</Local-Written-Blocks>
      <Temp-Read-Blocks>0</Temp-Read-Blocks>
      <Temp-Written-Blocks>0</Temp-Written-Blocks>
    </Plan>
    <Planning>
      <Shared-Hit-Blocks>0</Shared-Hit-Blocks>
      <Shared-Read-Blocks>0</Shared-Read-Blocks>
      <Shared-Dirtied-Blocks>0</Shared-Dirtied-Blocks>
      <Shared-Written-Blocks>0</Shared-Written-Blocks>
      <Local-Hit-Blocks>0</Local-Hit-Blocks>
      <Local-Read-Blocks>0</Local-Read-Blocks>
      <Local-Dirtied-Blocks>0</Local-Dirtied-Blocks>
      <Local-Written-Blocks>0</Local-Written-Blocks>
      <Temp-Read-Blocks>0</Temp-Read-Blocks>
      <Temp-Written-Blocks>0</Temp-Written-Blocks>
    </Planning>
    <Planning-Time>0.409</Planning-Time>
    <Triggers>
    </Triggers>
    <Execution-Time>15.334</Execution-Time>
  </Query>
</explain>

The source code I used is here:

github.com

In closing

Normally I rarely write SQL by hand thanks to the ORM, but trying out all kinds of other queries like this looks like a fun way to make plenty of discoveries.