
[PostgreSQL HA] #4. PostgreSQL 14.5 High Availability with Pgpool2 + Watchdog (1)

by 이미존재 2022. 11. 8.

■ PostgreSQL HA Setup (Pgpool2 + Watchdog)


▶ PostgreSQL high availability is configured in an Active + Standby topology using Pgpool2 + Watchdog.
▶ pgpool-II version 4.1.4 is used.
▶ The Watchdog package used alongside pgpool2 is version 5.16.
▶ The environment is Ubuntu 22.04 LTS with PostgreSQL 14.5 installed via apt install.
▶ After a failover the Master DB is in a failed state, so Auto Failback is not configured; this prevents the cluster from reverting before the cause of the failure has been identified.
※ Caution: with pgpool2 4.1.4, serving traffic through the VIP and Master Server Online Recovery only tested reliably on a cluster of three or more DB nodes. Using pgpool 4.3 or later is recommended.
  - pgpool2 settings differ somewhat between versions. Refer to https://www.pgpool.net for details.

▶ The Master server is the one currently in production; the Slave server is a fresh PostgreSQL installation whose data was pulled from the Master with pg_basebackup.
▶ The pgpool failover and failback scripts must be written with care, and they vary by version and OS environment, so consult multiple references and tune them for your site.


1. Pgpool and Watchdog Basics

 - Pgpool is a program that manages PostgreSQL connections and high availability.
 - Database clients connect to Pgpool's VIP (virtual IP address), and Pgpool routes the connection to the Master DB.
 - Pgpool uses Watchdog to monitor server status.
 - Watchdog keeps watch over the servers; when the Master fails, it hands the VIP over to the Slave.

https://www.pgpool.net/docs/41/en/html/example-cluster.html
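
To make the connection path concrete: once the cluster is up, a client connects through the VIP and pgpool port (configured later in this post as 192.168.48.121 and 5433), never to either DB host directly. For example:

psql -h 192.168.48.121 -p 5433 -U postgres postgres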

2. System Environment Setup

 1) Hostname and IP
  - Master : berasix(192.168.48.129)
  - Slave : berasix-2(192.168.48.130)
  - VIP : 192.168.48.121
 2) pgpool2 environment
  : config file location - /etc/pgpool2/
 3) DB Users
  : postgres - PostgreSQL administration and Online Recovery
  : pgpool - Pgpool health checks. Must be newly created.
  : repluser - replication user. Created in the previous post.
 4) Scripts
  4)-1. Failover
   - /data/pgpool/script/failover.sh
    : the /data/pgpool/script directory must be newly created and owned by postgres
  4)-2. Failback
   - /data/pgpool/script/failback.sh
    : the /data/pgpool/script directory must be newly created and owned by postgres
  4)-3. Online Recovery
   - /data/postgres14-data/main/recovery_1st_stage
   - /data/postgres14-data/main/pgpool_remote_start
  4)-4. Watchdog escalation : executed at escalation on the new active pgpool
   - /data/pgpool/script/escalation.sh
 5) Directory creation and permissions (on both Master and Slave)
  5)-1. Create the failover/failback script directory

root@berasix:~# mkdir -p /data/pgpool/script 
root@berasix:~# chown -R postgres.postgres /data/pgpool

  5)-2. Create and configure the pgpool2 log directory
   - Create the directory and file

root@berasix:~# mkdir /var/log/pgpool2
root@berasix:~# touch /var/log/pgpool2/pgpool.log
root@berasix:~# chown -R syslog.adm /var/log/pgpool2

   - syslog configuration

root@berasix:~# vi /etc/rsyslog.d/50-default.conf
...
local2.*                                                /var/log/pgpool2/pgpool.log


  - logrotate configuration

root@berasix:~# vi /etc/logrotate.d/rsyslog
...
/var/log/pgpool2/pgpool.log


 6) Register the servers in /etc/hosts

root@berasix:/data/postgres14-data# cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 berasix
192.168.48.130  berasix-2
192.168.48.129  berasix
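
As a quick sanity check (an optional step, not in the original write-up), confirm that each server resolves the other by name before moving on:

root@berasix:~# getent hosts berasix-2
root@berasix:~# ping -c 1 berasix-2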


3. Install pgpool2 (on both Master and Slave)

 - For convenience, installation was done as the root account.
 - Be aware that on Ubuntu the daemon starts automatically as soon as apt install completes.

root@berasix-2:/data/postgres14-data# apt install pgpool2

 - The installed pgpool2 is pgpool-II version 4.1.4 (karasukiboshi). The latest release is 4.3.3, but this version was used because it is the one registered in the Ubuntu repository.
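
Since the daemon is already running at this point, it can help to verify the version and stop the service until configuration is finished (a suggested step, not from the original walkthrough):

root@berasix:~# pgpool --version
root@berasix:~# systemctl stop pgpool2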


4. Install Watchdog (on both Master and Slave)

 - For convenience, installation was done as the root account.

root@berasix:~# apt install watchdog

 

5. Create the pgpool DB Account (on both Master and Slave)

root@berasix:~# sudo -u postgres psql
could not change directory to "/root": Permission denied
psql (14.5 (Ubuntu 14.5-0ubuntu0.22.04.1))
Type "help" for help.

postgres=# create role pgpool with login;
CREATE ROLE
postgres=# \password pgpool
Enter new password for user "pgpool": 
Enter it again: 
postgres=# grant pg_monitor to pgpool;
GRANT ROLE
postgres=#
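
To confirm the role exists and holds pg_monitor, a quick check from the same psql session:

postgres=# \du pgpool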

 

6. Pgpool Configuration (pgpool.conf)

 1) The master and slave pgpool.conf files are identical; the only difference is the hostname settings, where each side is set to the other's IP.
  - Back up the original configuration file first.

   : root@berasix:~# cp -arp /etc/pgpool2/pgpool.conf /etc/pgpool2/pgpool.conf.org
  - vi /etc/pgpool2/pgpool.conf
#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------
listen_addresses = '*' # addresses to listen on for client connections
port = 5433  # pgpool service port
pcp_listen_addresses = '*'
pcp_port = 9898

# - Backend Connection Settings - # PostgreSQL backend info
backend_hostname0 = '192.168.48.129' # Master(berasix)
backend_port0 = 5432              # Master DB Port
backend_weight0 = 1
backend_data_directory0 = '/data/postgres14-data/main' # Master DB Directory
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = '192.168.48.130' # Slave(berasix-2)
backend_port1 = 5432 # DB Service Port
backend_weight1 = 1
backend_data_directory1 = '/data/postgres14-data/main' # DB Directory
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -
enable_pool_hba = on  # set to on to use pool_hba.conf for client authentication

#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------

# - What to log -

log_connections = on # Default off.
log_hostname = on # Default off.

#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
logdir = '/var/log/postgresql' # location of log files and the pgpool_status file


#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------
load_balance_mode = on   # Default off. Set to on when running Active + hot_standby


#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------

master_slave_mode = on  # Default off. Set to on when running Active + hot_standby
master_slave_sub_mode = 'stream'

# - Streaming -

sr_check_period = 5      # Disabled (0) by default. Set a nonzero interval when running Active + hot_standby
sr_check_user = 'repluser' # streaming replication user
sr_check_password = 'replica'
sr_check_database = 'postgres'

# - Special commands -

follow_master_command = ''
#follow_master_command = '/etc/pgpool2/follow_master.sh %d %h %p %D %m %H %M %P %r %R' ## designates the master on failover with 3 or more nodes; not needed with 2 nodes

#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------

health_check_period = 5   # Disabled (0) by default.       
health_check_timeout = 10
health_check_user = 'pgpool'
health_check_password = 'pgpool-2'
health_check_max_retries = 3  # 0 by default. Tune the retry count to your situation


#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/data/pgpool/script/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
failback_command = '/data/pgpool/script/failback.sh %d %h %p %D %m %H %M %P %r %R %N %S'
auto_failback = off  # Default off


#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------

recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'recovery_1st_stage'  # used by the pcp_recovery_node command


#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -
use_watchdog = on  # Default off. Enables Watchdog

# - Watchdog communication Settings -
wd_hostname = '192.168.48.129' # Watchdog address of this node. On the Slave (berasix-2), set '192.168.48.130'.
wd_port = 9000

# - Virtual IP control Setting -
delegate_IP = '192.168.48.121' # VIP for the DB service
#if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev ens33 label ens33:0' # adjust to your NIC device name
#if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens33'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I ens33' # adjust to your NIC device name


# - Behavior on escalation Setting -
wd_escalation_command = '/data/pgpool/script/escalation.sh'  # escalation script: run on the node becoming active; it takes the VIP down on the other pgpool nodes


# -- heartbeat mode --
heartbeat_destination0 = '192.168.48.130' # heartbeat target: set the peer server's IP

other_pgpool_hostname0 = '192.168.48.130' # set the peer server's IP
other_pgpool_port0 = 5433  # default is 5432, but this is the peer's pgpool port, so it must be 5433
other_wd_port0 = 9000

#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------

# - Where to log -

log_destination = 'syslog'
syslog_facility = 'local2'

#------------------------------------------------------------------------------
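
Since enable_pool_hba is on, /etc/pgpool2/pool_hba.conf also needs entries admitting the client and monitoring accounts. A minimal sketch for this lab network, assuming md5 authentication (the corresponding passwords must also be registered in pool_passwd):

host    all    pgpool      192.168.48.0/24    md5
host    all    postgres    192.168.48.0/24    md5
host    all    all         192.168.48.0/24    md5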

※ Notes

 1) The arping command
 - Verifies network reachability using ARP requests.
 - It is not installed by default on Ubuntu 22.04, so it must be installed:
root@berasix:/etc/pgpool2# apt install arping 

 2) For VIP handling, the postgres account must be able to run /sbin/ip and /usr/sbin/arping via sudo without a password:
root@berasix:/etc/pgpool2# echo 'postgres        ALL=NOPASSWD: /sbin/ip, /usr/sbin/arping' >> /etc/sudoers
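
A quick way to confirm the sudoers entry took effect (sudo -n fails instead of prompting, so any password prompt means the entry is wrong):

root@berasix:~# su - postgres -c 'sudo -n /sbin/ip addr show'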

 

7. Generate ssh Keys (on both Master and Slave)

 - Create ssh keys for passwordless access to the remote DB server, used by Failover and Failback.
 - For details on generating ssh keys, see the post below.
   https://berasix.tistory.com/entry/SSH-%ED%8C%A8%EC%8A%A4%EC%9B%8C%EB%93%9C%EC%97%86%EC%9D%B4-%EB%A1%9C%EA%B7%B8%EC%9D%B8%ED%95%98%EA%B8%B0

 1) root

cd ~/.ssh
ssh-keygen -t rsa -f id_rsa_pgpool
ssh-copy-id -i id_rsa_pgpool.pub postgres@192.168.48.129
ssh-copy-id -i id_rsa_pgpool.pub postgres@192.168.48.130


 2) postgres

su - postgres
mkdir ~/.ssh
chmod 700 ~/.ssh
cd ~/.ssh
ssh-keygen -t rsa -f id_rsa_pgpool
ssh-copy-id -i id_rsa_pgpool.pub postgres@192.168.48.129
ssh-copy-id -i id_rsa_pgpool.pub postgres@192.168.48.130
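
Each server should now reach the other as postgres without a password. Since every script below tests exactly this, it is worth verifying up front:

su - postgres
ssh -i ~/.ssh/id_rsa_pgpool postgres@192.168.48.130 hostname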

 

8. Create and Copy the Scripts (Master server => Slave server)

 1) For the contents of each script, see section 9.
 2) The Online Recovery scripts go in the following paths:
$PGDATA/recovery_1st_stage
$PGDATA/pgpool_remote_start

 3) Copy the scripts to the same paths on the Slave server.

root@berasix:~# chown -R postgres.postgres /data/pgpool
root@berasix:~# chown postgres.postgres /etc/pgpool2/pgpool.conf
root@berasix:~# scp -p /etc/pgpool2/pgpool.conf postgres@192.168.48.130:/etc/pgpool2/
root@berasix:~# scp -p /data/postgres14-data/main/recovery_1st_stage postgres@192.168.48.130:/data/postgres14-data/main
root@berasix:~# scp -p /data/postgres14-data/main/pgpool_remote_start postgres@192.168.48.130:/data/postgres14-data/main
root@berasix:~# scp -p /data/pgpool/script/failover.sh  postgres@192.168.48.130:/data/pgpool/script
root@berasix:~# scp -p /data/pgpool/script/failback.sh  postgres@192.168.48.130:/data/pgpool/script
root@berasix:~# scp -p /data/pgpool/script/escalation.sh  postgres@192.168.48.130:/data/pgpool/script
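
The scripts must also be executable by postgres on both servers; the commands below are a suggested addition (the original steps do not show this explicitly):

root@berasix:~# chmod 755 /data/pgpool/script/*.sh
root@berasix:~# chmod 755 /data/postgres14-data/main/recovery_1st_stage /data/postgres14-data/main/pgpool_remote_start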


※ Note

The recovery_1st_stage script does not support tablespaces. If you use tablespaces, modify the script to handle them.

 

9. pgpool Scripts

 - These scripts are based on the pgpool example site ( https://www.pgpool.net/docs/41/en/html/example-cluster.html ).

 1) Failover script

  - /data/pgpool/script/failover.sh


#!/bin/bash
# This script is run by failover_command.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"
OLD_PRIMARY_NODE_HOST="${11}"
OLD_PRIMARY_NODE_PORT="${12}"

PGHOME=/usr/lib/postgresql/14
REPL_SLOT_NAME=${FAILED_NODE_HOST//[-.]/_}
POSTGRESQL_STARTUP_USER=postgres
SSH_KEY_FILE=id_rsa_pgpool
SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/${SSH_KEY_FILE}"

logger -i -p local1.info failover.sh: start: failed_node_id=$FAILED_NODE_ID old_primary_node_id=$OLD_PRIMARY_NODE_ID failed_host=$FAILED_NODE_HOST new_master_host=$NEW_MASTER_NODE_HOST

## If there's no master node anymore, skip failover.
if [ $NEW_MASTER_NODE_ID -lt 0 ]; then
    logger -i -p local1.info failover.sh: All nodes are down. Skipping failover.
    exit 0
fi

## Test passwordless SSH
ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${NEW_MASTER_NODE_HOST} ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info failover.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## If Standby node is down, skip failover.
if [ $OLD_PRIMARY_NODE_ID != "-1" -a $FAILED_NODE_ID != $OLD_PRIMARY_NODE_ID ]; then
    logger -i -p local1.info failover.sh: Standby node is down. Skipping failover.

    ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${OLD_PRIMARY_NODE_HOST} "
        ${PGHOME}/bin/psql -p $OLD_PRIMARY_NODE_PORT -c \"SELECT pg_drop_replication_slot('${REPL_SLOT_NAME}')\"
    "

    if [ $? -ne 0 ]; then
        logger -i -p local1.error failover.sh: drop replication slot "${REPL_SLOT_NAME}" failed
        exit 1
    fi

    exit 0
fi

## Promote Standby node.
logger -i -p local1.info failover.sh: Primary node is down, promote standby node ${NEW_MASTER_NODE_HOST}.

ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${NEW_MASTER_NODE_HOST} ${PGHOME}/bin/pg_ctl -D ${NEW_MASTER_NODE_PGDATA} -w promote

if [ $? -ne 0 ]; then
    logger -i -p local1.error failover.sh: new_master_host=$NEW_MASTER_NODE_HOST promote failed
    exit 1
fi

logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node
exit 0
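
For reference, if node 0 (the Master) fails, pgpool expands the failover_command placeholders into a call roughly like the following (values per the configuration above; shown only to illustrate the argument order):

/data/pgpool/script/failover.sh 0 192.168.48.129 5432 /data/postgres14-data/main 1 192.168.48.130 0 0 5432 /data/postgres14-data/main 192.168.48.129 5432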


 2) Failback script
  - /data/pgpool/script/failback.sh


#!/bin/bash
# This script is run after failover_command to synchronize the Standby with the new Primary.
# First try pg_rewind. If pg_rewind failed, use pg_basebackup.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
# 1)  %d = node id
# 2)  %h = hostname
# 3)  %p = port number
# 4)  %D = database cluster path
# 5)  %m = new primary node id
# 6)  %H = new primary node hostname
# 7)  %M = old master node id
# 8)  %P = old primary node id
# 9)  %r = new primary port number
# 10) %R = new primary database cluster path
# 11) %N = old primary node hostname
# 12) %S = old primary node port number
# 13) %% = '%' character

NODE_ID="$1"
NODE_HOST="$2"
NODE_PORT="$3"
NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"

PGHOME=/usr/lib/postgresql/14
ARCHIVEDIR=/data/postgres14-data/archive
SSH_USER=postgres
PCP_USER=pgpool
REPLUSER=repluser               # replication user referenced below (was missing in the original script)
PGPOOL_PATH=/usr/sbin
PCP_PORT=9898
REPL_SLOT_NAME=${NODE_HOST//[-.]/_}
POSTGRESQL_STARTUP_USER=postgres
SSH_KEY_FILE=id_rsa_pgpool
SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/${SSH_KEY_FILE}"

logger -i -p local1.info failback.sh: start: Standby node ${NODE_ID}

# Check the connection status of Standby
${PGHOME}/bin/pg_isready -h ${NODE_HOST} -p ${NODE_PORT} > /dev/null 2>&1

if [ $? -ne 0 ]; then
    logger -i -p local1.info failback.sh: node_id=${NODE_ID} is not running. skipping failback
    exit 0
fi

## Test passwordless SSH
ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${NEW_MASTER_NODE_HOST} ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info failback.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`

if [ $PGVERSION -ge 12 ]; then
    RECOVERYCONF=${NODE_PGDATA}/myrecovery.conf
else
    RECOVERYCONF=${NODE_PGDATA}/recovery.conf
fi

# Synchronize Standby with the new Primary.
logger -i -p local1.info failback.sh: pg_rewind for node $NODE_ID

# Create replication slot "${REPL_SLOT_NAME}"
${PGHOME}/bin/psql -h ${NEW_MASTER_NODE_HOST} -p ${NEW_MASTER_NODE_PORT} \
    -c "SELECT pg_create_physical_replication_slot('${REPL_SLOT_NAME}');"  >/dev/null 2>&1

if [ $? -ne 0 ]; then
    logger -i -p local1.error failback.sh: create replication slot \"${REPL_SLOT_NAME}\" failed. You may need to create replication slot manually.
fi

ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${NODE_HOST} "

    set -o errexit

    ${PGHOME}/bin/pg_ctl -w -m f -D ${NODE_PGDATA} stop

    ${PGHOME}/bin/pg_rewind -D ${NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\"

    rm -rf ${NODE_PGDATA}/pg_replslot/*

    cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${NODE_HOST} passfile=''/var/lib/postgresql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${SSH_OPTIONS} ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${REPL_SLOT_NAME}'
EOT

    if [ ${PGVERSION} -ge 12 ]; then
        sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
               -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${NODE_PGDATA}/postgresql.conf
        touch ${NODE_PGDATA}/standby.signal
    else
        echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
    fi

    ${PGHOME}/bin/pg_ctl -l /dev/null -w -D ${NODE_PGDATA} start

"

# If pg_rewind failed, try pg_basebackup
if [ $? -ne 0 ]; then
    logger -i -p local1.error failback.sh: end: pg_rewind failed. Try pg_basebackup.

    ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${NODE_HOST} "

        set -o errexit

        rm -rf ${NODE_PGDATA}
        rm -rf ${ARCHIVEDIR}/*
        ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${NODE_PGDATA} -X stream

        cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${NODE_HOST} passfile=''/var/lib/postgresql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${SSH_OPTIONS} ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${REPL_SLOT_NAME}'
EOT

        if [ ${PGVERSION} -ge 12 ]; then
            sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
                   -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${NODE_PGDATA}/postgresql.conf
            touch ${NODE_PGDATA}/standby.signal
        else
            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
        fi
    "

    if [ $? -ne 0 ]; then
        # drop replication slot
        ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${NEW_MASTER_NODE_HOST} "
            ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${REPL_SLOT_NAME}')\"
        "

        logger -i -p local1.error failback.sh: end: pg_basebackup failed
        exit 1
    fi

    # start Standby node on ${NODE_HOST}
    ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${NODE_HOST} $PGHOME/bin/pg_ctl -l /dev/null -w -D ${NODE_PGDATA} start

fi

# If start Standby successfully, attach this node
if [ $? -eq 0 ]; then

    # Run pcp_attach_node to attach Standby node to Pgpool-II.
    ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${NODE_ID}

    if [ $? -ne 0 ]; then
        logger -i -p local1.error failback.sh: end: pcp_attach_node failed
        exit 1
    fi

else

    # If start Standby failed, drop replication slot "${REPL_SLOT_NAME}"
    ${PGHOME}/bin/psql -h ${NEW_MASTER_NODE_HOST} -p ${NEW_MASTER_NODE_PORT} \
        -c "SELECT pg_drop_replication_slot('${REPL_SLOT_NAME}');"  >/dev/null 2>&1

    if [ $? -ne 0 ]; then
        logger -i -p local1.error failback.sh: drop replication slot \"${REPL_SLOT_NAME}\" failed. You may need to drop replication slot manually.
    fi

    logger -i -p local1.error failback.sh: end: failback failed
    exit 1
fi

logger -i -p local1.info failback.sh: end: failback complete
exit 0


 3) VIP escalation script
  - /data/pgpool/script/escalation.sh


#!/bin/bash
# This script is run by wd_escalation_command to bring down the virtual IP on other pgpool nodes
# before bringing up the virtual IP on the new active pgpool node.

set -o xtrace

#PGPOOLS=(server1 server2)
PGPOOLS=(berasix berasix-2)    # use the hostnames (see /etc/hosts) so the $HOSTNAME comparison below can skip the local node
VIP=192.168.48.121
DEVICE=ens33
USER=postgres

for pgpool in "${PGPOOLS[@]}"; do
    [ "$HOSTNAME" = "$pgpool" ] && continue

    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null ${USER}@$pgpool -i ~/.ssh/id_rsa_pgpool "
        /usr/bin/sudo /sbin/ip addr del $VIP/24 dev $DEVICE
    "
done
exit 0


 4) Master Server Online Recovery (standby mode) script
  - /data/postgres14-data/main/recovery_1st_stage


#!/bin/bash
# This script is run by recovery_1st_stage_command to recover a Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

PRIMARY_NODE_PGDATA="$1"
DEST_NODE_HOST="$2"
DEST_NODE_PGDATA="$3"
PRIMARY_NODE_PORT="$4"
DEST_NODE_ID="$5"
DEST_NODE_PORT="$6"

PRIMARY_NODE_HOST=$(hostname)
PGHOME=/usr/lib/postgresql/14
ARCHIVEDIR=/data/postgres14-data/archive
POSTGRESQL_CONF=/etc/postgresql/14/main
REPLUSER=repluser
REPL_SLOT_NAME=${DEST_NODE_HOST//[-.]/_}
POSTGRESQL_STARTUP_USER=postgres
SSH_KEY_FILE=id_rsa_pgpool
SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/${SSH_KEY_FILE}"

logger -i -p local1.info recovery_1st_stage: start: pg_basebackup for Standby node $DEST_NODE_ID

## Test passwordless SSH
ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${DEST_NODE_HOST} ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info recovery_1st_stage: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
if [ $PGVERSION -ge 12 ]; then
    RECOVERYCONF=${DEST_NODE_PGDATA}/myrecovery.conf
else
    RECOVERYCONF=${DEST_NODE_PGDATA}/recovery.conf
fi

## Create replication slot "${REPL_SLOT_NAME}"
${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_create_physical_replication_slot('${REPL_SLOT_NAME}');
EOQ

## Execute pg_basebackup to recovery Standby node
ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@$DEST_NODE_HOST "

    set -o errexit

    rm -rf $DEST_NODE_PGDATA
    rm -rf $ARCHIVEDIR/*

    ${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream

    cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPLUSER} application_name=${DEST_NODE_HOST} passfile=''/var/lib/postgresql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${SSH_OPTIONS} ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${REPL_SLOT_NAME}'
EOT

    if [ ${PGVERSION} -ge 12 ]; then
        sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
               -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${POSTGRESQL_CONF}/postgresql.conf
        touch ${DEST_NODE_PGDATA}/standby.signal
    else
        echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
    fi

    sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${POSTGRESQL_CONF}/postgresql.conf
"

if [ $? -ne 0 ]; then

    ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_drop_replication_slot('${REPL_SLOT_NAME}');
EOQ

    logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed
    exit 1
fi

logger -i -p local1.info recovery_1st_stage: end: recovery_1st_stage complete
exit 0
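
With recovery_1st_stage_command in place, a detached node can be rebuilt and re-attached through PCP. A typical invocation (assuming a PCP account named pgpool has been registered in pcp.conf; you will be prompted for its password) might look like:

pcp_recovery_node -h localhost -p 9898 -U pgpool -n 0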


 5) Startup script run after Online Recovery completes
  - /data/postgres14-data/main/pgpool_remote_start


#!/bin/bash
# This script is run after recovery_1st_stage to start Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

DEST_NODE_HOST="$1"
DEST_NODE_PGDATA="$2"

PGHOME=/usr/lib/postgresql/14
POSTGRESQL_CONF=/etc/postgresql/14/main
POSTGRESQL_STARTUP_USER=postgres
SSH_KEY_FILE=id_rsa_pgpool
SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/${SSH_KEY_FILE}"

logger -i -p local1.info pgpool_remote_start: start: remote start Standby node $DEST_NODE_HOST

## Test passwordless SSH
ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${DEST_NODE_HOST} ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info pgpool_remote_start: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## Start Standby node
ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@${DEST_NODE_HOST} "
    $PGHOME/bin/pg_ctl -l /data/pgpool/pgctl_log -w -D $POSTGRESQL_CONF start
"

if [ $? -ne 0 ]; then
    logger -i -p local1.error pgpool_remote_start: $DEST_NODE_HOST PostgreSQL start failed.
    exit 1
fi

logger -i -p local1.info pgpool_remote_start: end: $DEST_NODE_HOST PostgreSQL started successfully.
exit 0
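
Once pgpool is started (covered in the next post), the cluster state can be checked through pgpool itself, for example:

psql -h 192.168.48.121 -p 5433 -U pgpool postgres -c "show pool_nodes"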



※ References and Sources
https://www.pgpool.net/docs/41/en/html/example-cluster.html
https://www.pgpool.net/docs/42/en/html/example-cluster.html
https://www.pgpool.net/docs/41/en/html/runtime-watchdog-config.html
