After few weeks trail-end-error method find out that replication in postgresql with multimaster and online-recovery is really simple. So:
I use:
- pgpool-II
- PostgreSQL 8.3.3 from Fedora Core 9
Testing environment was: Amazon EC2
Start configuration:
- Hostname: db, IP: 10.0.0.1, Use for: Pgpool-II node
- Hostname: pghost1, IP: 10.0.0.x, Use for: PostgreSQL node 1
- Hostname: pghost2, IP: 10.0.0.z, Use for: PostgreSQL node 2
First from pgpool-a side (/etc/pgpool.conf):
listen_addresses = '*'
port = 5432
pcp_port = 9898
socket_dir = '/var/run'
pcp_socket_dir = '/var/run'
backend_socket_dir = '/var/run'
pcp_timeout = 10
num_init_children = 32
max_pool = 4
child_life_time = 300
connection_life_time = 0
client_idle_limit=600
authentication_timeout=0
child_max_connections = 0
logdir = '/var/log'
replication_mode = true
replication_strict = true
replication_timeout = 5000
load_balance_mode = true
replication_stop_on_mismatch = true
replicate_select = false
reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
print_timestamp = true
master_slave_mode = false
connection_cache = true
health_check_timeout = 20
health_check_period = 60
health_check_user = 'hchecker'
failover_command = ''
failback_command = ''
insert_lock = false
ignore_leading_white_space = true
recovery_user='pgrecov'
recovery_password = ''
recovery_1st_stage_command = 'base-backup.sh'
recovery_2nd_stage_command = 'pgpool-recovery'
log_statement = true
log_connections = true
log_hostname = true
parallel_mode = false
enable_query_cache = false
pgpool2_hostname = 'db'
enable_pool_hba = false
backend_hostname0 = '10.0.0.x'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/data'
backend_hostname1 = '10.0.0.z'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/data'
also /etc/pcp.conf:
admin:a8555f20b804e53934ee112121ed3r5
From postgresql side:
Account for pgpoola:
- User: pooler
- Password: NONE
- Permissions: superuser.
- host all pooler 10.0.0.1/32 trust
psql -f pgpool-recovery.sql template1 (in source of pgpool you can find that file)- also should be compile pgpool function for online-recovery
- should be turn on PITR in postgresql server
- for each postgresql server data dir should contain 3 scripts: base-backup.sh, pgpool-recovery, pgpool_remote_start.
#!/bin/sh
# PGDATA path for master node
DATA=$1
# Recovery host name
RECOVERY_TARGET=$2
# PGDATA path for recovery node
RECOVERY_DATA=$3
# ok, this needs to be issued on some running backend, so I
# personally use pcp_node_count and pcp_node_info to get the
# ip address of a master (script follows, as soon as the
# cluster is running well
psql -c "select pg_start_backup('pgpool-recovery')" postgres
# I would favor /tmp/pgpool-recovery instead of /data,
# as it contains live data
echo "restore_command = 'scp $HOSTNAME:/data/archive_log/%f %p'" > /var/lib/pgsql/data/recovery.conf
# I guess pgsql is the datadir I've with /var/lib/postgresql/8.3/main
# under debian?
tar -C /mnt -zcf pgsql.tar.gz pgsql
psql -c 'select pg_stop_backup()' postgres
scp /mnt/pgsql/data/pgsql.tar.gz $RECOVERY_TARGET:/mnt/pgsql/
In my version of that script, data dir is unpack in /mnt cause I keep there postgresql server. It's place where I mount Amazon EC2 storage. Links to /var/lib/pgsql -> /mnt/pgsql
pgpool-recovery:
#!/bin/sh
# Archive a current xlog.
psql -c 'select pg_switch_xlog()' postgres
pgpool_remote_start:
#!/bin/sh
DEST=$1
DESTDIR=$2
PGCTL=/usr/bin/pg_ctl
# Expand a base backup
ssh -T $DEST 'cd /mnt/; tar zxf pgsql/pgsql.tar.gz' 2>/dev/null 1>/dev/null < /dev/null # Startup PostgreSQL server ssh -T -l root $DEST /etc/init.d/postgresql restart 2>/dev/null 1>/dev/null < /dev/null &
As far as I notice doesn't work yet restart when postgresql server is working.
Run pgpool:
# /usr/bin/pgpool -f ../etc/pgpool.conf -F ../etc/pcp.conf -n > /var/log/pgpool/pgpool.log 2>&1 &
Now small test:
- Turn off one postgresql server,
- Made some changes at base,
- Online-recovery: pcp_recovery_node 5 localhost 9898 admin secret_pass 1
- pgpool made backup at working node(node0) - checkpoint + tar data dir,
- Backup move at unworking node(node1),
- Second checkpoint copy WALs, connections from client side are stop,
- Untar,
- Turn on postgres at node1.
Debug:
2008-07-10 15:03:43 DEBUG: pid 15173: starting health checking
2008-07-10 15:03:43 DEBUG: pid 15173: health_check: 0 the DB node status: 2
2008-07-10 15:03:43 DEBUG: pid 15173: health_check: 1 the DB node status: 3
2008-07-10 15:03:49 DEBUG: pid 15207: I am PCP 15207 accept fd 5
2008-07-10 15:03:49 DEBUG: pid 15207: pcp_child: received PCP packet type of service 'M'
2008-07-10 15:03:49 DEBUG: pid 15207: pcp_child: salt sent to the client
2008-07-10 15:03:49 DEBUG: pid 15207: pcp_child: received PCP packet type of service 'R'
2008-07-10 15:03:49 DEBUG: pid 15207: pcp_child: authentication OK
2008-07-10 15:03:49 DEBUG: pid 15207: pcp_child: received PCP packet type of service 'O'
2008-07-10 15:03:49 DEBUG: pid 15207: pcp_child: start online recovery
2008-07-10 15:03:49 LOG: pid 15207: starting recovering node 1
2008-07-10 15:03:49 DEBUG: pid 15207: exec_checkpoint: start checkpoint
2008-07-10 15:03:49 DEBUG: pid 15207: exec_checkpoint: finish checkpoint
2008-07-10 15:03:49 LOG: pid 15207: CHECKPOINT in the 1st stage done
2008-07-10 15:03:49 LOG: pid 15207: starting recovery command: "SELECT pgpool_recovery('base-backup.sh', '10.0.0.y', '/var/lib/pgsql/data')"
2008-07-10 15:03:49 DEBUG: pid 15207: exec_recovery: start recovery
2008-07-10 15:04:43 DEBUG: pid 15173: starting health checking
2008-07-10 15:04:43 DEBUG: pid 15173: health_check: 0 the DB node status: 2
2008-07-10 15:04:43 DEBUG: pid 15173: health_check: 1 the DB node status: 3
2008-07-10 15:05:08 DEBUG: pid 15207: exec_recovery: finish recovery
2008-07-10 15:05:08 LOG: pid 15207: 1st stage is done
2008-07-10 15:05:08 LOG: pid 15207: starting 2nd stage
2008-07-10 15:05:08 LOG: pid 15207: all connections from clients have been closed
2008-07-10 15:05:08 DEBUG: pid 15207: exec_checkpoint: start checkpoint
2008-07-10 15:05:09 DEBUG: pid 15207: exec_checkpoint: finish checkpoint
2008-07-10 15:05:09 LOG: pid 15207: CHECKPOINT in the 2nd stage done
2008-07-10 15:05:09 LOG: pid 15207: starting recovery command: "SELECT pgpool_recovery('pgpool-recovery', '10.0.0.y', '/var/lib/pgsql/data')"
2008-07-10 15:05:09 DEBUG: pid 15207: exec_recovery: start recovery
2008-07-10 15:05:09 DEBUG: pid 15207: exec_recovery: finish recovery
2008-07-10 15:05:09 DEBUG: pid 15207: exec_remote_start: start pgpool_remote_start
2008-07-10 15:05:33 DEBUG: pid 15207: exec_remote_start: finish pgpool_remote_start
2008-07-10 15:05:33 LOG: pid 15207: 1 node restarted
2008-07-10 15:05:33 LOG: pid 15207: send_failback_request: fail back 1 th node request from pid 15207
2008-07-10 15:05:33 LOG: pid 15207: recovery done
2008-07-10 15:05:33 DEBUG: pid 15173: failover_handler called
2008-07-10 15:05:33 DEBUG: pid 15173: failover_handler: starting to select new master node
2008-07-10 15:05:33 LOG: pid 15173: starting fail back. reconnect host 10.252.178.18(5432)
2008-07-10 15:05:33 LOG: pid 15173: failover_handler: do not restart pgpool. same master node 0 was selected
2008-07-10 15:05:33 LOG: pid 15173: failback done. reconnect host 10.252.178.18(5432)
2008-07-10 15:05:33 DEBUG: pid 15207: pcp_child: received PCP packet type of service 'X'
2008-07-10 15:05:33 DEBUG: pid 15207: pcp_child: client disconnecting. close connection
2008-07-10 15:05:43 DEBUG: pid 15173: starting health checking
2008-07-10 15:05:43 DEBUG: pid 15173: health_check: 0 the DB node status: 2
2008-07-10 15:05:43 DEBUG: pid 15173: health_check: 1 the DB node status: 1
Any suggestions, mistakes, improves ? I would be really happy if you let me know.
#UPDATE 1 (24.09.2008):
- Notice that gzip PostgreSQL base dir isn't most smart idea. Better just tar it, otherway will take ages to create archive if database got few GB.
- Access to 'template1' at recovery process is blocked, not really know yet what to do with that.

3 komentarze:
Could you please translate this to english please!!!. here is really worthy information. Thanks
Yeah when got little time.
How do you enable PITR with Postgres, with your example?
archive_mode = on
But what do you have for the "archive_command"?
archive_command = '???'
Prześlij komentarz