分布式postgres部署不同步-Java 学习之路

我设置了一个分布式postgres数据库系统，并使用hot_standby wal_level配置了复制 .

有一个中央主数据库，其中有多个副本（目前全世界有15个）用作只读实例 - 因此不需要故障转移 - 我们只想将数据同步到我们可以读取它们的远程位置 .

主：

wal_level = hot_standby
max_wal_senders = 20
checkpoint_segments = 8    
wal_keep_segments = 8

客户：

wal_level = hot_standby
max_wal_senders = 3
checkpoint_segments = 8    
wal_keep_segments = 8 
hot_standby = on

客户端的/var/lib/postgresql/9.4/recovery.conf：

standby_mode = 'on'
primary_conninfo = 'host=<IP of master> port=5432 user=replicator password=xxxx sslmode=require'
trigger_file = '/tmp/postgresql.trigger'

复制开始 - 有些日子似乎都很好 . 几天之后，主人似乎没有接受更多连接进行复制......

客户：

2017-05-04 01:16:51 UTC [9608-1] FATAL:  could not connect to the primary server: FATAL:  sorry, too many clients already
2017-05-04 01:16:57 UTC [10807-1] FATAL:  could not connect to the primary server: FATAL:  sorry, too many clients already
2017-05-04 01:17:02 UTC [12022-1] FATAL:  could not connect to the primary server: FATAL:  sorry, too many clients already
2017-05-04 01:17:06 UTC [13217-1] FATAL:  could not connect to the primary server: FATAL:  remaining connection slots are reserved for non-replication superuser connections
...

主：

然后loag充满了下面的消息 - 它永远不会恢复......

2017-05-04 08:44:14 UTC [24850-1] replicator@[unknown] ERROR:  requested WAL segment 000000010000003500000014 has already been removed
2017-05-04 08:44:19 UTC [25958-1] replicator@[unknown] ERROR:  requested WAL segment 000000010000003500000014 has already been removed
2017-05-04 08:44:24 UTC [27063-1] replicator@[unknown] ERROR:  requested WAL segment 000000010000003500000014 has already been removed
2017-05-04 08:44:29 UTC [28144-1] replicator@[unknown] ERROR:  requested WAL segment 000000010000003500000014 has already been removed
2017-05-04 08:44:34 UTC [29227-1] replicator@[unknown] ERROR:  requested WAL segment 000000010000003500000014 has already been removed
2017-05-04 08:44:39 UTC [30316-1] replicator@[unknown] ERROR:  requested WAL segment 000000010000003500000014 has already been removed
...

客户：

2017-04-30 11:26:22 UTC [28474-1] LOG:  started streaming WAL from primary at 35/14000000 on timeline 1
2017-04-30 11:26:22 UTC [28474-2] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000003500000014 has already been removed
2017-04-30 11:26:26 UTC [29328-1] LOG:  started streaming WAL from primary at 35/14000000 on timeline 1
2017-04-30 11:26:26 UTC [29328-2] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000003500000014 has already been removed
2017-04-30 11:26:31 UTC [30394-1] LOG:  started streaming WAL from primary at 35/14000000 on timeline 1
2017-04-30 11:26:31 UTC [30394-2] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000003500000014 has already been removed
...

所以我的问题：

是否有更好的方法来将更改同步到远程只读实例，或者我只是在配置中出现错误，使我的复制在一段时间后中断？

1 回答

1

https://www.postgresql.org/docs/current/static/runtime-config-replication.html：

max_wal_senders（整数）指定来自备用服务器或流式基本备份客户端的最大并发连接数（即，同时运行的WAL发送器进程的最大数量） . 默认值为零，表示禁用复制 . WAL发送方进程计入连接总数，因此参数不能设置为高于max_connections . 突然流式客户端断开连接可能会导致孤立连接插槽，直到达到超时，因此此参数应设置为略高于预期客户端的最大数量，以便断开连接的客户端可以立即重新连接 .

（强调我的） . 应用程序连接或孤立连接导致您的

致命：抱歉，已有太多客户

您可能希望为应用程序使用某些连接池，例如pgbouncer，在实际发生之前限制太多连接 .

如果您设置为在某处实际复制WAL，请回答您的问题，修改slave上的recovery.conf中的restore_command以获取它们 . 它将允许奴隶从失去的那一刻开始追赶 . 否则你必须重建它 .

回复于 2024-05-10T14:39:03+08:00

分布式postgres部署不同步

1 回答

相关问题