温馨提示×

温馨提示×

您好,登录后才能下订单哦!

密码登录×
登录注册×
其他方式登录
点击 登录注册 即表示同意《亿速云用户服务条款》

扩展segment数量

发布时间:2020-08-30 03:24:07 来源:网络 阅读:1063 作者:faer615 栏目:数据库

第一部分 扩展Greenplum segment数量

1. 将主机加入集群(如果在原有主机扩展,不需要这一步)

环境配置,例如OS kernel 参数; 创建gp管理用户; ssh key的交换(使用gpssh-exkeys -e exist_hosts -x new_hosts); greenplum bin软件的拷贝; 规划segment 数据目录; 使用gpcheck检查 (gpcheck -f new_hosts ); 使用gpcheckperf检查性能 (gpcheckperf -f new_hosts_file -d /data1 -d /data2 -v)

2. 初始化segment并加入集群

这一步主要做的是 产生配置文件(gpexpand -f new_hosts_file),也可以自己写配置文件; 在指定目录初始化segment数据库(gpexpand -i cnf -D dbname ); 将新增的segment信息添加到master元表; 扩展失败了怎么处理?

3. 重分布表

规划表的重分布优先级顺序; 将表数据根据新的 segments 重新分布。 分析表;

第二部分 实例演示

1.一种是在集群中的原有机器中添加segment。

1.1 测试环境

4台虚拟机,每台虚拟机16G内存,跑8个segment。 查看配置 test=# select * from gp_segment_configuration ;

1.2 原地扩展8个segment

因为没有新增主机,所以直接进入第二步. 创建需要扩展segment的主机文件 cat > seg_nodes << EOF gpsegment62 gpsegment63 gpsegment64 EOF

1.3 产生配置文件

$gpexpand -f ./seg_nodes Would you like to initiate a new System Expansion Yy|Nn (default=N): > y How many new primary segments per host do you want to add? (default=0): 每个主机加几个segment > 8 Enter new primary data directory 1: segment data目录 > /greenplum/data/gpdatap9 此处中间省略。。。 Enter new primary data directory 8: segment data目录 > /greenplum/data/gpdatap16 Enter new mirror data directory 1: segment data目录 > /greenplum/data/gpdatam9 此处中间省略。。。 Enter new mirror data directory 8: segment data目录 > /greenplum/data/gpdatam16 Input configuration files were written to 'gpexpand_inputfile_20180814_140954' and 'None'. Please review the file and make sure that it is correct then re-run with: gpexpand -i gpexpand_inputfile_20180814_140954 -D digoal

1.4 产生的配置文件内容如下

$cat gpexpand_inputfile_20180814_140954 gpsegment62:gpsegment62:40008:/greenplum/data/gpdatap9/gpseg24:50:24:p:41008 gpsegment63:gpsegment63:50008:/greenplum/data/gpdatam9/gpseg24:82:24:m:51008 gpsegment62:gpsegment62:40009:/greenplum/data/gpdatap10/gpseg25:51:25:p:41009 gpsegment63:gpsegment63:50009:/greenplum/data/gpdatam10/gpseg25:83:25:m:51009 gpsegment62:gpsegment62:40010:/greenplum/data/gpdatap11/gpseg26:52:26:p:41010 gpsegment63:gpsegment63:50010:/greenplum/data/gpdatam11/gpseg26:84:26:m:51010 gpsegment62:gpsegment62:40011:/greenplum/data/gpdatap12/gpseg27:53:27:p:41011 gpsegment63:gpsegment63:50011:/greenplum/data/gpdatam12/gpseg27:85:27:m:51011 gpsegment62:gpsegment62:40012:/greenplum/data/gpdatap13/gpseg28:54:28:p:41012 ... gpsegment62:gpsegment62:50012:/greenplum/data/gpdatam13/gpseg44:78:44:m:51012 gpsegment64:gpsegment64:40013:/greenplum/data/gpdatap14/gpseg45:71:45:p:41013 gpsegment62:gpsegment62:50013:/greenplum/data/gpdatam14/gpseg45:79:45:m:51013 gpsegment64:gpsegment64:40014:/greenplum/data/gpdatap15/gpseg46:72:46:p:41014 gpsegment62:gpsegment62:50014:/greenplum/data/gpdatam15/gpseg46:80:46:m:51014 gpsegment64:gpsegment64:40015:/greenplum/data/gpdatap16/gpseg47:73:47:p:41015 gpsegment62:gpsegment62:50015:/greenplum/data/gpdatam16/gpseg47:81:47:m:51015 内容包括几个字段 hostname 主机名 address 类似主机名 port segment监听端口 fselocation segment data目录,注意是全路径 dbid gp集群的唯一ID,可以到gp_segment_configuration中获得,必须顺序累加 content 可以到gp_segment_configuration中获得,必须顺序累加 prefered_role 角色(p或m)(primary , mirror) replication_port 如果没有mirror则不需要(用于replication的端口)。 如果你觉得以上内容有问题,可以手工修改。

1.5 创建扩展数据目录

gpssh -f seg_nodes 'mkdir /greenplum/data/gpdatap{9..16} /greenplum/data/gpdatam{9..16}' 修改配置 gpconfig -c max_connections -v 1000 -m 500 gpconfig -c shared_buffers -v 64m -m 64m gpstop -afr

1.6 执行gpexpand进行扩展

gpexpand -i gpexpand_inputfile_20180814_140954 -D test -V -v -n 8 -B 1 -t /home/gpadmin/gpAdminLogs 解释一下命令 -B <batch_size> Batch size of remote commands to send to a given host before making a one-second pause. Default is 16. Valid values are 1-128. The gpexpand utility issues a number of setup commands that may exceed the host's maximum threshold for authenticated connections as defined by MaxStartups in the SSH daemon configuration. The one-second pause allows authentications to be completed before gpexpand issues any more commands. The default value does not normally need to be changed. However, it may be necessary to reduce the maximum number of commands if gpexpand fails with connection errors such as 'ssh_exchange_identification: Connection closed by remote host.' -D <database_name> Specifies the database in which to create the expansion schema and tables. If this option is not given, the setting for the environment variable PGDATABASE is used. The database templates template1 and template0 cannot be used. -i | --input <input_file> Specifies the name of the expansion configuration file, which contains one line for each segment to be added in the format of: <hostname>:<address>:<port>:<fselocation>:<dbid>:<content>:<preferred_role>:<replication_port> -n <parallel_processes> The number of tables to redistribute simultaneously. Valid values are 1 - 16. Each table redistribution process requires two database connections: one to alter the table, and another to update the table's status in the expansion schema. Before increasing -n, check the current value of the server configuration parameter max_connections and make sure the maximum connection limit is not exceeded. -S | --simple_progress Show simple progress view. -t | --tardir <directory> Specify the temporary directory on segment hosts to put tar file. -v | --verbose Verbose debugging output. With this option, the utility will output all DDL and DML used to expand the database. -V | --novacuum Do not vacuum catalog tables before creating schema copy.

1.7 运行成功的输出如下

20180814:14:15:20:002788 gpexpand:gpmaster61:gpadmin-[DEBUG]:-Transitioning from PREPARE_EXPANSION_SCHEMA_STARTED to PREPARE_EXPANSION_SCHEMA_DONE 20180814:14:15:20:002788 gpexpand:gpmaster61:gpadmin-[DEBUG]:-Transitioning from PREPARE_EXPANSION_SCHEMA_DONE to EXPANSION_PREPARE_DONE 20180814:14:15:20:002788 gpexpand:gpmaster61:gpadmin-[DEBUG]:-Removing segment configuration backup file 20180814:14:15:20:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-Stopping Greenplum Database 20180814:14:16:18:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-Starting Greenplum Database 20180814:14:16:36:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-Starting new mirror segment synchronization 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-************************************************ 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-Initialization of the system expansion complete. 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-To begin table expansion onto the new segments 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-rerun gpexpand 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-************************************************ 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[INFO]:-Exiting... 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[DEBUG]:-WorkerPool haltWork() 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[DEBUG]:-[worker0] haltWork 20180814:14:17:48:002788 gpexpand:gpmaster61:gpadmin-[DEBUG]:-[worker0] got a halt cmd

1.8 扩展失败回滚

启动限制模式,回滚。 gpstart -R gpexpand --rollback -D test gpstart -a 然后找问题继续上一步,直到成功。

1.9 成功后可以看到节点数变多了

test=# select * from gp_segment_configuration ;

1.10 在数据重分布前,新增的节点对老数据不起作用。

接下来可以计划重分布任务中,表的调度顺序了 digoal=# select * from gpexpand. gpexpand.status gpexpand.status_detail gpexpand.expansion_progress digoal=# select * from gpexpand.status; status | updated ------------+---------------------------- SETUP | 2015-12-17 16:50:07.15973 SETUP DONE | 2015-12-17 16:50:16.427367 (2 rows) 查看接下来的任务,如果要调整任务的先后顺序,改rank即可。 digoal=# select * from gpexpand.status_detail ; dbname | fq_name | schema_oid | table_oid | distribution_policy | distribution_policy_names | distribution_policy_coloids | storage_options | rank | status | expansion_started | expansion_finished | source_bytes --------+--------------+------------+-----------+---------------------+---------------------------+-----------------------------+-----------------+------+-------------+-------------------+--------------------+-------------- digoal | public.test | 2200 | 17156 | {1} | id | 17156 | | 2 | NOT STARTED | | | 0 digoal | public.test1 | 2200 | 17182 | {1} | id | 17182 | | 2 | NOT STARTED | | | 0 (2 rows) 例如: => UPDATE gpexpand.status_detail SET rank=10; => UPDATE gpexpand.status_detail SET rank=1 WHERE fq_name = 'public.lineitem'; => UPDATE gpexpand.status_detail SET rank=2 WHERE fq_name = 'public.orders'; These commands lower the priority of all tables to 10 and then assign a rank of 1 to lineitem and a rank of 2 to orders. When table redistribution begins, lineitem is redistributed first, followed by orders and all other tables in gpexpand.status_detail. To exclude a table from redistribution, remove the table from gpexpand.status_detail.

还有多少个表未完成重分布
digoal=# select * from gpexpand.expansion_progress ;
name | value
-------------+-------
Tables Left | 2
(1 row)

2.11 执行重分布命令。

需要指定计划在多久内完成,或者计划在哪天完成重分布,脚本会自动调度重分布。 gpexpand -a -d 1:00:00 -D test -S -t /tmp -v -n 1 命令解释 To begin the redistribution phase, you must run gpexpand with either the -d (duration) or -e (end time) options. Until the specified end time or duration is reached, the utility will redistribute tables in the expansion schema. Each table is reorganized using ALTER TABLE commands to rebalance the tables across new segments, and to set tables to their original distribution policy. If gpexpand completes the reorganization of all tables before the specified duration, it displays a success message and ends. NOTE: Data redistribution should be performed during low-use hours. Redistribution can divided into batches over an extended period. -a | --analyze Run ANALYZE to update the table statistics after expansion. The default is to not run ANALYZE. -d | --duration <hh:mm:ss> Duration of the expansion session from beginning to end. -D <database_name> Specifies the database in which to create the expansion schema and tables. If this option is not given, the setting for the environment variable PGDATABASE is used. The database templates template1 and template0 cannot be used. -e | --end '<YYYY-MM-DD hh:mm:ss>' Ending date and time for the expansion session. -S | --simple_progress Show simple progress view. -t | --tardir <directory> Specify the temporary directory on segment hosts to put tar file. -v | --verbose Verbose debugging output. With this option, the utility will output all DDL and DML used to expand the database. -n <parallel_processes> The number of tables to redistribute simultaneously. Valid values are 1 - 16. Each table redistribution process requires two database connections: one to alter the table, and another to update the table's status in the expansion schema. Before increasing -n, check the current value of the server configuration parameter max_connections and make sure the maximum connection limit is not exceeded. 重分布过程中,可以看到进度。 digoal=# select * from gpexpand.expansion_progress ; name | value -----------------+------- Tables Expanded | 1 Tables Left | 1 (2 rows) test=# select * from gpexpand.status_detail ; dbname | fq_name | schema_oid | table_oid | distribution_policy | distribution_policy_names | distribution_policy_coloids | storage_options | rank | status | expansion_started | expansion_finished | source_byte s --------+--------------+------------+-----------+---------------------+---------------------------+------- -- test | public.test | 2200 | 17156 | {1} | id | 17156 | | 2 | NOT STARTED | | |0 test | public.test1 | 2200 | 17182 | {1} | id | 17182 | | 2 | COMPLETED | 2015-12-17 17:12:12.43088 | 2015-12-17 17:13:27.335207 | 0 (2 rows) # 或者在命令行看进度 20151217:17:12:11:020043 gpexpand:digoal193096:digoal-[DEBUG]:-['digoal', 'public.test1', 2200L, 17182L, '{1}', 'id', '17182', None, 2, 'NOT STARTED', None, None, Decimal('0')] 20151217:17:12:11:020043 gpexpand:digoal193096:digoal-[DEBUG]:-Adding cmd to work_queue: None 20151217:17:12:11:020043 gpexpand:digoal193096:digoal-[DEBUG]:-['digoal', 'public.test', 2200L, 17156L, '{1}', 'id', '17156', None, 2, 'NOT STARTED', None, None, Decimal('0')] 20151217:17:12:11:020043 gpexpand:digoal193096:digoal-[DEBUG]:-Adding cmd to work_queue: None 20151217:17:12:11:020043 gpexpand:digoal193096:digoal-[DEBUG]:-woke up. queue: 2 finished 0 。。。。 20151217:17:14:36:020043 gpexpand:digoal193096:digoal-[DEBUG]:-woke up. queue: 2 finished 1 20151217:17:14:40:020043 gpexpand:digoal193096:digoal-[INFO]:-Analyzing public.test 20151217:17:14:41:020043 gpexpand:digoal193096:digoal-[DEBUG]:-woke up. queue: 2 finished 1 20151217:17:14:43:020043 gpexpand:digoal193096:digoal-[INFO]:-Finished expanding digoal.public.test 20151217:17:14:43:020043 gpexpand:digoal193096:digoal-[DEBUG]:-UPDATE gpexpand.status_detail SET status = 'COMPLETED', expansion_started='2015-12-17 17:13:29.258085', expansion_finished='2015-12-17 17:14:43.552232' WHERE dbname = 'digoal' AND schema_oid = 2200 AND table_oid = 17156 20151217:17:14:44:020043 gpexpand:digoal193096:digoal-[DEBUG]:-[worker0] finished cmd: name cmdStr='None' 20151217:17:14:46:020043 gpexpand:digoal193096:digoal-[DEBUG]:-WorkerPool haltWork() 20151217:17:14:46:020043 gpexpand:digoal193096:digoal-[DEBUG]:-[worker0] haltWork 。。。 20151217:17:14:54:020043 gpexpand:digoal193096:digoal-[INFO]:-EXPANSION COMPLETED SUCCESSFULLY 20151217:17:14:54:020043 gpexpand:digoal193096:digoal-[INFO]:-Exiting...

2.12 最后一步操作,清除重分布产生的schema gpexpand

gpexpand -c -D test 问你是否需要在清除gpexpand schema前将状态信息导出。 Do you want to dump the gpexpand.status_detail table to file? Yy|Nn (default=Y): > y

2. 另一种是新增主机,添加segment

例子2,再扩展,6个segment,并且新加一台主机。达到每个主机分布4个SEGMENT的目的。 和例子1的差别就在于新加了主机,所以需要额外的过程。 过程概要 环境配置,例如OS kernel 参数; 创建gp管理用户; ssh key的交换(使用gpssh-exkeys -e exist_hosts -x new_hosts); greenplum bin软件的拷贝; 规划segment 数据目录; 使用gpcheck检查 (gpcheck -f new_hosts ); 使用gpcheckperf检查性能 (gpcheckperf -f new_hosts_file -d /data1 -d /data2 -v)

2.1 新增主机的操作如下

yum -y install rsync coreutils glib2 lrzsz sysstat e4fsprogs xfsprogs ntp readline-devel zlib zlib-devel openssl openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc make smartmontools flex bison perl perl-devel perl-ExtUtils* OpenIPMI-tools openldap openldap-devel logrotate cat > /etc/sysctl.conf << EOF kernel.shmmax = 68719476736 kernel.shmmni = 4096 kernel.shmall = 4000000000 kernel.sem = 50100 64128000 50100 1280 kernel.sysrq = 1 kernel.core_uses_pid = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 kernel.msgmni = 2048 net.ipv4.tcp_syncookies = 1 net.ipv4.ip_forward = 0 net.ipv4.conf.default.accept_source_route = 0 net.ipv4.tcp_tw_recycle = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.conf.all.arp_filter = 1 net.ipv4.ip_local_port_range = 1025 65535 net.core.netdev_max_backlog = 10000 net.core.rmem_max = 2097152 net.core.wmem_max = 2097152 vm.overcommit_memory = 2 fs.file-max = 7672460 net.ipv4.netfilter.ip_conntrack_max = 655360 fs.aio-max-nr = 1048576 net.ipv4.tcp_keepalive_time = 72 net.ipv4.tcp_keepalive_probes = 9 net.ipv4.tcp_keepalive_intvl = 7 EOF sysctl -p cat >> /etc/security/limits.conf EOF * soft nofile 131072 * hard nofile 131072 * soft nproc 131072 * hard nproc 131072 * soft memlock unlimited * hard memlock unlimited EOF rm -f /etc/security/limits.d/90-nproc.conf

2.2 把所有主机的IP和主机名列到/etc/hosts中。使用真实的主机名。(需要在所有节点执行)

cat > /etc/hosts << EOF
127.0.0.1 localhost
192.168.61.61 node61
192.168.61.62 node62
192.168.61.63 node63
192.168.61.64 node64
192.168.61.65 node65
192.168.61.66 node66
192.168.61.67 node67
EOF

2.3 文件系统挂载选项

noatime,nodiratime,nobarrier,discard,nodelalloc,data=writeback

2.4 设置块设备预读

/sbin/blockdev --setra 16384 /dev/xvda1 创建一个管理greenplum 的用户,这里使用digoal 创建一个目录,放gp软件, 给greenplum管理用户写权限,也可以直接使用用户的HOME目录,例如/home/digoal/greenplum-db-4.3.6.1 创建一个目录,放数据库, 给greenplum管理用户写权限 # mkdir -p /data01/gpdata # chown -R digoal /data01/gpdata # chmod -R 700 /data01/gpdata

2.5 以下在主节点执行

创建主机文件,包括所有节点以及主节点本身 $ vi host_exist digoal193096.zmf digoal199092.zmf digoal200164.zmf digoal204016.zmf digoal204063.zmf $ vi host_new digoal209198.zmf 交换KEY,master使用gp管理用户(digoal)访问所有的segment不需要输入密码,master pub拷贝到所有的segment authorized_keys $ gpssh-exkeys -e host_exist -x host_new 安装软件到segment hosts $gpseginstall -f ./host_new -u digoal 使用gpcheck检查 $ gpcheck -f host_new 使用gpcheckperf检查性能 $ gpcheckperf -f host_new -d /data01/gpdata -v 接下来的操作和前面就差不多了,如下: $vi host digoal204016.zmf digoal204063.zmf digoal209198.zmf 产生配置文件 $gpexpand -f ./host -c 产生的配置文件内容如下 $cat gpexpand_inputfile_20151217_173855 digoal209198.zmf:digoal209198.zmf:40000:/data01/gpdata/gpseg20:22:20:p digoal209198.zmf:digoal209198.zmf:40001:/data01/gpdata/gpseg21:23:21:p digoal209198.zmf:digoal209198.zmf:40002:/data01/gpdata/gpseg22:24:22:p digoal209198.zmf:digoal209198.zmf:40003:/data01/gpdata/gpseg23:25:23:p digoal193096.zmf:digoal193096.zmf:40004:/data01/gpdata/gpseg24:26:24:p digoal199092.zmf:digoal199092.zmf:40004:/data01/gpdata/gpseg25:27:25:p digoal200164.zmf:digoal200164.zmf:40004:/data01/gpdata/gpseg26:28:26:p digoal204016.zmf:digoal204016.zmf:40004:/data01/gpdata/gpseg27:29:27:p digoal204063.zmf:digoal204063.zmf:40004:/data01/gpdata/gpseg28:30:28:p digoal209198.zmf:digoal209198.zmf:40004:/data01/gpdata/gpseg29:31:29:p 需要人为调整一下: (dbid, contendid都务必连续, 通过查看gp_segment_configuration) (同一主机,端口不能冲突) digoal204016.zmf:digoal204016.zmf:40004:/data01/gpdata/gpseg20:22:20:p digoal204063.zmf:digoal204063.zmf:40004:/data01/gpdata/gpseg21:23:21:p digoal209198.zmf:digoal209198.zmf:40000:/data01/gpdata/gpseg22:24:22:p digoal209198.zmf:digoal209198.zmf:40001:/data01/gpdata/gpseg23:25:23:p digoal209198.zmf:digoal209198.zmf:40002:/data01/gpdata/gpseg24:26:24:p digoal209198.zmf:digoal209198.zmf:40003:/data01/gpdata/gpseg25:27:25:p 接下来需要修改greenplum bin目录权限,gpexpand需要在这个目录写一些东西。 chmod -R 700 /opt/gpdb

2.6 执行gpexpand进行扩展

$ gpexpand -i ./gpexpand_inputfile_20151217_173855 -D digoal -S -V -v -n 1 -B 1 -t /tmp 执行重分布命令。需要指定计划在多久内完成,或者计划在哪天完成重分布,脚本会自动调度重分布。 $ gpexpand -a -d 1:00:00 -D digoal -S -t /tmp -v -n 1
向AI问一下细节

免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。

AI