OpenStack Mitaka High Availability: a Pacemaker + Corosync + pcs HA Cluster

Posted on 2018-10-19 16:05:14
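Introduction and features
    Pacemaker: works at the resource-allocation layer and provides the cluster resource manager.
    Corosync: provides the cluster messaging layer, carrying heartbeat and cluster-transaction messages.
    Together, Pacemaker + Corosync are enough to build a high-availability cluster.
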
Building the cluster
Run the following on all three nodes:

# yum install pcs -y
# systemctl start pcsd ; systemctl enable pcsd
# echo 'hacluster' | passwd --stdin hacluster
# yum install haproxy rsyslog -y
# echo 'net.ipv4.ip_nonlocal_bind = 1' >> /etc/sysctl.conf        # allow services to bind to the VIP even when it is not present on this node
# echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf        # enable kernel IP forwarding
# sysctl -p
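
Running the two `echo ... >> /etc/sysctl.conf` lines more than once leaves duplicate entries in the file. A minimal sketch of an append-only-if-missing variant follows. The helper name `set_sysctl_once` and the `CONF` variable are made up for illustration, and `CONF` defaults to a scratch file so the sketch can be tried without touching the system:

```shell
#!/bin/sh
# CONF is an assumption of this sketch: it defaults to a scratch file.
# Point it at /etc/sysctl.conf (as root) to apply the settings for real.
CONF="${CONF:-$(mktemp)}"

# set_sysctl_once KEY VALUE: hypothetical helper.
# Appends "KEY = VALUE" to $CONF unless KEY is already set there.
set_sysctl_once() {
    key="$1"; value="$2"
    # Escape the dots so they match literally in the regular expression.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$CONF" 2>/dev/null; then
        echo "skip: $key already present in $CONF"
    else
        echo "$key = $value" >> "$CONF"
        echo "added: $key = $value"
    fi
}

set_sysctl_once net.ipv4.ip_nonlocal_bind 1
set_sysctl_once net.ipv4.ip_forward 1
set_sysctl_once net.ipv4.ip_forward 1    # repeated call leaves the file unchanged
```

With `CONF=/etc/sysctl.conf`, `sysctl -p` still has to be run afterwards to load the values.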

On any one node, create the MariaDB user that haproxy uses for health checks:
MariaDB [(none)]> CREATE USER 'haproxy'@'%' ;
Configure haproxy as the load balancer:

[root@controller1 ~]# egrep -v "^#|^$" /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 4000
listen galera_cluster
    mode tcp
    bind 192.168.0.10:3306
    balance source
    option mysql-check user haproxy
    server controller1 192.168.0.11:3306 check inter 2000 rise 3 fall 3 backup
    server controller2 192.168.0.12:3306 check inter 2000 rise 3 fall 3
    server controller3 192.168.0.13:3306 check inter 2000 rise 3 fall 3 backup

listen memcache_cluster
    mode tcp
    bind 192.168.0.10:11211
    balance source
    option tcplog
    server controller1 192.168.0.11:11211 check inter 2000 rise 3 fall 3
    server controller2 192.168.0.12:11211 check inter 2000 rise 3 fall 3
    server controller3 192.168.0.13:11211 check inter 2000 rise 3 fall 3

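Notes:
    (1) Make sure the haproxy configuration is correct. It is a good idea to first change the IP and port and start haproxy once to confirm that it comes up.
    (2) MariaDB Galera and RabbitMQ listen on 0.0.0.0 by default. Change them to listen on each node's own address (192.168.0.x); a service bound to 0.0.0.0:3306, for example, prevents haproxy from binding 192.168.0.10:3306 on the same node.
    (3) Copy the verified haproxy configuration to the other nodes. There is no need to start the haproxy service by hand.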
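
Notes (1) and (3) can be combined into one step: check the file with `haproxy -c -f`, which parses the configuration and exits non-zero on errors, then copy it to the other controllers. A rough sketch, assuming root SSH access between the nodes and the hostnames used in this post. The `run` wrapper, `check_and_distribute` and the `DRY_RUN`, `CFG` and `PEERS` variables are made up for illustration. By default it only prints what it would do:

```shell
#!/bin/sh
# Assumptions of this sketch: config path, peer hostnames, root SSH access.
CFG="${CFG:-/etc/haproxy/haproxy.cfg}"
PEERS="${PEERS:-controller2 controller3}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

# run: hypothetical wrapper that prints or executes a command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

check_and_distribute() {
    # Syntax-check the configuration first; stop if it does not parse.
    run haproxy -c -f "$CFG" || return 1
    # Copy the checked file to the same path on each peer.
    for node in $PEERS; do
        run scp "$CFG" "root@${node}:${CFG}" || return 1
    done
}

check_and_distribute
```

Run it with `DRY_RUN=0` on controller1 once the printed commands look right.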
Configure logging for haproxy (run on all controller nodes):

# vim /etc/rsyslog.conf

$ModLoad imudp
$UDPServerRun 514

local2.*                                                /var/log/haproxy/haproxy.log

# mkdir -pv /var/log/haproxy/
mkdir: created directory ‘/var/log/haproxy/’

# systemctl restart rsyslog

Start haproxy to verify the configuration:

# systemctl start haproxy
[root@controller1 ~]# netstat -ntplu | grep ha
tcp        0      0 192.168.0.10:3306       0.0.0.0:*               LISTEN      15467/haproxy
tcp        0      0 192.168.0.10:11211      0.0.0.0:*               LISTEN      15467/haproxy
udp        0      0 0.0.0.0:43268           0.0.0.0:*                           15466/haproxy

Verification succeeded; stop haproxy:
# systemctl stop haproxy
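
The eyeball check of the netstat output can also be scripted. A minimal sketch, assuming the column layout of `netstat -ntpl` shown above (local address in column 4, state in column 6); `check_listeners` is a hypothetical helper name:

```shell
#!/bin/sh
# check_listeners ADDR:PORT...: hypothetical helper.
# Reads netstat-style output on stdin and reports, for each expected
# address:port, whether a LISTEN socket exists. Returns 1 if any is missing.
check_listeners() {
    listing=$(cat)
    missing=0
    for want in "$@"; do
        if printf '%s\n' "$listing" |
            awk -v w="$want" '$4 == w && $6 == "LISTEN" { found = 1 } END { exit !found }'
        then
            echo "ok: $want"
        else
            echo "MISSING: $want"
            missing=1
        fi
    done
    return $missing
}

# Sample lines copied from the netstat output in this post, so the sketch
# runs without haproxy being up.
sample='tcp        0      0 192.168.0.10:3306       0.0.0.0:*               LISTEN      15467/haproxy
tcp        0      0 192.168.0.10:11211      0.0.0.0:*               LISTEN      15467/haproxy'
printf '%s\n' "$sample" | check_listeners 192.168.0.10:3306 192.168.0.10:11211
```

On a live node the input would come from `netstat -ntpl | check_listeners 192.168.0.10:3306 192.168.0.10:11211`.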

On controller1, authenticate the cluster nodes:
[root@controller1 ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p hacluster --force
controller3: Authorized
controller2: Authorized
controller1: Authorized
Create the cluster:

[root@controller1 ~]# pcs cluster setup --name openstack-cluster controller1 controller2 controller3  --force
Destroying cluster on nodes: controller1, controller2, controller3...
controller3: Stopping Cluster (pacemaker)...
controller2: Stopping Cluster (pacemaker)...
controller1: Stopping Cluster (pacemaker)...
controller3: Successfully destroyed cluster
controller1: Successfully destroyed cluster
controller2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'controller1', 'controller2', 'controller3'
controller3: successful distribution of the file 'pacemaker_remote authkey'
controller1: successful distribution of the file 'pacemaker_remote authkey'
controller2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
controller1: Succeeded
controller2: Succeeded
controller3: Succeeded

Synchronizing pcsd certificates on nodes controller1, controller2, controller3...
controller3: Success
controller2: Success
controller1: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller3: Success
controller2: Success
controller1: Success

Start all nodes in the cluster:

[root@controller1 ~]# pcs cluster start --all
controller2: Starting Cluster...
controller1: Starting Cluster...
controller3: Starting Cluster...
[root@controller1 ~]# pcs cluster enable --all
controller1: Cluster Enabled
controller2: Cluster Enabled
controller3: Cluster Enabled

View the cluster information:

[root@controller1 ~]# pcs status
Cluster name: openstack-cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Nov 30 19:30:43 2017
Last change: Thu Nov 30 19:30:17 2017 by hacluster via crmd on controller3

3 nodes configured
0 resources configured

Online: [ controller1 controller2 controller3 ]

No resources


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@controller1 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
 Last updated: Thu Nov 30 19:30:52 2017
 Last change: Thu Nov 30 19:30:17 2017 by hacluster via crmd on controller3
 3 nodes configured
 0 resources configured

PCSD Status:
  controller2: Online
  controller3: Online
  controller1: Online

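
For scripted checks, the node list can be pulled out of the `pcs status` output. A minimal sketch, assuming the `Online: [ ... ]` line format printed above (this pcs/Pacemaker 1.1 generation; other versions may format it differently). `online_nodes` is a hypothetical helper name, not a pcs command:

```shell
#!/bin/sh
# online_nodes: hypothetical helper, not part of pcs.
# Reads "pcs status" output on stdin and prints every node named on the
# "Online: [ ... ]" line, one per line.
online_nodes() {
    # Fields 1 and 2 are "Online:" and "[", the last field is "]".
    awk '$1 == "Online:" { for (i = 3; i < NF; i++) print $i }'
}

# Sample line copied from the output above, so the sketch runs without a cluster.
sample='Online: [ controller1 controller2 controller3 ]'
printf '%s\n' "$sample" | online_nodes
```

On a cluster node, `pcs status | online_nodes | wc -l` gives the number of online nodes.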
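All three nodes are online.
The default voting rule calls for an odd number of nodes, and no fewer than 3. With only 2 nodes, if one fails the survivor cannot satisfy the default voting rule, so resources are not moved and the cluster as a whole stays unavailable. Setting no-quorum-policy="ignore" works around this two-node problem, but do not use it in production. In other words, a production environment still needs at least 3 nodes.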
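
The voting arithmetic behind this can be written out. A minimal sketch, assuming plain majority voting with one vote per node and none of corosync's special two-node options; the function names are made up for illustration:

```shell
#!/bin/sh
# Majority quorum: more than half of the expected votes, i.e. floor(n/2) + 1
# (assumes one vote per node).
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# How many nodes can fail while the remaining ones still hold quorum.
tolerated_failures() {
    echo $(( $1 - $(quorum_votes "$1") ))
}

for n in 2 3 5; do
    echo "$n nodes: quorum=$(quorum_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two nodes tolerate no failure and three nodes tolerate one, which is why three is the practical minimum.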
pe-warn-series-max, pe-input-series-max and pe-error-series-max set how many policy-engine inputs are kept (those that produced warnings, normal ones, and those that produced errors), that is, the depth of the saved history.
cluster-recheck-interval is how often the cluster re-evaluates its state.
[root@controller1 ~]#  pcs property set pe-warn-series-max=1000 pe-input-series-max=1000 pe-error-series-max=1000 cluster-recheck-interval=5min
Disable STONITH:
A STONITH device is a physical device that can power off a node on command. This environment has no such device, and if the option is left enabled every pcs command keeps reporting an error about it.
[root@controller1 ~]# pcs property set stonith-enabled=false
With two nodes, ignore quorum:
[root@controller1 ~]# pcs property set no-quorum-policy=ignore
Verify the cluster configuration:
[root@controller1 ~]# crm_verify -L -V
Configure a virtual IP for the cluster:
[root@controller1 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
ip="192.168.0.10" cidr_netmask=32 nic=eno16777736 op monitor interval=30s
At this point Pacemaker + Corosync are there to serve haproxy, so add haproxy to the Pacemaker cluster as a resource:
[root@controller1 ~]# pcs resource create lb-haproxy systemd:haproxy --clone
Note: this creates a cloned resource, and a cloned resource is started on every node. Here haproxy starts automatically on all three nodes.
Check the Pacemaker resources:
[root@controller1 ~]# pcs resource
 ClusterIP    (ocf::heartbeat:IPaddr2):    Started controller1        # the node currently holding the VIP resource
 Clone Set: lb-haproxy-clone [lb-haproxy]        # the cloned haproxy resource
     Started: [ controller1 controller2 controller3 ]
Note: the resources must be bound together here; otherwise haproxy starts on every node and access becomes inconsistent.
Bind the two resources to the same node:
[root@controller1 ~]# pcs constraint colocation add lb-haproxy-clone ClusterIP INFINITY
Binding succeeded:
[root@controller1 ~]# pcs resource
 ClusterIP    (ocf::heartbeat:IPaddr2):    Started controller3
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller1]
     Stopped: [ controller2 controller3 ]
Configure the start order: start the VIP first and haproxy second, because haproxy listens on the VIP.
[root@controller1 ~]# pcs constraint order ClusterIP then lb-haproxy-clone
Manually give the resources a preferred node. Because the two resources are colocated, moving one moves the other automatically.

[root@controller1 ~]# pcs constraint location ClusterIP prefers controller1
[root@controller1 ~]# pcs resource
 ClusterIP    (ocf::heartbeat:IPaddr2):    Started controller1
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller1 ]
     Stopped: [ controller2 controller3 ]
[root@controller1 ~]# pcs resource defaults resource-stickiness=100        # set resource stickiness so that automatic fail-back does not destabilize the cluster
The VIP is now bound to controller1:
[root@controller1 ~]# ip a | grep global
    inet 192.168.0.11/24 brd 192.168.0.255 scope global eno16777736
    inet 192.168.0.10/32 brd 192.168.0.255 scope global eno16777736
    inet 192.168.118.11/24 brd 192.168.118.255 scope global eno33554992

Try connecting to the database through the VIP
Controller1:

[root@controller1 haproxy]# mysql -ugalera -pgalera -h 192.168.0.10

Controller2:

High availability is configured successfully.

Test that failover works
Run poweroff -f directly on controller1:
[root@controller1 ~]# poweroff -f
The VIP quickly moves to controller2.

Try accessing the database again.

No problems; the test succeeded.
View the cluster information:

[root@controller2 ~]# pcs status
Cluster name: openstack-cluster
Stack: corosync
Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Nov 30 23:57:28 2017
Last change: Thu Nov 30 23:54:11 2017 by root via crm_attribute on controller1

3 nodes configured
4 resources configured

Online: [ controller2 controller3 ]
OFFLINE: [ controller1 ]            # controller1 is offline

Full list of resources:

 ClusterIP    (ocf::heartbeat:IPaddr2):    Started controller2
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller2 ]
     Stopped: [ controller1 controller3 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
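
To confirm a failover from a script rather than by reading the status by eye, the node holding the VIP can be extracted from the resource list. A minimal sketch, assuming the line format printed in this post (resource name first, `Started <node>` last); newer pcs versions format this line differently. `resource_node` is a hypothetical helper name:

```shell
#!/bin/sh
# resource_node NAME: hypothetical helper, not part of pcs.
# Reads "pcs status" / "pcs resource" output on stdin and prints the node on
# which the primitive resource NAME is reported as Started.
# Exits non-zero if no such line is found.
resource_node() {
    awk -v res="$1" '
        $1 == res && $(NF-1) == "Started" { print $NF; found = 1 }
        END { exit !found }'
}

# Sample lines copied from the status output above, so the sketch runs
# without a cluster.
sample=' ClusterIP    (ocf::heartbeat:IPaddr2):    Started controller2
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller2 ]
     Stopped: [ controller1 controller3 ]'
printf '%s\n' "$sample" | resource_node ClusterIP
```

On a live node: `pcs resource | resource_node ClusterIP`.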