
Ceph distributed storage: the bucket object lifecycle (lc) reclamation mechanism

Posted 2022-4-11 15:49:15

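In radosgw you can set a lifecycle on the objects of a bucket: once an object has existed for longer than the configured time, rgw deletes and reclaims it. The granularity of this setting is currently the bucket, which means an expiration time cannot be set on an individual object.
The drawback of the bucket lc mechanism is obvious. With lc enabled, an lc run walks every object in the bucket to find the ones that have outlived their expiration time and reclaims them. If a single bucket holds a very large number of objects, this traversal becomes extremely expensive.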
Getting started
First, the parameters related to lc are:
    "rgw_enable_lc_threads": "true",
    "rgw_lc_debug_interval": "-1",
    "rgw_lc_lock_max_time": "60",
    "rgw_lc_max_objs": "32",
    "rgw_lc_max_rules": "1000",
    "rgw_lifecycle_work_time": "00:00-06:00"
By default, a bucket's attributes mark its objects as never expiring. To set an expiration time on a bucket by hand, the boto3 approach is:
# pip install boto3 -i https://pypi.tuna.tsinghua.edu.cn/simple
import uuid

from boto3.session import Session

access_key = 'xxx'
secret_key = 'xxxxx'
host = 'http://ip:7480'  # rgw endpoint
session = Session(access_key, secret_key)
buckets = ['bucket1']
s3_client = session.client('s3', endpoint_url=host)
for x in buckets:
    # One rule with an empty prefix: it applies to every object in the bucket
    s3_client.put_bucket_lifecycle(
        Bucket=x,
        LifecycleConfiguration={
            'Rules': [
                {
                    'Expiration': {
                        'Days': 1
                    },
                    'ID': str(uuid.uuid1()),
                    'Status': 'Enabled',
                    'Prefix': ''
                }
            ],
        })
    # Read the configuration back to confirm that it was stored
    print(s3_client.get_bucket_lifecycle(Bucket=x))
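The rule dictionary used above can be factored into a small helper so that the same shape is reused for several buckets. This is only a sketch: the function names expiration_rule and lifecycle_configuration are hypothetical and are not part of boto3; only the dictionary layout comes from the script above.

```python
import uuid


def expiration_rule(days, prefix=""):
    """Build one expiration rule in the shape used by the script above.

    Hypothetical helper: only the dictionary layout is taken from the post.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Expiration": {"Days": days},
        "ID": str(uuid.uuid1()),
        "Status": "Enabled",
        "Prefix": prefix,
    }


def lifecycle_configuration(days, prefix=""):
    """Wrap a single rule into a LifecycleConfiguration dictionary."""
    return {"Rules": [expiration_rule(days, prefix)]}
```

With these helpers the call in the loop would become s3_client.put_bucket_lifecycle(Bucket=x, LifecycleConfiguration=lifecycle_configuration(1)).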
Once this is set, the current lc queue can be viewed with the radosgw-admin lc list command:
[store@server227 build]$ ./bin/radosgw-admin -k keyring -c ./ceph.conf  lc list
[
    {
        "bucket": ":cpp-bucket1:eace8f62-b901-47ed-b18f-fe2441f33830.4155.1",
        "status": "COMPLETE"
    }
]
Code walkthrough
The entry point of the lc processing flow is the RGWLC::LCWorker::entry function in rgw/rgw_lc.cc:
void *RGWLC::LCWorker::entry() {
  do {
    utime_t start = ceph_clock_now();
    // should_work() returns true when rgw_lc_debug_interval > 0 or when the configured work time has been reached
    if (should_work(start)) {
      ldout(cct, 2) << "life cycle: start" << dendl;
      int r = lc->process();
      if (r < 0) {
        ldout(cct, 0) << "ERROR: do life cycle process() returned error r=" << r << dendl;
      }
      ldout(cct, 2) << "life cycle: stop" << dendl;
    }
    if (lc->going_down())
      break;

    utime_t end = ceph_clock_now();
    int secs = schedule_next_start_time(start, end);
    utime_t next;
    next.set_from_double(end + secs);
    ldout(cct, 5) << "schedule life cycle next start time: " << rgw_to_asctime(next) << dendl;
    lock.Lock();
    cond.WaitInterval(lock, utime_t(secs, 0));
    lock.Unlock();
  } while (!lc->going_down());
  return NULL;
}
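When lc is enabled, the lc thread performs one lc run, waits for a while, and then wakes up for the next run. On the third line of the function, should_work decides whether this run takes place. rgw_lc_debug_interval defaults to -1, which means debug mode is off; in that case should_work goes on to check whether the current time falls inside the configured work window (rgw_lifecycle_work_time), and if it does not, no lc run is performed: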
  if (cct->_conf->rgw_lc_debug_interval > 0) {
      /* We're debugging, so say we can run */
      return true;
  } else if ((bdt.tm_hour*60 + bdt.tm_min >= start_hour*60 + start_minute) &&
             (bdt.tm_hour*60 + bdt.tm_min <= end_hour*60 + end_minute)) {
      return true;
  } else {
      return false;
  }
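The decision made by should_work can be sketched outside of Ceph as follows. This is a minimal Python sketch of the comparison quoted above, not the rgw implementation; parse_work_time is a hypothetical helper, and the "HH:MM-HH:MM" format is assumed from the default value of rgw_lifecycle_work_time.

```python
def parse_work_time(spec):
    """Parse an 'HH:MM-HH:MM' window into (start, end) minutes of the day."""
    start_s, end_s = spec.split("-")
    start_h, start_m = (int(p) for p in start_s.split(":"))
    end_h, end_m = (int(p) for p in end_s.split(":"))
    return start_h * 60 + start_m, end_h * 60 + end_m


def should_work(hour, minute, work_time="00:00-06:00", debug_interval=-1):
    """Mirror the quoted check: debug mode always runs, otherwise the
    current minute of the day must lie inside the window (bounds included)."""
    if debug_interval > 0:
        return True
    start, end = parse_work_time(work_time)
    current = hour * 60 + minute
    return start <= current <= end
```

Because the comparison is a plain start <= now <= end on minutes of the day, a window that wraps past midnight (for example 22:00-02:00) never matches in this sketch.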
Next, follow lc->process():
int RGWLC::process()
{
  int max_secs = cct->_conf->rgw_lc_lock_max_time;
  // pick a random starting shard, then visit all max_objs shards in wrap-around order
  const int start = ceph::util::generate_random_number(0, max_objs - 1);
  for (int i = 0; i < max_objs; i++) {
    int index = (i + start) % max_objs;
    int ret = process(index, max_secs);
    if (ret < 0)
      return ret;
  }
  return 0;
}
The function is short. It caps the duration of each run at rgw_lc_lock_max_time, 60 s by default; judging from actual testing, this limit is not enforced precisely. Continuing:
int RGWLC::process(int index, int max_lock_secs)
{
  // max_lock_secs determines the length of one execution cycle
  rados::cls::lock::Lock l(lc_index_lock_name);
  do {
    utime_t now = ceph_clock_now();
    ......
    // the lc work on objects really starts here; note that this sits inside an infinite loop
    ret = bucket_lc_process(entry.first);
    bucket_lc_post(index, max_lock_secs, entry, ret);
  }while(1);
The object being locked is lc_process. This function sends several requests to the OSDs for lc preparation work, including reading the head and setting some configuration; most importantly, it checks whether the bucket has lc configured. From the start of lifecycle processing to the call to bucket_lc_process, 68 ms elapsed. Next:
int RGWLC::bucket_lc_process(string& shard_id)
{
    // the reclamation of a bucket's objects is mainly done in this function
    ....
    // set the run period
    l.set_duration(time);
    ....
    do {
        objs.clear();
        list_op.params.marker = list_op.get_next_marker();
        // fetch up to 1000 objects from the bucket, using the bucket list method
        ret = list_op.list_objects(1000, &objs, NULL, &is_truncated);
        if (ret < 0) {
          if (ret == (-ENOENT))
            return 0;
          ldout(cct, 0) << "ERROR: store->list_objects():" <<dendl;
          return ret;
        }

        bool is_expired;
        // walk these 1000 objects and check whether each has reached its expiration time
        for (auto obj_iter = objs.begin(); obj_iter != objs.end(); ++obj_iter) {
          rgw_obj_key key(obj_iter->key);
          ....
          if (prefix_iter->second.expiration_date != boost::none) {
            // we have checked it before
            is_expired = true;
          } else {
            // check whether the object has reached its expiration time
            is_expired = obj_has_expired(obj_iter->meta.mtime, prefix_iter->second.expiration);
          }
          if (is_expired) {
            int ret = store->get_obj_state(&rctx, bucket_info, obj, &state, false);
            if (ret < 0) {
              return ret;
            }
            if (state->mtime != obj_iter->meta.mtime) {
              // Check mtime again to avoid delete a recently update object as much as possible
              ldout(cct, 20) << __func__ << "() skipping removal: state->mtime " << state->mtime << " obj->mtime " << obj_iter->meta.mtime << dendl;
              continue;
            }
            ret = remove_expired_obj(bucket_info, obj_iter->key, obj_iter->meta.owner, obj_iter->meta.owner_display_name, true);
            if (ret < 0) {
              ldout(cct, 0) << "ERROR: remove_expired_obj " << dendl;
            } else {
              ldout(cct, 2) << "DELETED:" << bucket_name << ":" << key << dendl;
            }
            if (going_down())
              return 0;
          }
        }
    } while (is_truncated);
}
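In this function the run time is set first. The thread then fetches up to 1000 objects through bucket list, in listing order, and walks them; any object that is past its expiration time is deleted. As for the loop's stop condition, is_truncated is documented as: if number of objects in the bucket is bigger than max, then truncated. In other words, the loop only ends once the whole bucket has been traversed.
The time this takes depends mainly on the number of objects in the bucket: the more objects there are, the longer it takes.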
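The paging loop described above can be imitated in memory to see why the cost grows with the bucket size. This is a minimal sketch under stated assumptions: the bucket is modelled as a dict of object name to mtime in seconds, list_objects and lc_scan are hypothetical stand-ins for the rgw calls, the expiry check only reflects the "seconds as a day" debug rule mentioned in this post, and the mtime re-check before deletion is left out.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def obj_has_expired(mtime, days, now, debug_interval=-1):
    """Expired once the object's age reaches `days` units; a unit is a real
    day, or `debug_interval` seconds when debug mode is on (assumption)."""
    unit = SECONDS_PER_DAY if debug_interval <= 0 else debug_interval
    return now - mtime >= days * unit


def list_objects(bucket, marker, max_keys):
    """Hypothetical stand-in for a bucket listing: one sorted page of names
    after `marker`, plus a flag telling whether more names remain."""
    names = sorted(n for n in bucket if marker is None or n > marker)
    return names[:max_keys], len(names) > max_keys


def lc_scan(bucket, days, now, page_size=1000, debug_interval=-1):
    """Walk the whole bucket page by page and delete expired objects.
    Returns the deleted names and the number of listing calls made."""
    deleted, pages, marker = [], 0, None
    while True:
        page, is_truncated = list_objects(bucket, marker, page_size)
        pages += 1
        for name in page:
            if obj_has_expired(bucket[name], days, now, debug_interval):
                del bucket[name]
                deleted.append(name)
        if page:
            marker = page[-1]
        if not is_truncated:
            break
    return deleted, pages
```

Even when only a few objects are expired, the number of listing calls is the object count divided by the page size, rounded up, which is the behaviour the post warns about for very large buckets.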
Finally, multipart objects are handled at the very end of RGWLC::bucket_lc_process:
  ret = handle_multipart_expiration(&target, prefix_map);
  return ret;
Here prefix_map carries the information relevant to the object parts.
The handling logic for parts looks like this:
for (auto prefix_iter = prefix_map.begin(); prefix_iter != prefix_map.end(); ++prefix_iter) {
      ......
    do {
      objs.clear();
      list_op.params.marker = list_op.get_next_marker();
      ret = list_op.list_objects(1000, &objs, NULL, &is_truncated);
      ......
    } while(is_truncated);
}
As can be seen, just like object deletion, part deletion also fetches up to 1000 part objects at a time and deletes them.
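Summary of the logic
First, in the default configuration, once the lc thread has started it uses rgw_lc_debug_interval and the configured rgw_lifecycle_work_time to decide whether to run this round of process. rgw_lc_debug_interval defaults to -1, which leaves debug mode off, so a round only runs inside the work window. In addition, when rgw_lc_debug_interval <= 0 the expiration unit is a real day; when it is positive, the original comment applies: We're in debug mode; Treat each rgw_lc_debug_interval seconds as a day.
Once a round has been decided on, rgw_lc_lock_max_time determines the length of each lc cycle, and each step picks one of the rgw_lc_max_objs entries to process. Increasing this value does not speed up reclamation.
About rgw_lc_max_objs: it is the maximum number of lc processing threads, but they are not concurrent threads, which is why raising it does not make reclamation any faster. It works more like a grouping: once a bucket has an lc rule set, it is assigned to one of the lc entries. For example, with a value of 3, a reclamation round randomly starts one of lc.0, lc.1 and lc.2, and reclaims if that entry contains buckets. Testing showed that with several buckets set to expire, a single rgw runs only one lc thread at a time, so automatic reclamation is actually quite slow.
Next, some basic metadata is requested and set. The most important step is deciding whether the bucket needs lc at all: it checks whether lc has been configured on the bucket and only proceeds if so; with nothing configured, the default is to do nothing.
Then up to 1000 objects at a time are fetched in order from the given bucket. They are walked and checked against their expiration time, and expired ones are deleted, until every object in the bucket has been traversed.
Multipart objects get special handling: after the first part is deleted, dedicated logic goes on to delete the remaining parts.
Running 2 rgw instances in the test environment showed that the 2 instances can reclaim different buckets at the same time. So the only way to speed up reclamation is to add rgw instances; a single rgw clearly reclaims one bucket at a time, serially and single-threaded. However, because reclaiming a bucket involves bucket list, reclaiming too fast may itself cause performance problems, so the two have to be weighed against each other.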
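The grouping role of rgw_lc_max_objs described in the summary above can be sketched as two small functions. This is an illustration only: the post says a bucket is assigned to one of the lc entries but does not say how, so the crc32 hash used here is an assumption standing in for whatever rgw really uses, and both function names are hypothetical. The visiting order follows the (i + start) % max_objs loop quoted from RGWLC::process().

```python
import random
import zlib


def shard_for_bucket(bucket_id, max_objs):
    """Map a bucket to one of lc.0 .. lc.(max_objs-1).

    Assumption: crc32 is only a placeholder for the real hash in rgw.
    """
    return "lc.%d" % (zlib.crc32(bucket_id.encode("utf-8")) % max_objs)


def shard_visit_order(max_objs, rng=random):
    """Random starting shard, then every shard once in wrap-around order."""
    start = rng.randint(0, max_objs - 1)
    return ["lc.%d" % ((i + start) % max_objs) for i in range(max_objs)]
```

Whatever the starting point, every entry is visited exactly once per round and one after the other, which is consistent with the observation that a larger rgw_lc_max_objs does not add parallelism.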

Original poster | Posted 2022-4-12 09:28:07
16 Operating Ceph Services
16.1 Operating Ceph Cluster Related Services Using systemd
16.2 Restarting Ceph Services Using DeepSea
16.3 Shutdown and Start of the Whole Ceph Cluster
You can operate Ceph services either using systemd or using DeepSea.
16.1 Operating Ceph Cluster Related Services Using systemd
Use the systemctl command to operate all Ceph related services. The operation takes place on the node you are currently logged in to. You need to have root privileges to be able to operate on Ceph services.
16.1.1 Starting, Stopping, and Restarting Services Using Targets
To simplify starting, stopping, and restarting all the services of a particular type (for example all Ceph services, or all MONs, or all OSDs) on a node, Ceph provides the following systemd unit files:
cephadm@adm > ls /usr/lib/systemd/system/ceph*.target
ceph.target
ceph-osd.target
ceph-mon.target
ceph-mgr.target
ceph-mds.target
ceph-radosgw.target
ceph-rbd-mirror.target

To start/stop/restart all Ceph services on the node, run:
root # systemctl start ceph.target
root # systemctl stop ceph.target
root # systemctl restart ceph.target

To start/stop/restart all OSDs on the node, run:
root # systemctl start ceph-osd.target
root # systemctl stop ceph-osd.target
root # systemctl restart ceph-osd.target

Commands for the other targets are analogous.
16.1.2 Starting, Stopping, and Restarting Individual Services
You can operate individual services using the following parameterized systemd unit files:
ceph-osd@.service
ceph-mon@.service
ceph-mds@.service
ceph-mgr@.service
ceph-radosgw@.service
ceph-rbd-mirror@.service

To use these commands, you first need to identify the name of the service you want to operate. See Section 16.1.3, “Identifying Individual Services” to learn more about services identification.
To start, stop or restart the osd.1 service, run:
root # systemctl start ceph-osd@1.service
root # systemctl stop ceph-osd@1.service
root # systemctl restart ceph-osd@1.service

Commands for the other service types are analogous.
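The naming pattern of the parameterized unit files can be written down as a small helper, which may be handy when generating commands for several daemons. This is a sketch only: unit_name and systemctl_command are hypothetical names, and the code merely builds the argument list; it does not run systemctl.

```python
# Daemon types taken from the list of parameterized unit files above.
DAEMON_TYPES = ("osd", "mon", "mds", "mgr", "radosgw", "rbd-mirror")
ACTIONS = ("start", "stop", "restart")


def unit_name(daemon_type, instance):
    """Return the unit name, e.g. ceph-osd@1.service for osd.1."""
    if daemon_type not in DAEMON_TYPES:
        raise ValueError("unknown daemon type: %s" % daemon_type)
    return "ceph-%s@%s.service" % (daemon_type, instance)


def systemctl_command(action, daemon_type, instance):
    """Build the systemctl argument list for one daemon instance."""
    if action not in ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    return ["systemctl", action, unit_name(daemon_type, instance)]
```

For example, systemctl_command("restart", "osd", 1) yields the same command as the third line of the osd.1 example above.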

16.1.3 Identifying Individual Services
You can find out the names/numbers of a particular type of service in several ways. The following commands provide results for ceph* services. You can run them on any node of the Ceph cluster.
To list all (even inactive) services of type ceph*, run:
root # systemctl list-units --all --type=service ceph*

To list only the inactive services, run:
root # systemctl list-units --all --state=inactive --type=service ceph*

You can also use salt to query services across multiple nodes:
root@master # salt TARGET cmd.shell \
 "systemctl list-units --all --type=service ceph* | sed -e '/^$/,$ d'"

Query storage nodes only:
root@master # salt -I 'roles:storage' cmd.shell \
 'systemctl list-units --all --type=service ceph*'

16.1.4 Service Status
You can query systemd for the status of services. For example:
root # systemctl status ceph-osd@1.service
root # systemctl status ceph-mon@HOSTNAME.service

Replace HOSTNAME with the host name the daemon is running on.
If you do not know the exact name/number of the service, see Section 16.1.3, “Identifying Individual Services”.

16.2 Restarting Ceph Services Using DeepSea
After applying updates to the cluster nodes, the affected Ceph related services need to be restarted. Normally, restarts are performed automatically by DeepSea. This section describes how to restart the services manually.
Tip: Watching the Restart
The process of restarting the cluster may take some time. You can watch the events by using the Salt event bus by running:
root@master # salt-run state.event pretty=True

Another command to monitor active jobs is
root@master # salt-run jobs.active

16.2.1 Restarting All Services
Warning: Interruption of Services
If Ceph related services (specifically iSCSI or NFS Ganesha) are configured as single points of access with no High Availability setup, restarting them will result in their temporary outage as viewed from the client side.

Tip: Samba Not Managed by DeepSea
Because DeepSea and the Ceph Dashboard do not currently support Samba deployments, you need to manage Samba related services manually. For more details, see Chapter 29, Exporting Ceph Data via Samba.

To restart all services on the cluster, run the following command:
root@master # salt-run state.orch ceph.restart

  • For DeepSea prior to version 0.8.4, the Metadata Server, iSCSI Gateway, Object Gateway, and NFS Ganesha services restart in parallel.
  • For DeepSea 0.8.4 and newer, all roles you have configured restart in the following order: Ceph Monitor, Ceph Manager, Ceph OSD, Metadata Server, Object Gateway, iSCSI Gateway, NFS Ganesha. To keep the downtime low and to find potential issues as early as possible, nodes are restarted sequentially. For example, only one monitoring node is restarted at a time.

The command waits for the cluster to recover if the cluster is in a degraded, unhealthy state.

16.2.2 Restarting Specific Services
To restart a specific service on the cluster, run:
root@master # salt-run state.orch ceph.restart.service_name

For example, to restart all Object Gateways, run:
root@master # salt-run state.orch ceph.restart.rgw

You can use the following targets:
root@master # salt-run state.orch ceph.restart.mon
root@master # salt-run state.orch ceph.restart.mgr
root@master # salt-run state.orch ceph.restart.osd
root@master # salt-run state.orch ceph.restart.mds
root@master # salt-run state.orch ceph.restart.rgw
root@master # salt-run state.orch ceph.restart.igw
root@master # salt-run state.orch ceph.restart.ganesha

The restart orchestration checks if the installed binary is newer than the current one, or if configuration changes exist for this daemon and only restarts in those cases. If you run the above command and nothing happens, this is due to these conditions. See Section 16.1.2, “Starting, Stopping, and Restarting Individual Services” for more information.
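The list of restart targets above lends itself to a small guard that rejects a mistyped service name before anything is sent to the Salt master. This is a sketch only: restart_command is a hypothetical helper that builds the argument list for salt-run and does not execute it.

```python
# Targets copied from the list of ceph.restart.* orchestrations above.
RESTART_TARGETS = ("mon", "mgr", "osd", "mds", "rgw", "igw", "ganesha")


def restart_command(service_name=None):
    """Build the salt-run argument list for a full or per-service restart."""
    if service_name is None:
        return ["salt-run", "state.orch", "ceph.restart"]
    if service_name not in RESTART_TARGETS:
        raise ValueError("unknown restart target: %s" % service_name)
    return ["salt-run", "state.orch", "ceph.restart." + service_name]
```

Calling restart_command("rgw") produces the Object Gateway example shown above, and restart_command() produces the restart-everything command from Section 16.2.1.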

16.3 Shutdown and Start of the Whole Ceph Cluster
There are occasions when you need to stop all Ceph related services in the cluster in the recommended order, and then be able to simply start them again. For example, in case of a planned power outage.
PROCEDURE 16.1: SHUTTING DOWN THE WHOLE CEPH CLUSTER
  • Shut down or disconnect any clients accessing the cluster.
  • To prevent CRUSH from automatically rebalancing the cluster, set the cluster to noout:
    root@master # ceph osd set noout
  • Disable safety measures and run the ceph.shutdown runner:
    root@master # salt-run disengage.safety
    root@master # salt-run state.orch ceph.shutdown
  • Power off all cluster nodes:
    root@master # salt -C 'G@deepsea:*' cmd.run "shutdown -h"

PROCEDURE 16.2: STARTING THE WHOLE CEPH CLUSTER
  • Power on the Admin Node.
  • Power on the Ceph Monitor nodes.
  • Power on the Ceph OSD nodes.
  • Unset the previously set noout flag:
    root@master # ceph osd unset noout
  • Power on all configured gateways.
  • Power on or connect cluster clients.

Original poster | Posted 2022-4-12 09:28:21