ceph rgw：lifecycle实现

lifecycle policy的存储机制

当我们为一个bucket配置lifecycle policy时，lifecycle相关的数据会存储在2个位置：

在bucket.instance对象的xattr中写入key为user.rgw.lc value为LifecycleConfig的属性（真正的lifecycle rules列表）。
在lc.0 - lc.31共32个对象(这个值可配置)中选择其中一个，向其omap写入lifecycle的状态信息。但其实在该omap对应的header中也有lc相关的信息，比如用于记录当前omap的lifecycle遍历进度的marker，但这些数据不是在设置lc时设置的。

下面我们验证下：

首先，我们通过boto向一个名为bucket1的bucket配置lifecycle policy:

bucket1 = conn.get_bucket('bucket1')
expir = Expiration(days=1)
lc = Lifecycle()
lc.add_rule(
    prefix = "test/",
    expiration = expir,
)
bucket1.configure_lifecycle(lc)

之后，去该bucket1对应的bucket.instance对象的xattr中查看。

$ rados -p default.rgw.meta ls --namespace root
bucket1
.bucket.meta.bucket1:38d08ed7-3883-49de-ab89-0dea7c8c960f.4162.1

$ rados -p default.rgw.meta --namespace root listxattr .bucket.meta.bucket1:38d08ed7-3883-49de-ab89-0dea7c8c960f.4162.1
ceph.objclass.version
user.rgw.acl
user.rgw.lc

user.rgw.lc对应的value就是该bucket的lifecycle rule列表。

然后，再去查看lc.xx对象

$ rados -p default.rgw.log --namespace=lc ls
lc.6
lc.14
lc.29
lc.8
lc.10
lc.26
lc.22
lc.17
lc.27
lc.4
lc.11
lc.18
lc.20
lc.7
lc.2
lc.13
lc.16
lc.12
lc.30
lc.24
lc.9
lc.15
lc.19
lc.21
lc.23
lc.31
lc.25
lc.5
lc.3
lc.28
lc.1
lc.0

RGWPutLC::execute()代码

lifecycle的组织方式也可以在put lc操作的代码中窥见一斑。

void RGWPutLC::execute()
{
  bufferlist bl;
  
  RGWLifecycleConfiguration_S3 *config = NULL;
  RGWLCXMLParser_S3 parser(s->cct);
  RGWLifecycleConfiguration_S3 new_config(s->cct);

  // 从http header中取出md5到content_md5
  content_md5 = s->info.env->get("HTTP_CONTENT_MD5");
  if (content_md5 == nullptr) {
    op_ret = -ERR_INVALID_REQUEST;
    s->err.message = "Missing required header for this request: Content-MD5";
    ldout(s->cct, 5) << s->err.message << dendl;
    return;
  }
  // 将取出的md5从base64解码到content_md5_bin
  std::string content_md5_bin;
  try {
    content_md5_bin = rgw::from_base64(boost::string_view(content_md5));
  } catch (...) {
    s->err.message = "Request header Content-MD5 contains character "
                     "that is not base64 encoded.";
    ldout(s->cct, 5) << s->err.message << dendl;
    op_ret = -ERR_BAD_DIGEST;
    return;
  }

  if (!parser.init()) {
    op_ret = -EINVAL;
    return;
  }
  // 从req_state中解析出put lc所需的参数存入RGWPutLC.data长度为RGWPutLC.len
  op_ret = get_params();
  if (op_ret < 0)
    return;

  ldout(s->cct, 15) << "read len=" << len << " data=" << (data ? data : "") << dendl;

  // 计算params的MD5
  MD5 data_hash;
  unsigned char data_hash_res[CEPH_CRYPTO_MD5_DIGESTSIZE];
  data_hash.Update(reinterpret_cast<const byte*>(data), len);
  data_hash.Final(data_hash_res);
  // 比较计算出的md5和客户端传入的md5是否一致，以判断数据是否损坏
  if (memcmp(data_hash_res, content_md5_bin.c_str(), CEPH_CRYPTO_MD5_DIGESTSIZE) != 0) {
    op_ret = -ERR_BAD_DIGEST;
    s->err.message = "The Content-MD5 you specified did not match what we received.";
    ldout(s->cct, 5) << s->err.message
                     << " Specified content md5: " << content_md5
                     << ", calculated content md5: " << data_hash_res
                     << dendl;
    return;
  }

  // 将data中的参数数据解析到parser对象中
  if (!parser.parse(data, len, 1)) {
    op_ret = -ERR_MALFORMED_XML;
    return;
  }
  // 解析出的xml对象是一颗树结构
  /*
  class XMLObj
  {
    XMLObj *parent;
    ......
    multimap<string, XMLObj *> children;
    ......
  }
  */
  // 如上，每一个标签作为一个节点，分别包含指向其父节点的指针和孩子节点的指针列表
  config = static_cast<RGWLifecycleConfiguration_S3 *>(parser.find_first("LifecycleConfiguration"));
  if (!config) {
    op_ret = -ERR_MALFORMED_XML;
    return;
  }
  // 将config中的rule_map中的rule转存到new_config的rule_map和prefix_map中
  op_ret = config->rebuild(store, new_config);
  if (op_ret < 0)
    return;

  if (s->cct->_conf->subsys.should_gather(ceph_subsys_rgw, 15)) {
    ldout(s->cct, 15) << "New LifecycleConfiguration:";
    new_config.to_xml(*_dout);
    *_dout << dendl;
  }
  // 将rule_map编码存入bl，并copy一个attrs map，增加RGW_ATTR_LC->bl项，
  new_config.encode(bl);
  map<string, bufferlist> attrs;
  attrs = s->bucket_attrs;
  attrs[RGW_ATTR_LC] = bl;
  // 将新的attrs写入bucket.instance对象的xattr中，
  op_ret = rgw_bucket_set_attrs(store, s->bucket_info, attrs, &s->bucket_info.objv_tracker);
  if (op_ret < 0)
    return;

  string shard_id = s->bucket.tenant + ':' + s->bucket.name + ':' + s->bucket.bucket_id;  
  string oid; 
  // 从default.rgw.log pool中的32个lc.xx对象中选择一个，构造oid，xx表示一个0-31的整数
  // get_lc_oid代码如下：
  /*
  static void get_lc_oid(struct req_state *s, string& oid){
    string shard_id = s->bucket.name + ':' +s->bucket.bucket_id;
    int max_objs = (s->cct->_conf->rgw_lc_max_objs > HASH_PRIME)?HASH_PRIME:s->cct->_conf->rgw_lc_max_objs;
    int index = ceph_str_hash_linux(shard_id.c_str(), shard_id.size()) % HASH_PRIME % max_objs;
    oid = lc_oid_prefix;
    char buf[32];
    snprintf(buf, 32, ".%d", index);
    oid.append(buf);
    return;
  }
  */
  get_lc_oid(s, oid);
  // 构造要写入omap的entry内容
  pair<string, int> entry(shard_id, lc_uninitial);
  int max_lock_secs = s->cct->_conf->rgw_lc_lock_max_time;
  rados::cls::lock::Lock l(lc_index_lock_name); 
  utime_t time(max_lock_secs, 0);
  l.set_duration(time);
  l.set_cookie(cookie);
  librados::IoCtx *ctx = store->get_lc_pool_ctx();
  do {
    op_ret = l.lock_exclusive(ctx, oid);
    if (op_ret == -EBUSY) {
      dout(0) << "RGWLC::RGWPutLC() failed to acquire lock on, sleep 5, try again" << oid << dendl;
      sleep(5);
      continue;
    }
    if (op_ret < 0) {
      dout(0) << "RGWLC::RGWPutLC() failed to acquire lock " << oid << op_ret << dendl;
      break;
    }

    // 在lc.xx对象关联的omap中写入entry
    op_ret = cls_rgw_lc_set_entry(*ctx, oid, entry);
    if (op_ret < 0) {
      dout(0) << "RGWLC::RGWPutLC() failed to set entry " << oid << op_ret << dendl;     
    }
    break;
  }while(1);
  l.unlock(ctx, oid);
  return;
}

lifecycle的作用机制

RGWLC类是负责执行lc的类，它会根据用户的配置开启1个或多个worker线程，这些worker线程的任务是在一个无限循环中，每隔一段时间（生产环境是一天一次，测试环境比较频繁）判断一下当前是否应该执行lifecycle的遍历工作，如果是的话，调用RGWLC类的process方法，随机选择32个lc.xx对象中的一个，根据其header中的标记，取出其未遍历的下一个omap entry，更新header中的标记，更新该entry的状态为processing，然后处理该entry，遍历该条entry对应的bucket中的所有对象，根据lc规则删除或转换bucket中过期的object，并写日志。

代码追踪如下：

下面这个函数是worker线程的的执行内容，可以看到，它在一个while循环中，每隔一段时间判断should_work，如果通过的话，那么就调用lc->process()函数进行遍历，然后设置下一次被唤醒的时间，进入阻塞状态。

void *RGWLC::LCWorker::entry() {
  do {
    utime_t start = ceph_clock_now();
    if (should_work(start)) {
      dout(2) << "life cycle: start" << dendl;
      int r = lc->process();
      if (r < 0) {
        dout(0) << "ERROR: do life cycle process() returned error r=" << r << dendl;
      }
      dout(2) << "life cycle: stop" << dendl;
    }
    if (lc->going_down())
      break;

    utime_t end = ceph_clock_now();
    int secs = schedule_next_start_time(start, end);
    utime_t next;
    next.set_from_double(end + secs);

    dout(5) << "schedule life cycle next start time: " << rgw_to_asctime(next) <<dendl;

    lock.Lock();
    cond.WaitInterval(lock, utime_t(secs, 0));
    lock.Unlock();
  } while (!lc->going_down());

  return NULL;
}

在RGWLC::process函数中，主要做了以下几件事：
1.从lc.xx对象的header中获得omap中要遍历的下一个entry
2.将拿到的entry的状态设为processing（正在处理）
3.更新header中记录的下一个entry
4.调用bucket_lc_process函数处理当前的entry对应的lc规则

int RGWLC::process(int index, int max_lock_secs)
{
  rados::cls::lock::Lock l(lc_index_lock_name);
  do {
    utime_t now = ceph_clock_now();
    pair<string, int > entry;//string = bucket_name:bucket_id ,int = LC_BUCKET_STATUS
    if (max_lock_secs <= 0)
      return -EAGAIN;

    utime_t time(max_lock_secs, 0);
    l.set_duration(time);

    int ret = l.lock_exclusive(&store->lc_pool_ctx, obj_names[index]);
    if (ret == -EBUSY) { /* already locked by another lc processor */
      dout(0) << "RGWLC::process() failed to acquire lock on, sleep 5, try again" << obj_names[index] << dendl;
      sleep(5);
      continue;
    }
    if (ret < 0)
      return 0;
    // 读取lc.xx对象的head
    string marker;
    cls_rgw_lc_obj_head head;
    ret = cls_rgw_lc_get_head(store->lc_pool_ctx, obj_names[index], head);
    if (ret < 0) {
      dout(0) << "RGWLC::process() failed to get obj head " << obj_names[index] << ret << dendl;
      goto exit;
    }

    if(!if_already_run_today(head.start_date)) {
      head.start_date = now;
      head.marker.clear();
      ret = bucket_lc_prepare(index);
      if (ret < 0) {
      dout(0) << "RGWLC::process() failed to update lc object " << obj_names[index] << ret << dendl;
      goto exit;
      }
    }
    // 从lc.xx对象的header中获取下一个要遍历的omap entry
    ret = cls_rgw_lc_get_next_entry(store->lc_pool_ctx, obj_names[index], head.marker, entry);
    if (ret < 0) {
      dout(0) << "RGWLC::process() failed to get obj entry " << obj_names[index] << dendl;
      goto exit;
    }

    if (entry.first.empty())
      goto exit;
    // 将该entry的状态设为processing
    entry.second = lc_processing;
    ret = cls_rgw_lc_set_entry(store->lc_pool_ctx, obj_names[index],  entry);
    if (ret < 0) {
      dout(0) << "RGWLC::process() failed to set obj entry " << obj_names[index] << entry.first << entry.second << dendl;
      goto exit;
    }
    // 更新header中的下一个entry标记
    head.marker = entry.first;
    ret = cls_rgw_lc_put_head(store->lc_pool_ctx, obj_names[index],  head);
    if (ret < 0) {
      dout(0) << "RGWLC::process() failed to put head " << obj_names[index] << dendl;
      goto exit;
    }
    l.unlock(&store->lc_pool_ctx, obj_names[index]);
    // 处理当前的entry对应的lc规则
    ret = bucket_lc_process(entry.first);
    bucket_lc_post(index, max_lock_secs, entry, ret);
  }while(1);

exit:
    l.unlock(&store->lc_pool_ctx, obj_names[index]);
    return 0;
}

而bucket_lc_process函数做的则就是最终的处理工作了：遍历某条lc规则对应的bucket的所有objects，根据prefix和tagging找到lc 规则作用的object，然后判断这些objects是否过期，如果过期，做对应的删除处理。

要注意的是，目前L版本的ceph仅支持到期删除的lifecycle，也就是Expiration。不支持Transition。

最后编辑于：2017.12.18 20:28:51

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 203,547评论 6赞 477
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,399评论 2赞 381
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 150,428评论 0赞 337
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,599评论 1赞 274
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,612评论 5赞 365
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,577评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,941评论 3赞 395
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,603评论 0赞 258
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,852评论 1赞 297
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,605评论 2赞 321
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,693评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,375评论 4赞 318
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,955评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,936评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,172评论 1赞 259
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 43,970评论 2赞 349
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,414评论 2赞 342

ceph rgw：lifecycle实现

lifecycle policy的存储机制

RGWPutLC::execute()代码

lifecycle的作用机制

推荐阅读更多精彩内容