* Re: new scrub and repair discussion
@ 2015-11-11 14:43 王志强
  2015-11-11 15:43 ` kefu chai
  0 siblings, 1 reply; 15+ messages in thread
From: 王志强 @ 2015-11-11 14:43 UTC
  To: ceph-devel

2015-11-11 19:44 GMT+08:00 kefu chai <tchaikov@gmail.com>:
> currently, scrub and repair are pretty primitive. there are several
> improvements which need to be made:
>
> - the user should be able to initiate a scrub of a PG or an object
>     - int scrub(pg_t, AioCompletion*)
>     - int scrub(const string& pool, const string& nspace,
>                 const string& locator, const string& oid, AioCompletion*)
> - we need a way to query the result of the most recent scrub on a pg.
>     - int get_inconsistent_pools(set<uint64_t>* pools);
>     - int get_inconsistent_pgs(uint64_t pool, paged<pg_t>* pgs);
>     - int get_inconsistent(pg_t pgid, epoch_t* cur_interval,
>                            paged<inconsistent_t>*);
> - the user should be able to query the content of the replica/shard
> objects in the event of an inconsistency.
>     - operate_on_shard(epoch_t interval, pg_shard_t pg_shard,
>                        ObjectReadOperation *op, bool allow_inconsistent)
> - the user should be able to perform the following fixes using a new
>   aio_operate_scrub(const std::string& oid, shard_id_t shard,
>                     AioCompletion *c, ObjectWriteOperation *op):
>     - specify which replica to use for repairing a content inconsistency
>     - delete an object if it should not exist
>     - write_full
>     - omap_set
>     - setattrs
> - the user should be able to repair snapset and object_info_t
>     - ObjectWriteOperation::repair_snapset(...)
>         - set/remove any property or attribute, for example:
>             - to reset snapset.clone_overlap
>             - to set snapset.clone_size
>             - to reset the digests in object_info_t
> - repair will create a new version so that possibly corrupted copies
> on down OSDs will get fixed naturally.
>

I think this exposes too many things to the user. Usually a user
doesn't have this kind of knowledge. If we make it too complicated,
no one will use it in the end.


* new scrub and repair discussion
@ 2015-11-11 11:44 kefu chai
  2015-11-11 13:25 ` Sage Weil
  2016-05-23  3:54 ` Shinobu Kinjo
  0 siblings, 2 replies; 15+ messages in thread
From: kefu chai @ 2015-11-11 11:44 UTC
  To: ceph-devel

currently, scrub and repair are pretty primitive. there are several
improvements which need to be made:

- the user should be able to initiate a scrub of a PG or an object
    - int scrub(pg_t, AioCompletion*)
    - int scrub(const string& pool, const string& nspace,
                const string& locator, const string& oid, AioCompletion*)
- we need a way to query the result of the most recent scrub on a pg.
    - int get_inconsistent_pools(set<uint64_t>* pools);
    - int get_inconsistent_pgs(uint64_t pool, paged<pg_t>* pgs);
    - int get_inconsistent(pg_t pgid, epoch_t* cur_interval,
                           paged<inconsistent_t>*);
- the user should be able to query the content of the replica/shard
objects in the event of an inconsistency.
    - operate_on_shard(epoch_t interval, pg_shard_t pg_shard,
                       ObjectReadOperation *op, bool allow_inconsistent)
- the user should be able to perform the following fixes using a new
  aio_operate_scrub(const std::string& oid, shard_id_t shard,
                    AioCompletion *c, ObjectWriteOperation *op):
    - specify which replica to use for repairing a content inconsistency
    - delete an object if it should not exist
    - write_full
    - omap_set
    - setattrs
- the user should be able to repair snapset and object_info_t
    - ObjectWriteOperation::repair_snapset(...)
        - set/remove any property or attribute, for example:
            - to reset snapset.clone_overlap
            - to set snapset.clone_size
            - to reset the digests in object_info_t
- repair will create a new version so that possibly corrupted copies
on down OSDs will get fixed naturally.
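The last point, that a repair writes a new object version, is what lets a replica that was down during the repair get fixed later by ordinary recovery. Below is a toy model in Python, assuming a greatly simplified recovery rule (an up replica whose version is behind the newest one copies it). `Replica`, `repair()` and `recover()` are hypothetical names, not Ceph code; real OSDs decide what to recover from the PG log rather than from a bare version comparison.

```python
# Toy model (hypothetical, not Ceph code): a repair that bumps the object
# version also heals a replica that was down while the repair ran.
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    osd: int
    version: int
    data: bytes
    up: bool = True


def repair(replicas: List[Replica], auth_osd: int) -> int:
    """Copy the auth replica's data to every up replica under a new version."""
    auth = next(r for r in replicas if r.osd == auth_osd)
    new_version = max(r.version for r in replicas) + 1
    for r in replicas:
        if r.up:
            r.version, r.data = new_version, auth.data
    return new_version


def recover(replicas: List[Replica]) -> None:
    """Simplified recovery: an up replica behind the newest version copies it."""
    up = [r for r in replicas if r.up]
    newest = max(up, key=lambda r: r.version)
    for r in up:
        if r.version < newest.version:
            r.version, r.data = newest.version, newest.data


if __name__ == "__main__":
    replicas = [Replica(0, 5, b"good"), Replica(1, 5, b"good"),
                Replica(2, 5, b"bad!", up=False)]  # osd.2 is down
    repair(replicas, auth_osd=0)   # only osd.0 and osd.1 move to version 6
    replicas[2].up = True          # osd.2 comes back, still at version 5
    recover(replicas)              # 5 < 6, so osd.2 copies the repaired data
    print([(r.osd, r.version, r.data) for r in replicas])
```

If `repair()` rewrote the data without bumping the version, osd.2 would still report version 5 like its peers, and `recover()` would leave its bad copy untouched.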

so librados will offer enough information and facilities for a smart
librados client/script to fix the inconsistencies found by the scrub.

as an example, suppose we run into a data inconsistency where the 3
replicas fail to agree with each other after a deep scrub. we'd
probably like to hold an election to pick the auth copy. the
following pseudo code explains how we would implement this using the
new rados APIs for scrub and repair.

     # pseudo code: the rados.* / librados.* calls are the proposed APIs above
     import operator
     from collections import defaultdict

     # something is not necessarily better than nothing
     rados.aio_scrub(pg, completion)
     completion.wait_for_complete()
     for pool in rados.get_inconsistent_pools():
          for pg in rados.get_inconsistent_pgs(pool):
               # rados.get_inconsistent() throws if "epoch" expires
               for oid, inconsistent in rados.get_inconsistent(pg, epoch).items():
                    if inconsistent.is_data_digest_mismatch():
                         # every shard votes for the data digest it holds
                         votes = defaultdict(int)
                         for osd, shard_info in inconsistent.shards.items():
                              votes[shard_info.object_info.data_digest] += 1
                         # the digest with the most votes wins
                         digest, _ = max(votes.items(), key=operator.itemgetter(1))
                         # the first shard holding the winning digest is the auth copy
                         auth_copy = None
                         for osd, shard_info in inconsistent.shards.items():
                              if shard_info.object_info.data_digest == digest:
                                   auth_copy = osd
                                   break
                         repair_op = librados.ObjectWriteOperation()
                         repair_op.repair_pick(auth_copy, inconsistent.ver, epoch)
                         rados.aio_operate_scrub(oid, repair_op)
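The election step can be tried in isolation with plain dictionaries. The following is a minimal sketch, not part of the proposed API: `elect_auth_copy()` and its input (a dict from OSD id to the data digest of that OSD's copy) are hypothetical stand-ins for `inconsistent.shards` and `shard_info.object_info.data_digest`. Unlike the pseudo code, which silently takes whichever digest `max()` meets first when counts are equal, it refuses to pick when the top vote counts are tied.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def elect_auth_copy(shard_digests: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Pick an authoritative copy by voting on data digests (hypothetical helper).

    shard_digests maps an OSD id to the data digest of that OSD's copy.
    Returns (osd, digest) for the digest held by the most shards, or None
    when there is nothing to vote on or the top vote counts are tied.
    """
    if not shard_digests:
        return None
    # (digest, votes) pairs, most votes first
    ranked = Counter(shard_digests.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # e.g. three replicas with three different digests
    digest = ranked[0][0]
    # lowest OSD id holding the winning digest, so the result is deterministic
    osd = min(o for o, d in shard_digests.items() if d == digest)
    return osd, digest


if __name__ == "__main__":
    print(elect_auth_copy({0: 0xAAAA, 1: 0xAAAA, 2: 0xBBBB}))
    print(elect_auth_copy({0: 0x1111, 1: 0x2222, 2: 0x3333}))
```

With three replicas that all report different digests, every digest gets one vote, so the function returns `None` and the decision has to come from somewhere other than the vote, such as an operator.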

this plan was also discussed in the infernalis CDS. see
http://tracker.ceph.com/projects/ceph/wiki/Osd_-_Scrub_and_Repair.



Thread overview: 15+ messages
2015-11-11 14:43 new scrub and repair discussion 王志强
2015-11-11 15:43 ` kefu chai
2016-05-19 13:09   ` kefu chai
2016-05-19 17:55     ` Samuel Just
2016-05-20 11:30       ` kefu chai
2016-05-25 17:37         ` Samuel Just
2016-06-07 13:13           ` kefu chai
2016-05-27 12:03     ` Dan van der Ster
2016-06-07 10:44       ` kefu chai
2016-06-07 13:03         ` Sage Weil
  -- strict thread matches above, loose matches on Subject: below --
2015-11-11 11:44 kefu chai
2015-11-11 13:25 ` Sage Weil
2015-11-11 14:53   ` kefu chai
2016-05-23  3:54 ` Shinobu Kinjo
2016-05-25 14:34   ` kefu chai
