All of lore.kernel.org
* Re: Consistency problem with multiple rgws
       [not found] ` <9193cf3c-3970-56d9-759a-c160626fb27a@redhat.com>
@ 2016-12-15 16:08   ` Casey Bodley
  2016-12-16  7:36   ` sw zhang
  1 sibling, 0 replies; 4+ messages in thread
From: Casey Bodley @ 2016-12-15 16:08 UTC (permalink / raw)
  To: The Sacred Order of the Squid Cybernetic

oops, the list bounced my reply because of html


-------- Forwarded Message --------
Subject: 	Re: Consistency problem with multiple rgws
Date: 	Thu, 15 Dec 2016 11:05:03 -0500
From: 	Casey Bodley <cbodley@redhat.com>
To: 	18896724396 <zhang_shaowen@139.com>, yehuda <yehuda@redhat.com>
CC: 	ceph-devel <ceph-devel@vger.kernel.org>, 郭占东 
<guozhandong@cmss.chinamobile.com>, lvshuhua 
<lvshuhua@cmss.chinamobile.com>



Hi,

On 12/15/2016 02:55 AM, 18896724396 wrote:
> Hi,
> We have two RGWs in the master zone and two RGWs in the slave zone. We 
> use cosbench to upload 50,000 objects to a single bucket. After the data 
> sync is finished, the bucket stats are not the same between the master 
> and slave zones.
The data sync may take a while with that many objects. How are you 
verifying that data sync finished? Have you tried 'radosgw-admin bucket 
sync status --bucket=<name>'?
> Then we tested the same case with one RGW in the master zone and one in 
> the slave zone, and the stats were still not the same. Finally we tested 
> with one RGW and changed the config rgw_num_rados_handles to 1 (we had 
> set it to 2 before), and this time the stats were the same and correct. 
> Multiple RGWs still have the problem, though.
> Reading the code, I find that when we update the bucket index, rgw calls 
> cls_rgw_bucket_complete_op to update the bucket stats, and the osd 
> ultimately calls rgw_bucket_complete_op. In this function, the osd first 
> reads the bucket header, then updates the stats, and finally writes the 
> header back. So I think two concurrent requests updating the stats may 
> lead to the consistency problem, and maybe some other operations have 
> the same problem. How could we solve this consistency problem?
The osd guarantees that two operations in the same placement group won't 
run concurrently, so this kind of logic in cls should be safe. How far 
off are the bucket stats? Can you share some example output?
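The suspected scenario, and the serialization guarantee, can be sketched in plain Python (this is a generic lost-update illustration, not actual cls/OSD code):

```python
# Generic illustration of a lost-update race on a counter, like the
# num_objects field in the bucket header. NOT Ceph code: it only shows
# why an interleaved read-modify-write drops updates, and why running
# the same ops strictly one-at-a-time (as the OSD does for operations
# in one placement group) cannot.

def interleaved_updates(counter):
    """Two completion ops whose read and write phases interleave."""
    a = counter       # op 1 reads the header
    b = counter       # op 2 reads the header before op 1 writes back
    counter = a + 1   # op 1 writes its updated stats
    counter = b + 1   # op 2 overwrites them: op 1's update is lost
    return counter

def serialized_updates(counter):
    """The same two ops, executed strictly one after the other."""
    for _ in range(2):
        header = counter      # read
        counter = header + 1  # modify and write, no interleaving
    return counter

print(interleaved_updates(0))  # 1: one update lost
print(serialized_updates(0))   # 2: both updates applied
```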
>
> Best regards.
> Zhang Shaowen

Thanks,
Casey

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Consistency problem with multiple rgws
       [not found] ` <9193cf3c-3970-56d9-759a-c160626fb27a@redhat.com>
  2016-12-15 16:08   ` Consistency problem with multiple rgws Casey Bodley
@ 2016-12-16  7:36   ` sw zhang
  2016-12-21 16:48     ` Casey Bodley
  1 sibling, 1 reply; 4+ messages in thread
From: sw zhang @ 2016-12-16  7:36 UTC (permalink / raw)
  To: Casey Bodley; +Cc: yehuda, ceph-devel, 郭占东, lvshuhua

Hi,
I tested it again today with one RGW in each zone and the config
'rgw_num_rados_handles=2'. I used cosbench to upload 50,000 objects,
each object 4M, with 10 workers.
After the data sync finished (I used 'radosgw-admin bucket sync status
--bucket=<name>' and 'radosgw-admin sync status' to check that), the
bucket stats were as follows:

Master zone:
[root@ceph36 ~]# radosgw-admin bucket stats --bucket=shard23
{
    "bucket": "shard23",
    "pool": "master.rgw.buckets.data",
    "index_pool": "master.rgw.buckets.index",
    "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
    "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
    "owner": "zsw-test",
    "ver": "0#50039,1#49964",
    "master_ver": "0#0,1#0",
    "mtime": "2016-12-16 10:58:56.174049",
    "max_marker": "0#00000050038.56144.3,1#00000049963.56109.3",
    "usage": {
        "rgw.main": {
            "size_kb": 195300782,
            "size_kb_actual": 195388276,
            "num_objects": 50000
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

Slave zone:
[root@ceph05 ~]# radosgw-admin bucket stats --bucket=shard23
{
    "bucket": "shard23",
    "pool": "slave.rgw.buckets.data",
    "index_pool": "slave.rgw.buckets.index",
    "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
    "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
    "owner": "zsw-test",
    "ver": "0#51172,1#51070",
    "master_ver": "0#0,1#0",
    "mtime": "2016-12-16 10:58:56.174049",
    "max_marker": "0#00000051171.112193.3,1#00000051069.79607.3",
    "usage": {
        "rgw.main": {
            "size_kb": 194769532,
            "size_kb_actual": 194856788,
            "num_objects": 49861
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

We can see that in the slave zone, the object count in the bucket stats
is less than in the master zone. But if I use s3cmd to list the bucket
in the slave zone, the result is correct:
[root@ceph05 ~]# s3cmd ls s3://shard23 | wc -l
50000

And after listing the bucket with s3cmd, I run bucket stats in the
slave zone again:
[root@ceph05 ~]# radosgw-admin bucket stats --bucket=shard23
{
    "bucket": "shard23",
    "pool": "slave.rgw.buckets.data",
    "index_pool": "slave.rgw.buckets.index",
    "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
    "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
    "owner": "zsw-test",
    "ver": "0#51182,1#51079",
    "master_ver": "0#0,1#0",
    "mtime": "2016-12-16 10:58:56.174049",
    "max_marker": "0#00000051181.112203.9,1#00000051078.79616.9",
    "usage": {
        "rgw.main": {
            "size_kb": 194769532,
            "size_kb_actual": 194856788,
            "num_objects": 50000
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

We can see that num_objects is correct now. (According to the code,
listing a bucket sends 'dir_suggest_changes' requests to the osd. I
think this is why the number is correct now.)
If each zone has two RGWs with the config 'rgw_num_rados_handles=1', the
difference between the bucket stats is smaller, between 10 and 40 objects.
If each zone has one RGW with the config 'rgw_num_rados_handles=1', the
bucket stats are the same.
My colleague and I have tested this multiple times on two different
clusters (Ceph version jewel), and the problem occurs nearly every
time.
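For concreteness, the size of the gap can be read straight off the stats above with a throwaway script (the numbers are copied from the 'usage' sections of the radosgw-admin output for bucket 'shard23'; this is not part of any radosgw tooling):

```python
# Quantify the mismatch using the 'usage' numbers copied from the
# 'radosgw-admin bucket stats' output above (bucket 'shard23').
master = {"size_kb": 195300782, "size_kb_actual": 195388276, "num_objects": 50000}
slave = {"size_kb": 194769532, "size_kb_actual": 194856788, "num_objects": 49861}

print("objects missing from slave stats:", master["num_objects"] - slave["num_objects"])  # 139
print("size_kb missing from slave stats:", master["size_kb"] - slave["size_kb"])  # 531250
```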




* Re: Consistency problem with multiple rgws
  2016-12-16  7:36   ` sw zhang
@ 2016-12-21 16:48     ` Casey Bodley
  2016-12-23  1:53       ` sw zhang
  0 siblings, 1 reply; 4+ messages in thread
From: Casey Bodley @ 2016-12-21 16:48 UTC (permalink / raw)
  To: sw zhang; +Cc: yehuda, ceph-devel, 郭占东, lvshuhua


On 12/16/2016 02:36 AM, sw zhang wrote:
Thanks for the extra info, I'll look into this. Could you please open a 
ticket at http://tracker.ceph.com/projects/rgw/issues/new and include 
this output?




* Re: Consistency problem with multiple rgws
  2016-12-21 16:48     ` Casey Bodley
@ 2016-12-23  1:53       ` sw zhang
  0 siblings, 0 replies; 4+ messages in thread
From: sw zhang @ 2016-12-23  1:53 UTC (permalink / raw)
  To: Casey Bodley; +Cc: yehuda, ceph-devel, 郭占东, lvshuhua

My colleague has already opened a ticket, so I added the extra info to it.
http://tracker.ceph.com/issues/18260



end of thread, other threads:[~2016-12-23  1:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <2b1658523debcec-00007.Richmail.00098135627749810422@139.com>
     [not found] ` <9193cf3c-3970-56d9-759a-c160626fb27a@redhat.com>
2016-12-15 16:08   ` Consistency problem with multiple rgws Casey Bodley
2016-12-16  7:36   ` sw zhang
2016-12-21 16:48     ` Casey Bodley
2016-12-23  1:53       ` sw zhang
