* Extended attributes not getting copied when flushing HEAD objects from cache pool to base pool.
@ 2017-08-23  7:24 Xuehan Xu
  2017-08-23  7:25 ` Xuehan Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Xuehan Xu @ 2017-08-23  7:24 UTC (permalink / raw)
  To: ceph-devel

Hi, everyone.

Recently, we did a test as follows:

We enabled cache tiering by adding a cache pool "vms_back_cache" on top
of the base pool "vms_back". We first created an object, then created a
snap in the base pool and wrote to that object again, which caused the
object to be promoted into the cache pool. At this point, we used
"ceph-objectstore-tool" to dump the object; the result is as follows:

{
    "id": {
        "oid": "test.obj.6",
        "key": "",
        "snapid": -2,
        "hash": 750422257,
        "max": 0,
        "pool": 11,
        "namespace": "",
        "max": 0
    },
    "info": {
        "oid": {
            "oid": "test.obj.6",
            "key": "",
            "snapid": -2,
            "hash": 750422257,
            "max": 0,
            "pool": 11,
            "namespace": ""
        },
        "version": "5010'5",
        "prior_version": "4991'3",
        "last_reqid": "client.175338.0:1",
        "user_version": 5,
        "size": 4194303,
        "mtime": "2017-08-23 15:09:03.459892",
        "local_mtime": "2017-08-23 15:09:03.461111",
        "lost": 0,
        "flags": 4,
        "snaps": [],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": 4294967295,
        "omap_digest": 4294967295,
        "watchers": {}
    },
    "stat": {
        "size": 4194303,
        "blksize": 4096,
        "blocks": 8200,
        "nlink": 1
    },
    "SnapSet": {
        "snap_context": {
            "seq": 13,
            "snaps": [
                13
            ]
        },
        "head_exists": 1,
        "clones": [
            {
                "snap": 13,
                "size": 4194303,
                "overlap": "[0~100,115~4194188]"
            }
        ]
    }
}

Then we did cache-flush and cache-evict to flush that object down to
the base pool, and, again, used "ceph-objectstore-tool" to dump the
object in the base pool:

{
    "id": {
        "oid": "test.obj.6",
        "key": "",
        "snapid": -2,
        "hash": 750422257,
        "max": 0,
        "pool": 10,
        "namespace": "",
        "max": 0
    },
    "info": {
        "oid": {
            "oid": "test.obj.6",
            "key": "",
            "snapid": -2,
            "hash": 750422257,
            "max": 0,
            "pool": 10,
            "namespace": ""
        },
        "version": "5015'4",
        "prior_version": "4991'2",
        "last_reqid": "osd.34.5013:1",
        "user_version": 5,
        "size": 4194303,
        "mtime": "2017-08-23 15:09:03.459892",
        "local_mtime": "2017-08-23 15:10:48.122138",
        "lost": 0,
        "flags": 52,
        "snaps": [],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": 163942140,
        "omap_digest": 4294967295,
        "watchers": {}
    },
    "stat": {
        "size": 4194303,
        "blksize": 4096,
        "blocks": 8200,
        "nlink": 1
    },
    "SnapSet": {
        "snap_context": {
            "seq": 13,
            "snaps": [
                13
            ]
        },
        "head_exists": 1,
        "clones": [
            {
                "snap": 13,
                "size": 4194303,
                "overlap": "[]"
            }
        ]
    }
}

As shown, the "overlap" field is now empty.
In the osd log, we found the following records:

2017-08-23 12:46:36.083014 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
[0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
attrs
2017-08-23 12:46:36.083021 7f675c704700 15
filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0) read
3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
0~8
2017-08-23 12:46:36.083398 7f675c704700 10
filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0)
FileStore::read
3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
0~8/8
2017-08-23 12:46:36.083414 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
[0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
data
2017-08-23 12:46:36.083444 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
[0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
cursor.is_complete=0 0 attrs 8 bytes 0 omap header bytes 0 omap data
bytes in 0 keys 0 reqids
2017-08-23 12:46:36.083457 7f675c704700 10 osd.0 pg_epoch: 19 pg[3.3(
v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
[0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
dropping ondisk_read_lock
2017-08-23 12:46:36.083467 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
[0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
do_osd_op_effects osd.0 con 0x7f67874f0d00
2017-08-23 12:46:36.083478 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
[0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
log_op_stats osd_op(osd.0.6:2 3.92edb2bb
test-rados-api-xxh02v.ops.corp

It seems that, when doing "copy-get", no extended attributes are
copied. We believe the following code is what leads to this result:

int ReplicatedPG::getattrs_maybe_cache(ObjectContextRef obc,
        map<string, bufferlist> *out,
        bool user_only) {
    int r = 0;
    if (pool.info.require_rollback()) {
        if (out)
            *out = obc->attr_cache;
    } else {
        r = pgbackend->objects_get_attrs(obc->obs.oi.soid, out);
    }
    if (out && user_only) {
        map<string, bufferlist> tmp;
        for (map<string, bufferlist>::iterator i = out->begin();
                i != out->end(); ++i) {
            if (i->first.size() > 1 && i->first[0] == '_')
                tmp[i->first.substr(1, i->first.size())].claim(i->second);
        }
        tmp.swap(*out);
    }
    return r;
}

It seems that when "user_only" is true, extended attributes whose names
do not start with '_' are filtered out. Is it supposed to work this way?
We also found that there are only two places in the source code that
invoke ReplicatedPG::getattrs_maybe_cache, and in both of them
"user_only" is true. Why add this parameter at all?

By the way, we also found that this code was added in commit
78d9c0072bfde30917aea4820a811d7fc9f10522, but we don't understand its
purpose.

Thank you:-)


* Re: Extended attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-23  7:24 Extended attributes not getting copied when flushing HEAD objects from cache pool to base pool Xuehan Xu
@ 2017-08-23  7:25 ` Xuehan Xu
  2017-08-23  7:40   ` Xuehan Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Xuehan Xu @ 2017-08-23  7:25 UTC (permalink / raw)
  To: ceph-devel

I submitted an issue for this:
http://tracker.ceph.com/issues/21072?next_issue_id=21071


* Re: Extended attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-23  7:25 ` Xuehan Xu
@ 2017-08-23  7:40   ` Xuehan Xu
  2017-08-23  9:32     ` Xuehan Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Xuehan Xu @ 2017-08-23  7:40 UTC (permalink / raw)
  To: ceph-devel

It seems that when calling ReplicatedPG::getattrs_maybe_cache in
ReplicatedPG::fill_in_copy_get, "user_only" should be false. Is this
right?

On 23 August 2017 at 15:25, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> I submitted an issue for this:
> http://tracker.ceph.com/issues/21072?next_issue_id=21071

* Re: Extended attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-23  7:40   ` Xuehan Xu
@ 2017-08-23  9:32     ` Xuehan Xu
  2017-08-23 16:07       ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Xuehan Xu @ 2017-08-23  9:32 UTC (permalink / raw)
  To: ceph-devel

Hi, everyone.

In the ReplicatedPG::_write_copy_chunk method, I saw the following code:

if (!cop->temp_cursor.attr_complete) {
    t->touch(cop->results.temp_oid);
    for (map<string, bufferlist>::iterator p = cop->attrs.begin();
         p != cop->attrs.end(); ++p) {
        cop->results.attrs[string("_") + p->first] = p->second;
        t->setattr(cop->results.temp_oid, string("_") + p->first,
                   p->second);
    }
    cop->attrs.clear();
}

It seems that user-specified attrs are prefixed with "_", but why is the
"_" added here, in ReplicatedPG::_write_copy_chunk? This method seems to
be used for copying objects within the RADOS cluster.
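To spell out how we understand the round trip (again a standalone
sketch, not Ceph code, with std::string standing in for bufferlist):
the source strips the "_" prefix under "user_only", and
_write_copy_chunk re-adds it on the destination, so a key like
"snapset" is either dropped on the source side or, if passed through
untouched, would arrive renamed as "_snapset":

#include <iostream>
#include <map>
#include <string>

// source side (fill_in_copy_get with user_only == true): keep only
// '_'-prefixed keys and strip the prefix before putting them on the wire
std::map<std::string, std::string> copy_get_attrs(
        const std::map<std::string, std::string> &ondisk) {
    std::map<std::string, std::string> wire;
    for (const auto &kv : ondisk)
        if (kv.first.size() > 1 && kv.first[0] == '_')
            wire[kv.first.substr(1)] = kv.second;   // "_myattr" -> "myattr"
    return wire;
}

// destination side (_write_copy_chunk): re-add the '_' prefix when writing
std::map<std::string, std::string> write_copy_chunk_attrs(
        const std::map<std::string, std::string> &wire) {
    std::map<std::string, std::string> dest;
    for (const auto &kv : wire)
        dest["_" + kv.first] = kv.second;           // "myattr" -> "_myattr"
    return dest;
}

int main() {
    std::map<std::string, std::string> ondisk = {
        {"_", "object info"},
        {"snapset", "clone overlap lives here"},
        {"_myattr", "user data"}
    };
    // only "_myattr" survives the trip; "snapset" never crosses at all
    for (const auto &kv : write_copy_chunk_attrs(copy_get_attrs(ondisk)))
        std::cout << kv.first << "\n";
    return 0;
}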

On 23 August 2017 at 15:40, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> It seems that when calling ReplicatedPG::getattrs_maybe_cache in
> ReplicatedPG::fill_in_copy_get, "user_only" should be false. Is this
> right?

* Re: Extended attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-23  9:32     ` Xuehan Xu
@ 2017-08-23 16:07       ` Sage Weil
  2017-08-23 16:41         ` Xuehan Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2017-08-23 16:07 UTC (permalink / raw)
  To: Xuehan Xu; +Cc: ceph-devel

On Wed, 23 Aug 2017, Xuehan Xu wrote:
> Hi, everyone.
> 
> In the ReplicatedPG::_write_copy_chunk method, I saw the following code:
> 
> if (!cop->temp_cursor.attr_complete) {
>     t->touch(cop->results.temp_oid);
>     for (map<string, bufferlist>::iterator p = cop->attrs.begin();
>          p != cop->attrs.end(); ++p) {
>         cop->results.attrs[string("_") + p->first] = p->second;
>         t->setattr(cop->results.temp_oid, string("_") + p->first,
>                    p->second);
>     }
>     cop->attrs.clear();
> }
> 
> It seems that user-specified attrs are prefixed with "_", but why is the
> "_" added here, in ReplicatedPG::_write_copy_chunk? This method seems to
> be used for copying objects within the RADOS cluster.

That is by design.  The non-_ attributes are managed locally.

In this case, the clone_overlap is not preserved because there is no code 
to attempt to preserve it across the cache tier.  (And blindly copying the 
_ attribute wouldn't work.)

I'm not sure what the issues would be with passing that metadata across 
the promote and flush operations...  I imagine it will be a bit fragile, 
though.

Is this something that is really worth fixing?

sage



* Re: Extended attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-23 16:07       ` Sage Weil
@ 2017-08-23 16:41         ` Xuehan Xu
  2017-08-23 16:54           ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Xuehan Xu @ 2017-08-23 16:41 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi, sage.

Thanks for your reply:-)

Actually, our problem is as follows: we are trying to back up the rbd
data in a cache-tier-enabled ceph cluster using "rbd export-diff";
however, because some clone overlap information goes missing, the
export-diff operation can't produce correct results.

After debugging, we found two reasons for this. One is the issue
mentioned in this thread, http://tracker.ceph.com/issues/21072, which
leads to the overlap recorded in the extended attrs not being "copied"
to the base tier. When we tried to fix it, we found that the "_" policy
gets in our way: the clone overlap is stored in the "snapset" attr,
whose name has no "_" prefix, and even if we modify the code to ignore
the "_" policy when reading the extended attrs, a "_" is prepended when
storing them into the base tier, which makes the subsequent
"list-snaps" operation unable to find the clone overlap.

The other reason the export-diff operates incorrectly is described in
http://tracker.ceph.com/issues/20896. To fix this, we are planning to
add a "cache_clone_overlap" field to SnapSet to track the cache stats,
while leaving the original "clone_overlap" to track modifications to
the object. Is this the right way to go?
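Just to make that idea concrete, the rough shape we have in mind is
something like the following (an illustrative sketch only, with simple
stand-ins for Ceph's snapid_t and interval_set types, not a real
patch):

#include <cstdint>
#include <map>

typedef uint64_t snapid_t;                          // stand-in
typedef std::map<uint64_t, uint64_t> interval_map;  // offset -> length, stand-in

struct snapset_sketch {
    // existing field: per-clone overlap with the next newer object,
    // which is what "list-snaps" (and hence "rbd export-diff") relies on
    std::map<snapid_t, interval_map> clone_overlap;
    // proposed field: overlap tracked only for cache-tier stat
    // accounting, so that clone_overlap itself stays a pure record of
    // modifications to the object
    std::map<snapid_t, interval_map> cache_clone_overlap;
};

int main() { snapset_sketch s; (void)s; return 0; }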

By the way, last month, when we were trying to fix
http://tracker.ceph.com/issues/17445, you asked for a test case on the
master branch to confirm that master doesn't have the same bug. I've
submitted that test case in the PR
https://github.com/ceph/ceph/pull/17017 and cherry-picked it into PR
https://github.com/ceph/ceph/pull/16355. I'm very sorry for taking so
long; I was very busy during the past month trying to solve the
problems above. Could you help review it when you have time, please?

Thanks very much:-)

On 24 August 2017 at 00:07, Sage Weil <sage@newdream.net> wrote:
>
> That is by design.  The non-_ attributes are managed locally.
>
> In this case, the clone_overlap is not preserved because there is no code
> to attempt to preserve it across the cache tier.  (And blindly copying the
> _ attribute wouldn't work.)
>
> I'm not sure what the issues would be with passing that metadata across
> the promote and flush operations...  I imagine it will be a bit fragile,
> though.
>
> Is this something that is really worth fixing?
>
> sage
>
>

* Re: Extended attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-23 16:41         ` Xuehan Xu
@ 2017-08-23 16:54           ` Sage Weil
  2017-08-24  7:47             ` Xuehan Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2017-08-23 16:54 UTC (permalink / raw)
  To: Xuehan Xu; +Cc: ceph-devel

On Thu, 24 Aug 2017, Xuehan Xu wrote:
> Hi, sage.
> 
> Thanks for your reply:-)
> 
> Actually, our problem is as follows: we are trying to back up the rbd
> data in a cache-tier-enabled ceph cluster using "rbd export-diff";
> however, because some clone overlap information goes missing, the
> export-diff operation can't produce correct results.
> 
> After debugging, we found two reasons for this. One is the issue
> mentioned in this thread, http://tracker.ceph.com/issues/21072, which
> leads to the overlap recorded in the extended attrs not being "copied"
> to the base tier. When we tried to fix it, we found that the "_" policy
> gets in our way: the clone overlap is stored in the "snapset" attr,
> whose name has no "_" prefix, and even if we modify the code to ignore
> the "_" policy when reading the extended attrs, a "_" is prepended when
> storing them into the base tier, which makes the subsequent
> "list-snaps" operation unable to find the clone overlap.

Okay.  The fix will need to explicitly copy the overlap metadata as part 
of copy-get, though... not the _ attr(s).  Those are managed locally in 
the pool and not meant to be copied across pools (you'll get all sorts of 
bugs if you try to do that!).
 
> The other reason the export-diff operates incorrectly is described in
> http://tracker.ceph.com/issues/20896. To fix this, we are planning to
> add a "cache_clone_overlap" field to SnapSet to track the cache stats,
> while leaving the original "clone_overlap" to track modifications to
> the object. Is this the right way to go?

I see.  I think that would work.  It might make sense to take a fresh look 
at what the cache stats are, though, to see if it is possible to get rid 
of that condition and avoid maintaining a 
duplicate-but-slightly-different structure just for the cache tier.  I 
don't like the conditional behavior there... it really seems like 
clone_overlap should not vary between the cache and base tiers, and any 
adjustments for missing clones should happen only with the stats 
calculation but not with clone_overlap itself.


> By the way, last month, when we were trying to fix
> http://tracker.ceph.com/issues/17445, you asked for a test case on the
> master branch to confirm that master doesn't have the same bug. I've
> submitted that test case in the PR
> https://github.com/ceph/ceph/pull/17017 and cherry-picked it into PR
> https://github.com/ceph/ceph/pull/16355. I'm very sorry for taking so
> long; I was very busy during the past month trying to solve the
> problems above. Could you help review it when you have time, please?

Thanks!
sage


> 
> Thanks very much:-)
> 
> On 24 August 2017 at 00:07, Sage Weil <sage@newdream.net> wrote:
> > On Wed, 23 Aug 2017, Xuehan Xu wrote:
> >> Hi, everyone.
> >>
> >> In ReplicatedPG::_write_copy_chunk method, I saw the following code:
> >>
> >> if (!cop->temp_cursor.attr_complete) {
> >>     t->touch(cop->results.temp_oid);
> >>     for (map<string, bufferlist>::iterator p = cop->attrs.begin();
> >>     p != cop->attrs.end(); ++p) {
> >>     cop->results.attrs[string("_") + p->first] = p->second;
> >>     t->setattr(cop->results.temp_oid, string("_") + p->first,
> >>        p->second);
> >> }
> >>     cop->attrs.clear();
> >> }
> >>
> >> It seems that user specified attrs are prefixed by "_", but why add
> >> "_" here, in ReplicatedPG::_write_copy_chunk? It seems that this
> >> method is used for copying objects in the RADOS cluster,.
> >
> > That is by design.  The non-_ attributes are managed locally.
> >
> > In this case, the clone_overlap is not preserved because there is no code
> > to attempt to preserve it across the cache tier.  (And blindly copying the
> > _ attribute wouldn't work.)
> >
> > I'm not sure what the issues would be with passing that metadata across
> > the promote and flush operations...  I imagine it will be a bit fragile,
> > though.
> >
> > Is this something that is really worth fixing?
> >
> > sage
> >
> >
> >>
> >> On 23 August 2017 at 15:40, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> > It seems that when calling ReplicatedPG::getattrs_maybe_cache in
> >> > ReplicatedPG::fill_in_copy_get, "user_only" should be false. Is this
> >> > right?
> >> >
> >> > On 23 August 2017 at 15:25, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> >> I submitted an issue for this:
> >> >> http://tracker.ceph.com/issues/21072?next_issue_id=21071
> >> >>
> >> >> On 23 August 2017 at 15:24, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> >>> Hi, everyone.
> >> >>>
> >> >>> Recently, we did a test as follows:
> >> >>>
> >> >>> We enabled cache tier and added a cache pool "vms_back_cache" on top
> >> >>> of the base pool "vms_back". we first created an object, and then
> >> >>> created a snap in the base pool and writing to that object again,
> >> >>> which would make the object be promoted into the cache pool. At this
> >> >>> time, we used "ceph-objectstore-tool" to dump the object, and the
> >> >>> result is as follows:
> >> >>>
> >> >>> {
> >> >>>     "id": {
> >> >>>         "oid": "test.obj.6",
> >> >>>         "key": "",
> >> >>>         "snapid": -2,
> >> >>>         "hash": 750422257,
> >> >>>         "max": 0,
> >> >>>         "pool": 11,
> >> >>>         "namespace": "",
> >> >>>         "max": 0
> >> >>>     },
> >> >>>     "info": {
> >> >>>         "oid": {
> >> >>>             "oid": "test.obj.6",
> >> >>>             "key": "",
> >> >>>             "snapid": -2,
> >> >>>             "hash": 750422257,
> >> >>>             "max": 0,
> >> >>>             "pool": 11,
> >> >>>             "namespace": ""
> >> >>>         },
> >> >>>         "version": "5010'5",
> >> >>>         "prior_version": "4991'3",
> >> >>>         "last_reqid": "client.175338.0:1",
> >> >>>         "user_version": 5,
> >> >>>         "size": 4194303,
> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
> >> >>>         "local_mtime": "2017-08-23 15:09:03.461111",
> >> >>>         "lost": 0,
> >> >>>         "flags": 4,
> >> >>>         "snaps": [],
> >> >>>         "truncate_seq": 0,
> >> >>>         "truncate_size": 0,
> >> >>>         "data_digest": 4294967295,
> >> >>>         "omap_digest": 4294967295,
> >> >>>         "watchers": {}
> >> >>>     },
> >> >>>     "stat": {
> >> >>>         "size": 4194303,
> >> >>>         "blksize": 4096,
> >> >>>         "blocks": 8200,
> >> >>>         "nlink": 1
> >> >>>     },
> >> >>>     "SnapSet": {
> >> >>>         "snap_context": {
> >> >>>             "seq": 13,
> >> >>>             "snaps": [
> >> >>>                 13
> >> >>>             ]
> >> >>>         },
> >> >>>         "head_exists": 1,
> >> >>>         "clones": [
> >> >>>             {
> >> >>>                 "snap": 13,
> >> >>>                 "size": 4194303,
> >> >>>                 "overlap": "[0~100,115~4194188]"
> >> >>>             }
> >> >>>         ]
> >> >>>     }
> >> >>> }
> >> >>>
> >> >>> Then we did cache-flush and cache-evict to flush that object down to
> >> >>> the base pool, and, again, used "ceph-objectstore-tool" to dump the
> >> >>> object in the base pool:
> >> >>>
> >> >>> {
> >> >>>     "id": {
> >> >>>         "oid": "test.obj.6",
> >> >>>         "key": "",
> >> >>>         "snapid": -2,
> >> >>>         "hash": 750422257,
> >> >>>         "max": 0,
> >> >>>         "pool": 10,
> >> >>>         "namespace": "",
> >> >>>         "max": 0
> >> >>>     },
> >> >>>     "info": {
> >> >>>         "oid": {
> >> >>>             "oid": "test.obj.6",
> >> >>>             "key": "",
> >> >>>             "snapid": -2,
> >> >>>             "hash": 750422257,
> >> >>>             "max": 0,
> >> >>>             "pool": 10,
> >> >>>             "namespace": ""
> >> >>>         },
> >> >>>         "version": "5015'4",
> >> >>>         "prior_version": "4991'2",
> >> >>>         "last_reqid": "osd.34.5013:1",
> >> >>>         "user_version": 5,
> >> >>>         "size": 4194303,
> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
> >> >>>         "local_mtime": "2017-08-23 15:10:48.122138",
> >> >>>         "lost": 0,
> >> >>>         "flags": 52,
> >> >>>         "snaps": [],
> >> >>>         "truncate_seq": 0,
> >> >>>         "truncate_size": 0,
> >> >>>         "data_digest": 163942140,
> >> >>>         "omap_digest": 4294967295,
> >> >>>         "watchers": {}
> >> >>>     },
> >> >>>     "stat": {
> >> >>>         "size": 4194303,
> >> >>>         "blksize": 4096,
> >> >>>         "blocks": 8200,
> >> >>>         "nlink": 1
> >> >>>     },
> >> >>>     "SnapSet": {
> >> >>>         "snap_context": {
> >> >>>             "seq": 13,
> >> >>>             "snaps": [
> >> >>>                 13
> >> >>>             ]
> >> >>>         },
> >> >>>         "head_exists": 1,
> >> >>>         "clones": [
> >> >>>             {
> >> >>>                 "snap": 13,
> >> >>>                 "size": 4194303,
> >> >>>                 "overlap": "[]"
> >> >>>             }
> >> >>>         ]
> >> >>>     }
> >> >>> }
> >> >>>
> >> >>> As is shown, the "overlap" field is empty.
> >> >>> In the osd log, we found the following records:
> >> >>>
> >> >>> 2017-08-23 12:46:36.083014 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
> >> >>> attrs
> >> >>> 2017-08-23 12:46:36.083021 7f675c704700 15
> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0) read
> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
> >> >>> 0~8
> >> >>> 2017-08-23 12:46:36.083398 7f675c704700 10
> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0)
> >> >>> FileStore::read
> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
> >> >>> 0~8/8
> >> >>> 2017-08-23 12:46:36.083414 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
> >> >>> data
> >> >>> 2017-08-23 12:46:36.083444 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >>> cursor.is_complete=0 0 attrs 8 bytes 0 omap header bytes 0 omap data
> >> >>> bytes in 0 keys 0 reqids
> >> >>> 2017-08-23 12:46:36.083457 7f675c704700 10 osd.0 pg_epoch: 19 pg[3.3(
> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >>> dropping ondisk_read_lock
> >> >>> 2017-08-23 12:46:36.083467 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >>> do_osd_op_effects osd.0 con 0x7f67874f0d00
> >> >>> 2017-08-23 12:46:36.083478 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >>> log_op_stats osd_op(osd.0.6:2 3.92edb2bb
> >> >>> test-rados-api-xxh02v.ops.corp
> >> >>>
> >> >>> It seems that, when doing "copy-get", no extensive attributes are
> >> >>> copied. We believe that it's the following code that led to this
> >> >>> result:
> >> >>>
> >> >>> int ReplicatedPG::getattrs_maybe_cache(ObjectContextRef obc,
> >> >>>         map<string, bufferlist> *out,
> >> >>>         bool user_only) {
> >> >>>     int r = 0;
> >> >>>     if (pool.info.require_rollback()) {
> >> >>>         if (out)
> >> >>>             *out = obc->attr_cache;
> >> >>>     } else {
> >> >>>         r = pgbackend->objects_get_attrs(obc->obs.oi.soid, out);
> >> >>>     }
> >> >>>     if (out && user_only) {
> >> >>>         map<string, bufferlist> tmp;
> >> >>>         for (map<string, bufferlist>::iterator i = out->begin();
> >> >>>                 i != out->end(); ++i) {
> >> >>>             if (i->first.size() > 1 && i->first[0] == '_')
> >> >>>                 tmp[i->first.substr(1, i->first.size())].claim(i->second);
> >> >>>         }
> >> >>>         tmp.swap(*out);
> >> >>>     }
> >> >>>     return r;
> >> >>> }
> >> >>>
> >> >>> It seems that when "user_only" is true, extensive attributes without a
> >> >>> '_' as the starting character in its name would be filtered out. Is it
> >> >>> supposed to be doing things in this way?
> >> >>> And we found that there are only two places in the source code that
> >> >>> invoked ReplicatedPG::getattrs_maybe_cache, in both of which
> >> >>> "user_only" is true. Why add this parameter?
> >> >>>
> >> >>> By the way, we also found that these codes are added in commit
> >> >>> 78d9c0072bfde30917aea4820a811d7fc9f10522, but we don't understand the
> >> >>> purpose of it.
> >> >>>
> >> >>> Thank you:-)
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Extensive attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-23 16:54           ` Sage Weil
@ 2017-08-24  7:47             ` Xuehan Xu
  2017-08-24 13:20               ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Xuehan Xu @ 2017-08-24  7:47 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi, Sage.

It seems that, when calculating a clone object's bytes, the amount of
data recorded in clone_overlap is subtracted from the clone object's
size, as SnapSet::get_clone_bytes demonstrates. This seems right when
the underlying ObjectStore does copy-on-write clones, but for
ObjectStores like FileStore it doesn't seem to make sense, since all
of the object's data is copied. Is this right?
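
For reference, a paraphrased sketch of that accounting (not a verbatim copy
of SnapSet::get_clone_bytes) is:

#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Extents = std::vector<std::pair<uint64_t, uint64_t>>;  // (offset, length)

// Bytes charged to a clone: its size minus the total length of the extents
// recorded in clone_overlap, i.e. only the data unique to the clone.
uint64_t get_clone_bytes(uint64_t clone_size, const Extents& overlap) {
  uint64_t bytes = clone_size;
  for (const auto& ext : overlap) {
    assert(bytes >= ext.second);
    bytes -= ext.second;  // shared extents are not charged to the clone
  }
  return bytes;
}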

Thank you:-)

On 24 August 2017 at 00:54, Sage Weil <sage@newdream.net> wrote:
> On Thu, 24 Aug 2017, Xuehan Xu wrote:
>> Hi, sage.
>>
>> Thanks for your reply:-)
>>
>> Actually, our problem is as follows: we are trying to backup the rbd
>> data in a ceph cluster which is cache-tier enabled using "rbd
>> export-diff", however, due to the missing of some clone overlap, the
>> export-diff operation can't be right.
>>
>> After debugging, we found that there are two reasons for this missing,
>> one of which is as mentioned in this thread:
>> http://tracker.ceph.com/issues/21072, which would led to the overlap
>> recorded in the extensive attrs not to be "copied" to the base tier.
>> And when we try to fix it, we found that the "_" policy is in our way,
>> because the clone overlap is stored in "snapset" attr which has a non
>> "_" prefixed name, and even if we modify the code to ignore the "_"
>> policy when getting the extensive attrs, a "_" will be added when
>> storing them into the base tier, which would make the following
>> "list-snaps" operation not able to find the clone overlap.
>
> Okay.  The fix will need to explicitly copy the overlap metadata as part
> of copy-get, though... not the _ attr(s).  Those are managed locally in
> the pool and not meant to be copied across pools (you'll get all sorts of
> bugs if you try to do that!).
>
>> And the other reason that made the export-diff operate incorrectly is
>> as described in http://tracker.ceph.com/issues/20896. To fix this, we
>> are planning to add a "cache_clone_overlap" in SnapSet to track the
>> cache stats while leaving the orignal "clone_overlap" to track
>> modifications to the object. Is this the right way to go?
>
> I see.  I think that would work.  It might make sense to take a fresh look
> at what the cache stats are, though, to see if it is possible to get rid
> of that condition and avoiding maintaining a
> duplicate-but-slightly-different structure just for the cache tier.  I
> don't like the conditional behavior there... it really seems like
> clone_overlap should not vary between the cache and base tiers, and any
> adjustments for missing clones should happen only with the stats
> calculation but not with clone_overlap itself.
>
>
>> By the way, last month, when trying to fix the bug:
>> http://tracker.ceph.com/issues/17445, you asked for a test case in
>> master branch to confirm that master doesn't have the same bug. I've
>> submitted that test case in the PR:
>> https://github.com/ceph/ceph/pull/17017 and cherry-pick it into PR:
>> https://github.com/ceph/ceph/pull/16355. I'm very sorry for taking so
>> long, I was very busy during the past month trying to solve the above
>> problems. Could you help me review it when you are free, please?
>
> Thanks!
> sage
>
>
>>
>> Thanks very much:-)
>>
>> On 24 August 2017 at 00:07, Sage Weil <sage@newdream.net> wrote:
>> > On Wed, 23 Aug 2017, Xuehan Xu wrote:
>> >> Hi, everyone.
>> >>
>> >> In ReplicatedPG::_write_copy_chunk method, I saw the following code:
>> >>
>> >> if (!cop->temp_cursor.attr_complete) {
>> >>     t->touch(cop->results.temp_oid);
>> >>     for (map<string, bufferlist>::iterator p = cop->attrs.begin();
>> >>     p != cop->attrs.end(); ++p) {
>> >>     cop->results.attrs[string("_") + p->first] = p->second;
>> >>     t->setattr(cop->results.temp_oid, string("_") + p->first,
>> >>        p->second);
>> >> }
>> >>     cop->attrs.clear();
>> >> }
>> >>
>> >> It seems that user specified attrs are prefixed by "_", but why add
>> >> "_" here, in ReplicatedPG::_write_copy_chunk? It seems that this
>> >> method is used for copying objects in the RADOS cluster,.
>> >
>> > That is by design.  The non-_ attributes are managed locally.
>> >
>> > In this case, the clone_overlap is not preserved because there is no code
>> > to attempt to preserve it across the cache tier.  (And blindly copying the
>> > _ attribute wouldn't work.)
>> >
>> > I'm not sure what the issues would be with passing that metadata across
>> > the promote and flush operations...  I imagine it will be a bit fragile,
>> > though.
>> >
>> > Is this something that is really worth fixing?
>> >
>> > sage
>> >
>> >
>> >>
>> >> On 23 August 2017 at 15:40, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
>> >> > It seems that when calling ReplicatedPG::getattrs_maybe_cache in
>> >> > ReplicatedPG::fill_in_copy_get, "user_only" should be false. Is this
>> >> > right?
>> >> >
>> >> > On 23 August 2017 at 15:25, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
>> >> >> I submitted an issue for this:
>> >> >> http://tracker.ceph.com/issues/21072?next_issue_id=21071
>> >> >>
>> >> >> On 23 August 2017 at 15:24, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
>> >> >>> Hi, everyone.
>> >> >>>
>> >> >>> Recently, we did a test as follows:
>> >> >>>
>> >> >>> We enabled cache tier and added a cache pool "vms_back_cache" on top
>> >> >>> of the base pool "vms_back". we first created an object, and then
>> >> >>> created a snap in the base pool and writing to that object again,
>> >> >>> which would make the object be promoted into the cache pool. At this
>> >> >>> time, we used "ceph-objectstore-tool" to dump the object, and the
>> >> >>> result is as follows:
>> >> >>>
>> >> >>> {
>> >> >>>     "id": {
>> >> >>>         "oid": "test.obj.6",
>> >> >>>         "key": "",
>> >> >>>         "snapid": -2,
>> >> >>>         "hash": 750422257,
>> >> >>>         "max": 0,
>> >> >>>         "pool": 11,
>> >> >>>         "namespace": "",
>> >> >>>         "max": 0
>> >> >>>     },
>> >> >>>     "info": {
>> >> >>>         "oid": {
>> >> >>>             "oid": "test.obj.6",
>> >> >>>             "key": "",
>> >> >>>             "snapid": -2,
>> >> >>>             "hash": 750422257,
>> >> >>>             "max": 0,
>> >> >>>             "pool": 11,
>> >> >>>             "namespace": ""
>> >> >>>         },
>> >> >>>         "version": "5010'5",
>> >> >>>         "prior_version": "4991'3",
>> >> >>>         "last_reqid": "client.175338.0:1",
>> >> >>>         "user_version": 5,
>> >> >>>         "size": 4194303,
>> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
>> >> >>>         "local_mtime": "2017-08-23 15:09:03.461111",
>> >> >>>         "lost": 0,
>> >> >>>         "flags": 4,
>> >> >>>         "snaps": [],
>> >> >>>         "truncate_seq": 0,
>> >> >>>         "truncate_size": 0,
>> >> >>>         "data_digest": 4294967295,
>> >> >>>         "omap_digest": 4294967295,
>> >> >>>         "watchers": {}
>> >> >>>     },
>> >> >>>     "stat": {
>> >> >>>         "size": 4194303,
>> >> >>>         "blksize": 4096,
>> >> >>>         "blocks": 8200,
>> >> >>>         "nlink": 1
>> >> >>>     },
>> >> >>>     "SnapSet": {
>> >> >>>         "snap_context": {
>> >> >>>             "seq": 13,
>> >> >>>             "snaps": [
>> >> >>>                 13
>> >> >>>             ]
>> >> >>>         },
>> >> >>>         "head_exists": 1,
>> >> >>>         "clones": [
>> >> >>>             {
>> >> >>>                 "snap": 13,
>> >> >>>                 "size": 4194303,
>> >> >>>                 "overlap": "[0~100,115~4194188]"
>> >> >>>             }
>> >> >>>         ]
>> >> >>>     }
>> >> >>> }
>> >> >>>
>> >> >>> Then we did cache-flush and cache-evict to flush that object down to
>> >> >>> the base pool, and, again, used "ceph-objectstore-tool" to dump the
>> >> >>> object in the base pool:
>> >> >>>
>> >> >>> {
>> >> >>>     "id": {
>> >> >>>         "oid": "test.obj.6",
>> >> >>>         "key": "",
>> >> >>>         "snapid": -2,
>> >> >>>         "hash": 750422257,
>> >> >>>         "max": 0,
>> >> >>>         "pool": 10,
>> >> >>>         "namespace": "",
>> >> >>>         "max": 0
>> >> >>>     },
>> >> >>>     "info": {
>> >> >>>         "oid": {
>> >> >>>             "oid": "test.obj.6",
>> >> >>>             "key": "",
>> >> >>>             "snapid": -2,
>> >> >>>             "hash": 750422257,
>> >> >>>             "max": 0,
>> >> >>>             "pool": 10,
>> >> >>>             "namespace": ""
>> >> >>>         },
>> >> >>>         "version": "5015'4",
>> >> >>>         "prior_version": "4991'2",
>> >> >>>         "last_reqid": "osd.34.5013:1",
>> >> >>>         "user_version": 5,
>> >> >>>         "size": 4194303,
>> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
>> >> >>>         "local_mtime": "2017-08-23 15:10:48.122138",
>> >> >>>         "lost": 0,
>> >> >>>         "flags": 52,
>> >> >>>         "snaps": [],
>> >> >>>         "truncate_seq": 0,
>> >> >>>         "truncate_size": 0,
>> >> >>>         "data_digest": 163942140,
>> >> >>>         "omap_digest": 4294967295,
>> >> >>>         "watchers": {}
>> >> >>>     },
>> >> >>>     "stat": {
>> >> >>>         "size": 4194303,
>> >> >>>         "blksize": 4096,
>> >> >>>         "blocks": 8200,
>> >> >>>         "nlink": 1
>> >> >>>     },
>> >> >>>     "SnapSet": {
>> >> >>>         "snap_context": {
>> >> >>>             "seq": 13,
>> >> >>>             "snaps": [
>> >> >>>                 13
>> >> >>>             ]
>> >> >>>         },
>> >> >>>         "head_exists": 1,
>> >> >>>         "clones": [
>> >> >>>             {
>> >> >>>                 "snap": 13,
>> >> >>>                 "size": 4194303,
>> >> >>>                 "overlap": "[]"
>> >> >>>             }
>> >> >>>         ]
>> >> >>>     }
>> >> >>> }
>> >> >>>
>> >> >>> As is shown, the "overlap" field is empty.
>> >> >>> In the osd log, we found the following records:
>> >> >>>
>> >> >>> 2017-08-23 12:46:36.083014 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
>> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
>> >> >>> attrs
>> >> >>> 2017-08-23 12:46:36.083021 7f675c704700 15
>> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0) read
>> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
>> >> >>> 0~8
>> >> >>> 2017-08-23 12:46:36.083398 7f675c704700 10
>> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0)
>> >> >>> FileStore::read
>> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
>> >> >>> 0~8/8
>> >> >>> 2017-08-23 12:46:36.083414 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
>> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
>> >> >>> data
>> >> >>> 2017-08-23 12:46:36.083444 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
>> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
>> >> >>> cursor.is_complete=0 0 attrs 8 bytes 0 omap header bytes 0 omap data
>> >> >>> bytes in 0 keys 0 reqids
>> >> >>> 2017-08-23 12:46:36.083457 7f675c704700 10 osd.0 pg_epoch: 19 pg[3.3(
>> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
>> >> >>> dropping ondisk_read_lock
>> >> >>> 2017-08-23 12:46:36.083467 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
>> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
>> >> >>> do_osd_op_effects osd.0 con 0x7f67874f0d00
>> >> >>> 2017-08-23 12:46:36.083478 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
>> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
>> >> >>> log_op_stats osd_op(osd.0.6:2 3.92edb2bb
>> >> >>> test-rados-api-xxh02v.ops.corp
>> >> >>>
>> >> >>> It seems that, when doing "copy-get", no extensive attributes are
>> >> >>> copied. We believe that it's the following code that led to this
>> >> >>> result:
>> >> >>>
>> >> >>> int ReplicatedPG::getattrs_maybe_cache(ObjectContextRef obc,
>> >> >>>         map<string, bufferlist> *out,
>> >> >>>         bool user_only) {
>> >> >>>     int r = 0;
>> >> >>>     if (pool.info.require_rollback()) {
>> >> >>>         if (out)
>> >> >>>             *out = obc->attr_cache;
>> >> >>>     } else {
>> >> >>>         r = pgbackend->objects_get_attrs(obc->obs.oi.soid, out);
>> >> >>>     }
>> >> >>>     if (out && user_only) {
>> >> >>>         map<string, bufferlist> tmp;
>> >> >>>         for (map<string, bufferlist>::iterator i = out->begin();
>> >> >>>                 i != out->end(); ++i) {
>> >> >>>             if (i->first.size() > 1 && i->first[0] == '_')
>> >> >>>                 tmp[i->first.substr(1, i->first.size())].claim(i->second);
>> >> >>>         }
>> >> >>>         tmp.swap(*out);
>> >> >>>     }
>> >> >>>     return r;
>> >> >>> }
>> >> >>>
>> >> >>> It seems that when "user_only" is true, extensive attributes without a
>> >> >>> '_' as the starting character in its name would be filtered out. Is it
>> >> >>> supposed to be doing things in this way?
>> >> >>> And we found that there are only two places in the source code that
>> >> >>> invoked ReplicatedPG::getattrs_maybe_cache, in both of which
>> >> >>> "user_only" is true. Why add this parameter?
>> >> >>>
>> >> >>> By the way, we also found that these codes are added in commit
>> >> >>> 78d9c0072bfde30917aea4820a811d7fc9f10522, but we don't understand the
>> >> >>> purpose of it.
>> >> >>>
>> >> >>> Thank you:-)
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>
>> >>
>>
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Extensive attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-24  7:47             ` Xuehan Xu
@ 2017-08-24 13:20               ` Sage Weil
  2017-08-28  7:50                 ` Xuehan Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2017-08-24 13:20 UTC (permalink / raw)
  To: Xuehan Xu; +Cc: ceph-devel

On Thu, 24 Aug 2017, Xuehan Xu wrote:
> Hi, Sage.
> 
> It seems that, when calculating a clone object's bytes, the amount of
> data recorded in clone_overlap is subtracted from the clone object's
> size, as demonstrated in SnapSet::get_clone_bytes. This seems to be
> right when the underlying ObjectStore is using COW clone, but, for
> ObjectStores like filestore, this seems not to make sense since all
> the object's data is copied. Is this right?

Right.  It was done that way when we were targeting FileStore on top of
btrfs, which would do the copy-on-write. BlueStore does it now too, so we 
probably want to preserve the behavior.

sage


 > 
> Thank you:-)
> 
> On 24 August 2017 at 00:54, Sage Weil <sage@newdream.net> wrote:
> > On Thu, 24 Aug 2017, Xuehan Xu wrote:
> >> Hi, sage.
> >>
> >> Thanks for your reply:-)
> >>
> >> Actually, our problem is as follows: we are trying to backup the rbd
> >> data in a ceph cluster which is cache-tier enabled using "rbd
> >> export-diff", however, due to the missing of some clone overlap, the
> >> export-diff operation can't be right.
> >>
> >> After debugging, we found that there are two reasons for this missing,
> >> one of which is as mentioned in this thread:
> >> http://tracker.ceph.com/issues/21072, which would led to the overlap
> >> recorded in the extensive attrs not to be "copied" to the base tier.
> >> And when we try to fix it, we found that the "_" policy is in our way,
> >> because the clone overlap is stored in "snapset" attr which has a non
> >> "_" prefixed name, and even if we modify the code to ignore the "_"
> >> policy when getting the extensive attrs, a "_" will be added when
> >> storing them into the base tier, which would make the following
> >> "list-snaps" operation not able to find the clone overlap.
> >
> > Okay.  The fix will need to explicitly copy the overlap metadata as part
> > of copy-get, though... not the _ attr(s).  Those are managed locally in
> > the pool and not meant to be copied across pools (you'll get all sorts of
> > bugs if you try to do that!).
> >
> >> And the other reason that made the export-diff operate incorrectly is
> >> as described in http://tracker.ceph.com/issues/20896. To fix this, we
> >> are planning to add a "cache_clone_overlap" in SnapSet to track the
> >> cache stats while leaving the orignal "clone_overlap" to track
> >> modifications to the object. Is this the right way to go?
> >
> > I see.  I think that would work.  It might make sense to take a fresh look
> > at what the cache stats are, though, to see if it is possible to get rid
> > of that condition and avoiding maintaining a
> > duplicate-but-slightly-different structure just for the cache tier.  I
> > don't like the conditional behavior there... it really seems like
> > clone_overlap should not vary between the cache and base tiers, and any
> > adjustments for missing clones should happen only with the stats
> > calculation but not with clone_overlap itself.
> >
> >
> >> By the way, last month, when trying to fix the bug:
> >> http://tracker.ceph.com/issues/17445, you asked for a test case in
> >> master branch to confirm that master doesn't have the same bug. I've
> >> submitted that test case in the PR:
> >> https://github.com/ceph/ceph/pull/17017 and cherry-pick it into PR:
> >> https://github.com/ceph/ceph/pull/16355. I'm very sorry for taking so
> >> long, I was very busy during the past month trying to solve the above
> >> problems. Could you help me review it when you are free, please?
> >
> > Thanks!
> > sage
> >
> >
> >>
> >> Thanks very much:-)
> >>
> >> On 24 August 2017 at 00:07, Sage Weil <sage@newdream.net> wrote:
> >> > On Wed, 23 Aug 2017, Xuehan Xu wrote:
> >> >> Hi, everyone.
> >> >>
> >> >> In ReplicatedPG::_write_copy_chunk method, I saw the following code:
> >> >>
> >> >> if (!cop->temp_cursor.attr_complete) {
> >> >>     t->touch(cop->results.temp_oid);
> >> >>     for (map<string, bufferlist>::iterator p = cop->attrs.begin();
> >> >>     p != cop->attrs.end(); ++p) {
> >> >>     cop->results.attrs[string("_") + p->first] = p->second;
> >> >>     t->setattr(cop->results.temp_oid, string("_") + p->first,
> >> >>        p->second);
> >> >> }
> >> >>     cop->attrs.clear();
> >> >> }
> >> >>
> >> >> It seems that user specified attrs are prefixed by "_", but why add
> >> >> "_" here, in ReplicatedPG::_write_copy_chunk? It seems that this
> >> >> method is used for copying objects in the RADOS cluster,.
> >> >
> >> > That is by design.  The non-_ attributes are managed locally.
> >> >
> >> > In this case, the clone_overlap is not preserved because there is no code
> >> > to attempt to preserve it across the cache tier.  (And blindly copying the
> >> > _ attribute wouldn't work.)
> >> >
> >> > I'm not sure what the issues would be with passing that metadata across
> >> > the promote and flush operations...  I imagine it will be a bit fragile,
> >> > though.
> >> >
> >> > Is this something that is really worth fixing?
> >> >
> >> > sage
> >> >
> >> >
> >> >>
> >> >> On 23 August 2017 at 15:40, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> >> > It seems that when calling ReplicatedPG::getattrs_maybe_cache in
> >> >> > ReplicatedPG::fill_in_copy_get, "user_only" should be false. Is this
> >> >> > right?
> >> >> >
> >> >> > On 23 August 2017 at 15:25, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> >> >> I submitted an issue for this:
> >> >> >> http://tracker.ceph.com/issues/21072?next_issue_id=21071
> >> >> >>
> >> >> >> On 23 August 2017 at 15:24, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> >> >>> Hi, everyone.
> >> >> >>>
> >> >> >>> Recently, we did a test as follows:
> >> >> >>>
> >> >> >>> We enabled cache tier and added a cache pool "vms_back_cache" on top
> >> >> >>> of the base pool "vms_back". we first created an object, and then
> >> >> >>> created a snap in the base pool and writing to that object again,
> >> >> >>> which would make the object be promoted into the cache pool. At this
> >> >> >>> time, we used "ceph-objectstore-tool" to dump the object, and the
> >> >> >>> result is as follows:
> >> >> >>>
> >> >> >>> {
> >> >> >>>     "id": {
> >> >> >>>         "oid": "test.obj.6",
> >> >> >>>         "key": "",
> >> >> >>>         "snapid": -2,
> >> >> >>>         "hash": 750422257,
> >> >> >>>         "max": 0,
> >> >> >>>         "pool": 11,
> >> >> >>>         "namespace": "",
> >> >> >>>         "max": 0
> >> >> >>>     },
> >> >> >>>     "info": {
> >> >> >>>         "oid": {
> >> >> >>>             "oid": "test.obj.6",
> >> >> >>>             "key": "",
> >> >> >>>             "snapid": -2,
> >> >> >>>             "hash": 750422257,
> >> >> >>>             "max": 0,
> >> >> >>>             "pool": 11,
> >> >> >>>             "namespace": ""
> >> >> >>>         },
> >> >> >>>         "version": "5010'5",
> >> >> >>>         "prior_version": "4991'3",
> >> >> >>>         "last_reqid": "client.175338.0:1",
> >> >> >>>         "user_version": 5,
> >> >> >>>         "size": 4194303,
> >> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
> >> >> >>>         "local_mtime": "2017-08-23 15:09:03.461111",
> >> >> >>>         "lost": 0,
> >> >> >>>         "flags": 4,
> >> >> >>>         "snaps": [],
> >> >> >>>         "truncate_seq": 0,
> >> >> >>>         "truncate_size": 0,
> >> >> >>>         "data_digest": 4294967295,
> >> >> >>>         "omap_digest": 4294967295,
> >> >> >>>         "watchers": {}
> >> >> >>>     },
> >> >> >>>     "stat": {
> >> >> >>>         "size": 4194303,
> >> >> >>>         "blksize": 4096,
> >> >> >>>         "blocks": 8200,
> >> >> >>>         "nlink": 1
> >> >> >>>     },
> >> >> >>>     "SnapSet": {
> >> >> >>>         "snap_context": {
> >> >> >>>             "seq": 13,
> >> >> >>>             "snaps": [
> >> >> >>>                 13
> >> >> >>>             ]
> >> >> >>>         },
> >> >> >>>         "head_exists": 1,
> >> >> >>>         "clones": [
> >> >> >>>             {
> >> >> >>>                 "snap": 13,
> >> >> >>>                 "size": 4194303,
> >> >> >>>                 "overlap": "[0~100,115~4194188]"
> >> >> >>>             }
> >> >> >>>         ]
> >> >> >>>     }
> >> >> >>> }
> >> >> >>>
> >> >> >>> Then we did cache-flush and cache-evict to flush that object down to
> >> >> >>> the base pool, and, again, used "ceph-objectstore-tool" to dump the
> >> >> >>> object in the base pool:
> >> >> >>>
> >> >> >>> {
> >> >> >>>     "id": {
> >> >> >>>         "oid": "test.obj.6",
> >> >> >>>         "key": "",
> >> >> >>>         "snapid": -2,
> >> >> >>>         "hash": 750422257,
> >> >> >>>         "max": 0,
> >> >> >>>         "pool": 10,
> >> >> >>>         "namespace": "",
> >> >> >>>         "max": 0
> >> >> >>>     },
> >> >> >>>     "info": {
> >> >> >>>         "oid": {
> >> >> >>>             "oid": "test.obj.6",
> >> >> >>>             "key": "",
> >> >> >>>             "snapid": -2,
> >> >> >>>             "hash": 750422257,
> >> >> >>>             "max": 0,
> >> >> >>>             "pool": 10,
> >> >> >>>             "namespace": ""
> >> >> >>>         },
> >> >> >>>         "version": "5015'4",
> >> >> >>>         "prior_version": "4991'2",
> >> >> >>>         "last_reqid": "osd.34.5013:1",
> >> >> >>>         "user_version": 5,
> >> >> >>>         "size": 4194303,
> >> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
> >> >> >>>         "local_mtime": "2017-08-23 15:10:48.122138",
> >> >> >>>         "lost": 0,
> >> >> >>>         "flags": 52,
> >> >> >>>         "snaps": [],
> >> >> >>>         "truncate_seq": 0,
> >> >> >>>         "truncate_size": 0,
> >> >> >>>         "data_digest": 163942140,
> >> >> >>>         "omap_digest": 4294967295,
> >> >> >>>         "watchers": {}
> >> >> >>>     },
> >> >> >>>     "stat": {
> >> >> >>>         "size": 4194303,
> >> >> >>>         "blksize": 4096,
> >> >> >>>         "blocks": 8200,
> >> >> >>>         "nlink": 1
> >> >> >>>     },
> >> >> >>>     "SnapSet": {
> >> >> >>>         "snap_context": {
> >> >> >>>             "seq": 13,
> >> >> >>>             "snaps": [
> >> >> >>>                 13
> >> >> >>>             ]
> >> >> >>>         },
> >> >> >>>         "head_exists": 1,
> >> >> >>>         "clones": [
> >> >> >>>             {
> >> >> >>>                 "snap": 13,
> >> >> >>>                 "size": 4194303,
> >> >> >>>                 "overlap": "[]"
> >> >> >>>             }
> >> >> >>>         ]
> >> >> >>>     }
> >> >> >>> }
> >> >> >>>
> >> >> >>> As is shown, the "overlap" field is empty.
> >> >> >>> In the osd log, we found the following records:
> >> >> >>>
> >> >> >>> 2017-08-23 12:46:36.083014 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
> >> >> >>> attrs
> >> >> >>> 2017-08-23 12:46:36.083021 7f675c704700 15
> >> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0) read
> >> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
> >> >> >>> 0~8
> >> >> >>> 2017-08-23 12:46:36.083398 7f675c704700 10
> >> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0)
> >> >> >>> FileStore::read
> >> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
> >> >> >>> 0~8/8
> >> >> >>> 2017-08-23 12:46:36.083414 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
> >> >> >>> data
> >> >> >>> 2017-08-23 12:46:36.083444 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >> >>> cursor.is_complete=0 0 attrs 8 bytes 0 omap header bytes 0 omap data
> >> >> >>> bytes in 0 keys 0 reqids
> >> >> >>> 2017-08-23 12:46:36.083457 7f675c704700 10 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >> >>> dropping ondisk_read_lock
> >> >> >>> 2017-08-23 12:46:36.083467 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >> >>> do_osd_op_effects osd.0 con 0x7f67874f0d00
> >> >> >>> 2017-08-23 12:46:36.083478 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >> >>> log_op_stats osd_op(osd.0.6:2 3.92edb2bb
> >> >> >>> test-rados-api-xxh02v.ops.corp
> >> >> >>>
> >> >> >>> It seems that, when doing "copy-get", no extensive attributes are
> >> >> >>> copied. We believe that it's the following code that led to this
> >> >> >>> result:
> >> >> >>>
> >> >> >>> int ReplicatedPG::getattrs_maybe_cache(ObjectContextRef obc,
> >> >> >>>         map<string, bufferlist> *out,
> >> >> >>>         bool user_only) {
> >> >> >>>     int r = 0;
> >> >> >>>     if (pool.info.require_rollback()) {
> >> >> >>>         if (out)
> >> >> >>>             *out = obc->attr_cache;
> >> >> >>>     } else {
> >> >> >>>         r = pgbackend->objects_get_attrs(obc->obs.oi.soid, out);
> >> >> >>>     }
> >> >> >>>     if (out && user_only) {
> >> >> >>>         map<string, bufferlist> tmp;
> >> >> >>>         for (map<string, bufferlist>::iterator i = out->begin();
> >> >> >>>                 i != out->end(); ++i) {
> >> >> >>>             if (i->first.size() > 1 && i->first[0] == '_')
> >> >> >>>                 tmp[i->first.substr(1, i->first.size())].claim(i->second);
> >> >> >>>         }
> >> >> >>>         tmp.swap(*out);
> >> >> >>>     }
> >> >> >>>     return r;
> >> >> >>> }
> >> >> >>>
> >> >> >>> It seems that when "user_only" is true, extensive attributes without a
> >> >> >>> '_' as the starting character in its name would be filtered out. Is it
> >> >> >>> supposed to be doing things in this way?
> >> >> >>> And we found that there are only two places in the source code that
> >> >> >>> invoked ReplicatedPG::getattrs_maybe_cache, in both of which
> >> >> >>> "user_only" is true. Why add this parameter?
> >> >> >>>
> >> >> >>> By the way, we also found that these codes are added in commit
> >> >> >>> 78d9c0072bfde30917aea4820a811d7fc9f10522, but we don't understand the
> >> >> >>> purpose of it.
> >> >> >>>
> >> >> >>> Thank you:-)
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >>
> >> >>
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Extensive attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-24 13:20               ` Sage Weil
@ 2017-08-28  7:50                 ` Xuehan Xu
  2017-08-28 13:12                   ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Xuehan Xu @ 2017-08-28  7:50 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi, Sage.

About the PR: https://github.com/ceph/ceph/pull/17017, I've tested the
test case in our test environment multiple times, and it never failed.
Where can I find the logs of the OSDs on which the test case failed?
They might help me find the cause of the failure.

Thanks:-)


On 24 August 2017 at 21:20, Sage Weil <sage@newdream.net> wrote:
> On Thu, 24 Aug 2017, Xuehan Xu wrote:
>> Hi, Sage.
>>
>> It seems that, when calculating a clone object's bytes, the amount of
>> data recorded in clone_overlap is subtracted from the clone object's
>> size, as demonstrated in SnapSet::get_clone_bytes. This seems to be
>> right when the underlying ObjectStore is using COW clone, but, for
>> ObjectStores like filestore, this seems not to make sense since all
>> the object's data is copied. Is this right?
>
> Right.  It was done that way when we were targetting FileStore on top of
> btrfs, which would do the copy-on-write. BlueStore does it now too, so we
> probably want to preserve the behavior.
>
> sage
>
>
>  >
>> Thank you:-)
>>
>> On 24 August 2017 at 00:54, Sage Weil <sage@newdream.net> wrote:
>> > On Thu, 24 Aug 2017, Xuehan Xu wrote:
>> >> Hi, sage.
>> >>
>> >> Thanks for your reply:-)
>> >>
>> >> Actually, our problem is as follows: we are trying to backup the rbd
>> >> data in a ceph cluster which is cache-tier enabled using "rbd
>> >> export-diff", however, due to the missing of some clone overlap, the
>> >> export-diff operation can't be right.
>> >>
>> >> After debugging, we found that there are two reasons for this missing,
>> >> one of which is as mentioned in this thread:
>> >> http://tracker.ceph.com/issues/21072, which would led to the overlap
>> >> recorded in the extensive attrs not to be "copied" to the base tier.
>> >> And when we try to fix it, we found that the "_" policy is in our way,
>> >> because the clone overlap is stored in "snapset" attr which has a non
>> >> "_" prefixed name, and even if we modify the code to ignore the "_"
>> >> policy when getting the extensive attrs, a "_" will be added when
>> >> storing them into the base tier, which would make the following
>> >> "list-snaps" operation not able to find the clone overlap.
>> >
>> > Okay.  The fix will need to explicitly copy the overlap metadata as part
>> > of copy-get, though... not the _ attr(s).  Those are managed locally in
>> > the pool and not meant to be copied across pools (you'll get all sorts of
>> > bugs if you try to do that!).
>> >
>> >> And the other reason that made the export-diff operate incorrectly is
>> >> as described in http://tracker.ceph.com/issues/20896. To fix this, we
>> >> are planning to add a "cache_clone_overlap" in SnapSet to track the
>> >> cache stats while leaving the orignal "clone_overlap" to track
>> >> modifications to the object. Is this the right way to go?
>> >
>> > I see.  I think that would work.  It might make sense to take a fresh look
>> > at what the cache stats are, though, to see if it is possible to get rid
>> > of that condition and avoiding maintaining a
>> > duplicate-but-slightly-different structure just for the cache tier.  I
>> > don't like the conditional behavior there... it really seems like
>> > clone_overlap should not vary between the cache and base tiers, and any
>> > adjustments for missing clones should happen only with the stats
>> > calculation but not with clone_overlap itself.
>> >
>> >
>> >> By the way, last month, when trying to fix the bug:
>> >> http://tracker.ceph.com/issues/17445, you asked for a test case in
>> >> master branch to confirm that master doesn't have the same bug. I've
>> >> submitted that test case in the PR:
>> >> https://github.com/ceph/ceph/pull/17017 and cherry-pick it into PR:
>> >> https://github.com/ceph/ceph/pull/16355. I'm very sorry for taking so
>> >> long, I was very busy during the past month trying to solve the above
>> >> problems. Could you help me review it when you are free, please?
>> >
>> > Thanks!
>> > sage
>> >
>> >
>> >>
>> >> Thanks very much:-)
>> >>
>> >> On 24 August 2017 at 00:07, Sage Weil <sage@newdream.net> wrote:
>> >> > On Wed, 23 Aug 2017, Xuehan Xu wrote:
>> >> >> Hi, everyone.
>> >> >>
>> >> >> In ReplicatedPG::_write_copy_chunk method, I saw the following code:
>> >> >>
>> >> >> if (!cop->temp_cursor.attr_complete) {
>> >> >>     t->touch(cop->results.temp_oid);
>> >> >>     for (map<string, bufferlist>::iterator p = cop->attrs.begin();
>> >> >>     p != cop->attrs.end(); ++p) {
>> >> >>     cop->results.attrs[string("_") + p->first] = p->second;
>> >> >>     t->setattr(cop->results.temp_oid, string("_") + p->first,
>> >> >>        p->second);
>> >> >> }
>> >> >>     cop->attrs.clear();
>> >> >> }
>> >> >>
>> >> >> It seems that user specified attrs are prefixed by "_", but why add
>> >> >> "_" here, in ReplicatedPG::_write_copy_chunk? It seems that this
>> >> >> method is used for copying objects in the RADOS cluster,.
>> >> >
>> >> > That is by design.  The non-_ attributes are managed locally.
>> >> >
>> >> > In this case, the clone_overlap is not preserved because there is no code
>> >> > to attempt to preserve it across the cache tier.  (And blindly copying the
>> >> > _ attribute wouldn't work.)
>> >> >
>> >> > I'm not sure what the issues would be with passing that metadata across
>> >> > the promote and flush operations...  I imagine it will be a bit fragile,
>> >> > though.
>> >> >
>> >> > Is this something that is really worth fixing?
>> >> >
>> >> > sage
>> >> >
>> >> >
>> >> >>
>> >> >> On 23 August 2017 at 15:40, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
>> >> >> > It seems that when calling ReplicatedPG::getattrs_maybe_cache in
>> >> >> > ReplicatedPG::fill_in_copy_get, "user_only" should be false. Is this
>> >> >> > right?
>> >> >> >
>> >> >> > On 23 August 2017 at 15:25, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
>> >> >> >> I submitted an issue for this:
>> >> >> >> http://tracker.ceph.com/issues/21072?next_issue_id=21071
>> >> >> >>
>> >> >> >> On 23 August 2017 at 15:24, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
>> >> >> >>> Hi, everyone.
>> >> >> >>>
>> >> >> >>> Recently, we did a test as follows:
>> >> >> >>>
>> >> >> >>> We enabled cache tier and added a cache pool "vms_back_cache" on top
>> >> >> >>> of the base pool "vms_back". we first created an object, and then
>> >> >> >>> created a snap in the base pool and writing to that object again,
>> >> >> >>> which would make the object be promoted into the cache pool. At this
>> >> >> >>> time, we used "ceph-objectstore-tool" to dump the object, and the
>> >> >> >>> result is as follows:
>> >> >> >>>
>> >> >> >>> {
>> >> >> >>>     "id": {
>> >> >> >>>         "oid": "test.obj.6",
>> >> >> >>>         "key": "",
>> >> >> >>>         "snapid": -2,
>> >> >> >>>         "hash": 750422257,
>> >> >> >>>         "max": 0,
>> >> >> >>>         "pool": 11,
>> >> >> >>>         "namespace": "",
>> >> >> >>>         "max": 0
>> >> >> >>>     },
>> >> >> >>>     "info": {
>> >> >> >>>         "oid": {
>> >> >> >>>             "oid": "test.obj.6",
>> >> >> >>>             "key": "",
>> >> >> >>>             "snapid": -2,
>> >> >> >>>             "hash": 750422257,
>> >> >> >>>             "max": 0,
>> >> >> >>>             "pool": 11,
>> >> >> >>>             "namespace": ""
>> >> >> >>>         },
>> >> >> >>>         "version": "5010'5",
>> >> >> >>>         "prior_version": "4991'3",
>> >> >> >>>         "last_reqid": "client.175338.0:1",
>> >> >> >>>         "user_version": 5,
>> >> >> >>>         "size": 4194303,
>> >> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
>> >> >> >>>         "local_mtime": "2017-08-23 15:09:03.461111",
>> >> >> >>>         "lost": 0,
>> >> >> >>>         "flags": 4,
>> >> >> >>>         "snaps": [],
>> >> >> >>>         "truncate_seq": 0,
>> >> >> >>>         "truncate_size": 0,
>> >> >> >>>         "data_digest": 4294967295,
>> >> >> >>>         "omap_digest": 4294967295,
>> >> >> >>>         "watchers": {}
>> >> >> >>>     },
>> >> >> >>>     "stat": {
>> >> >> >>>         "size": 4194303,
>> >> >> >>>         "blksize": 4096,
>> >> >> >>>         "blocks": 8200,
>> >> >> >>>         "nlink": 1
>> >> >> >>>     },
>> >> >> >>>     "SnapSet": {
>> >> >> >>>         "snap_context": {
>> >> >> >>>             "seq": 13,
>> >> >> >>>             "snaps": [
>> >> >> >>>                 13
>> >> >> >>>             ]
>> >> >> >>>         },
>> >> >> >>>         "head_exists": 1,
>> >> >> >>>         "clones": [
>> >> >> >>>             {
>> >> >> >>>                 "snap": 13,
>> >> >> >>>                 "size": 4194303,
>> >> >> >>>                 "overlap": "[0~100,115~4194188]"
>> >> >> >>>             }
>> >> >> >>>         ]
>> >> >> >>>     }
>> >> >> >>> }
>> >> >> >>>
>> >> >> >>> Then we did cache-flush and cache-evict to flush that object down to
>> >> >> >>> the base pool, and, again, used "ceph-objectstore-tool" to dump the
>> >> >> >>> object in the base pool:
>> >> >> >>>
>> >> >> >>> {
>> >> >> >>>     "id": {
>> >> >> >>>         "oid": "test.obj.6",
>> >> >> >>>         "key": "",
>> >> >> >>>         "snapid": -2,
>> >> >> >>>         "hash": 750422257,
>> >> >> >>>         "max": 0,
>> >> >> >>>         "pool": 10,
>> >> >> >>>         "namespace": "",
>> >> >> >>>         "max": 0
>> >> >> >>>     },
>> >> >> >>>     "info": {
>> >> >> >>>         "oid": {
>> >> >> >>>             "oid": "test.obj.6",
>> >> >> >>>             "key": "",
>> >> >> >>>             "snapid": -2,
>> >> >> >>>             "hash": 750422257,
>> >> >> >>>             "max": 0,
>> >> >> >>>             "pool": 10,
>> >> >> >>>             "namespace": ""
>> >> >> >>>         },
>> >> >> >>>         "version": "5015'4",
>> >> >> >>>         "prior_version": "4991'2",
>> >> >> >>>         "last_reqid": "osd.34.5013:1",
>> >> >> >>>         "user_version": 5,
>> >> >> >>>         "size": 4194303,
>> >> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
>> >> >> >>>         "local_mtime": "2017-08-23 15:10:48.122138",
>> >> >> >>>         "lost": 0,
>> >> >> >>>         "flags": 52,
>> >> >> >>>         "snaps": [],
>> >> >> >>>         "truncate_seq": 0,
>> >> >> >>>         "truncate_size": 0,
>> >> >> >>>         "data_digest": 163942140,
>> >> >> >>>         "omap_digest": 4294967295,
>> >> >> >>>         "watchers": {}
>> >> >> >>>     },
>> >> >> >>>     "stat": {
>> >> >> >>>         "size": 4194303,
>> >> >> >>>         "blksize": 4096,
>> >> >> >>>         "blocks": 8200,
>> >> >> >>>         "nlink": 1
>> >> >> >>>     },
>> >> >> >>>     "SnapSet": {
>> >> >> >>>         "snap_context": {
>> >> >> >>>             "seq": 13,
>> >> >> >>>             "snaps": [
>> >> >> >>>                 13
>> >> >> >>>             ]
>> >> >> >>>         },
>> >> >> >>>         "head_exists": 1,
>> >> >> >>>         "clones": [
>> >> >> >>>             {
>> >> >> >>>                 "snap": 13,
>> >> >> >>>                 "size": 4194303,
>> >> >> >>>                 "overlap": "[]"
>> >> >> >>>             }
>> >> >> >>>         ]
>> >> >> >>>     }
>> >> >> >>> }
>> >> >> >>>
>> >> >> >>> As is shown, the "overlap" field is empty.
>> >> >> >>> In the osd log, we found the following records:
>> >> >> >>>
>> >> >> >>> 2017-08-23 12:46:36.083014 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
>> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
>> >> >> >>> attrs
>> >> >> >>> 2017-08-23 12:46:36.083021 7f675c704700 15
>> >> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0) read
>> >> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
>> >> >> >>> 0~8
>> >> >> >>> 2017-08-23 12:46:36.083398 7f675c704700 10
>> >> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0)
>> >> >> >>> FileStore::read
>> >> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
>> >> >> >>> 0~8/8
>> >> >> >>> 2017-08-23 12:46:36.083414 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
>> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
>> >> >> >>> data
>> >> >> >>> 2017-08-23 12:46:36.083444 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
>> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
>> >> >> >>> cursor.is_complete=0 0 attrs 8 bytes 0 omap header bytes 0 omap data
>> >> >> >>> bytes in 0 keys 0 reqids
>> >> >> >>> 2017-08-23 12:46:36.083457 7f675c704700 10 osd.0 pg_epoch: 19 pg[3.3(
>> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
>> >> >> >>> dropping ondisk_read_lock
>> >> >> >>> 2017-08-23 12:46:36.083467 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
>> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
>> >> >> >>> do_osd_op_effects osd.0 con 0x7f67874f0d00
>> >> >> >>> 2017-08-23 12:46:36.083478 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
>> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
>> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
>> >> >> >>> log_op_stats osd_op(osd.0.6:2 3.92edb2bb
>> >> >> >>> test-rados-api-xxh02v.ops.corp
>> >> >> >>>
>> >> >> >>> It seems that, when doing "copy-get", no extensive attributes are
>> >> >> >>> copied. We believe that it's the following code that led to this
>> >> >> >>> result:
>> >> >> >>>
>> >> >> >>> int ReplicatedPG::getattrs_maybe_cache(ObjectContextRef obc,
>> >> >> >>>         map<string, bufferlist> *out,
>> >> >> >>>         bool user_only) {
>> >> >> >>>     int r = 0;
>> >> >> >>>     if (pool.info.require_rollback()) {
>> >> >> >>>         if (out)
>> >> >> >>>             *out = obc->attr_cache;
>> >> >> >>>     } else {
>> >> >> >>>         r = pgbackend->objects_get_attrs(obc->obs.oi.soid, out);
>> >> >> >>>     }
>> >> >> >>>     if (out && user_only) {
>> >> >> >>>         map<string, bufferlist> tmp;
>> >> >> >>>         for (map<string, bufferlist>::iterator i = out->begin();
>> >> >> >>>                 i != out->end(); ++i) {
>> >> >> >>>             if (i->first.size() > 1 && i->first[0] == '_')
>> >> >> >>>                 tmp[i->first.substr(1, i->first.size())].claim(i->second);
>> >> >> >>>         }
>> >> >> >>>         tmp.swap(*out);
>> >> >> >>>     }
>> >> >> >>>     return r;
>> >> >> >>> }
>> >> >> >>>
>> >> >> >>> It seems that when "user_only" is true, extensive attributes without a
>> >> >> >>> '_' as the starting character in its name would be filtered out. Is it
>> >> >> >>> supposed to be doing things in this way?
>> >> >> >>> And we found that there are only two places in the source code that
>> >> >> >>> invoked ReplicatedPG::getattrs_maybe_cache, in both of which
>> >> >> >>> "user_only" is true. Why add this parameter?
>> >> >> >>>
>> >> >> >>> By the way, we also found that these codes are added in commit
>> >> >> >>> 78d9c0072bfde30917aea4820a811d7fc9f10522, but we don't understand the
>> >> >> >>> purpose of it.
>> >> >> >>>
>> >> >> >>> Thank you:-)
>> >> >> --
>> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> >> the body of a message to majordomo@vger.kernel.org
>> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Extensive attributes not getting copied when flushing HEAD objects from cache pool to base pool.
  2017-08-28  7:50                 ` Xuehan Xu
@ 2017-08-28 13:12                   ` Sage Weil
  0 siblings, 0 replies; 11+ messages in thread
From: Sage Weil @ 2017-08-28 13:12 UTC (permalink / raw)
  To: Xuehan Xu; +Cc: ceph-devel

On Mon, 28 Aug 2017, Xuehan Xu wrote:
> Hi,sage.
> 
> About the PR: https://github.com/ceph/ceph/pull/17017, I've tested the
> test case in our test environment multiple times, and it never failed.
> Where can I find the logs of the OSDs on which the test case failed?
> It might be helpful for me to find the cause of the failure.

http://pulpito.ceph.com/yuriw-2017-08-24_14:50:56-rados-wip-yuri-testing4_2017_8_24-distro-basic-smithi/

has several failures.  For example,

http://qa-proxy.ceph.com/teuthology/yuriw-2017-08-24_14:50:56-rados-wip-yuri-testing4_2017_8_24-distro-basic-smithi/1559338/

You'll find all the logs in remote/*/log

Thanks!
sage

> 
> Thanks:-)
> 
> 
> On 24 August 2017 at 21:20, Sage Weil <sage@newdream.net> wrote:
> > On Thu, 24 Aug 2017, Xuehan Xu wrote:
> >> Hi, Sage.
> >>
> >> It seems that, when calculating a clone object's bytes, the amount of
> >> data recorded in clone_overlap is subtracted from the clone object's
> >> size, as demonstrated in SnapSet::get_clone_bytes. This seems right
> >> when the underlying ObjectStore uses COW clones, but for ObjectStores
> >> like FileStore it doesn't seem to make sense, since all of the
> >> object's data is copied anyway. Is this right?
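> >>
> >> For what it's worth, here is a rough standalone sketch of the
> >> accounting as we understand it (interval_set replaced by a plain
> >> vector of extents, and the numbers taken from the dumps earlier in
> >> this thread; this is not the real SnapSet::get_clone_bytes code):
> >>
> >> #include <cstdint>
> >> #include <iostream>
> >> #include <utility>
> >> #include <vector>
> >>
> >> // One (offset, length) extent per entry, like the "overlap" field.
> >> using extent_vec = std::vector<std::pair<uint64_t, uint64_t>>;
> >>
> >> // A clone is charged for its size minus whatever it still shares with
> >> // the next newer version, on the assumption that shared extents are
> >> // cloned copy-on-write and therefore not stored twice.
> >> uint64_t clone_bytes(uint64_t clone_size, const extent_vec &overlap) {
> >>     uint64_t shared = 0;
> >>     for (const auto &e : overlap)
> >>         shared += e.second;
> >>     return clone_size > shared ? clone_size - shared : 0;
> >> }
> >>
> >> int main() {
> >>     // overlap "[0~100,115~4194188]" from the cache pool: 15 bytes charged
> >>     std::cout << clone_bytes(4194303, {{0, 100}, {115, 4194188}}) << "\n";
> >>     // overlap "[]" after the flush: all 4194303 bytes are charged,
> >>     // even though FileStore copied all of the data in both cases
> >>     std::cout << clone_bytes(4194303, {}) << "\n";
> >>     return 0;
> >> }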
> >
> > Right.  It was done that way when we were targeting FileStore on top of
> > btrfs, which would do the copy-on-write. BlueStore does it now too, so we
> > probably want to preserve the behavior.
> >
> > sage
> >
> >
> >  >
> >> Thank you:-)
> >>
> >> On 24 August 2017 at 00:54, Sage Weil <sage@newdream.net> wrote:
> >> > On Thu, 24 Aug 2017, Xuehan Xu wrote:
> >> >> Hi, sage.
> >> >>
> >> >> Thanks for your reply:-)
> >> >>
> >> >> Actually, our problem is as follows: we are trying to back up the rbd
> >> >> data in a cache-tier-enabled ceph cluster using "rbd export-diff";
> >> >> however, because some clone overlap information is missing, the
> >> >> export-diff result can't be right.
> >> >>
> >> >> After debugging, we found that there are two reasons for this missing
> >> >> data. One is as mentioned in this thread:
> >> >> http://tracker.ceph.com/issues/21072, which leads to the overlap
> >> >> recorded in the extended attrs not being "copied" to the base tier.
> >> >> When we tried to fix it, we found that the "_" policy is in our way,
> >> >> because the clone overlap is stored in the "snapset" attr, whose name
> >> >> has no "_" prefix, and even if we modify the code to ignore the "_"
> >> >> policy when getting the extended attrs, a "_" will be added when
> >> >> storing them into the base tier, which makes the subsequent
> >> >> "list-snaps" operation unable to find the clone overlap.
> >> >
> >> > Okay.  The fix will need to explicitly copy the overlap metadata as part
> >> > of copy-get, though... not the _ attr(s).  Those are managed locally in
> >> > the pool and not meant to be copied across pools (you'll get all sorts of
> >> > bugs if you try to do that!).
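> >> >
> >> > Very roughly, something along these lines; this is only a hypothetical,
> >> > simplified shape for the copy-get results (it is not the real
> >> > object_copy_data_t layout, and all names are invented), with the
> >> > overlap carried as a typed field that the destination uses to rebuild
> >> > its SnapSet:
> >> >
> >> > #include <cstdint>
> >> > #include <iostream>
> >> > #include <map>
> >> > #include <string>
> >> > #include <utility>
> >> > #include <vector>
> >> >
> >> > struct copy_results_sketch {
> >> >     // user xattrs, already '_'-stripped
> >> >     std::map<std::string, std::string> user_attrs;
> >> >     // snapid -> (offset, length) overlap extents, carried explicitly
> >> >     std::map<uint64_t,
> >> >              std::vector<std::pair<uint64_t, uint64_t>>> clone_overlap;
> >> > };
> >> >
> >> > int main() {
> >> >     copy_results_sketch r;
> >> >     r.user_attrs["mykey"] = "user value";
> >> >     r.clone_overlap[13] = {{0, 100}, {115, 4194188}};  // snap 13 above
> >> >
> >> >     // The flush/promote path would consume this field when it rebuilds
> >> >     // the destination SnapSet, instead of round-tripping a raw xattr.
> >> >     for (const auto &entry : r.clone_overlap)
> >> >         std::cout << "snap " << entry.first << ": "
> >> >                   << entry.second.size() << " overlap extent(s)\n";
> >> >     return 0;
> >> > }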
> >> >
> >> >> The other reason the export-diff operates incorrectly is described in
> >> >> http://tracker.ceph.com/issues/20896. To fix this, we are planning to
> >> >> add a "cache_clone_overlap" field in SnapSet to track the cache stats
> >> >> while leaving the original "clone_overlap" to track modifications to
> >> >> the object. Is this the right way to go?
> >> >
> >> > I see.  I think that would work.  It might make sense to take a fresh look
> >> > at what the cache stats are, though, to see if it is possible to get rid
> >> > of that condition and avoid maintaining a
> >> > duplicate-but-slightly-different structure just for the cache tier.  I
> >> > don't like the conditional behavior there... it really seems like
> >> > clone_overlap should not vary between the cache and base tiers, and any
> >> > adjustments for missing clones should happen only with the stats
> >> > calculation but not with clone_overlap itself.
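> >> >
> >> > As a toy model of that separation (all types and names invented for
> >> > illustration): clone_overlap stays identical in both tiers, and only
> >> > the stats calculation skips clones that this tier does not actually
> >> > have:
> >> >
> >> > #include <cstdint>
> >> > #include <iostream>
> >> > #include <set>
> >> > #include <vector>
> >> >
> >> > struct clone_sketch {
> >> >     uint64_t snap;
> >> >     uint64_t size;
> >> >     uint64_t overlap_bytes;  // sum of the extents in clone_overlap[snap]
> >> > };
> >> >
> >> > // clone_overlap itself is never edited per tier; a clone missing from
> >> > // this tier is simply skipped when the per-tier stats are computed.
> >> > uint64_t tier_clone_bytes(const std::vector<clone_sketch> &clones,
> >> >                           const std::set<uint64_t> &present_in_tier) {
> >> >     uint64_t total = 0;
> >> >     for (const auto &c : clones) {
> >> >         if (!present_in_tier.count(c.snap))
> >> >             continue;
> >> >         total += c.size - c.overlap_bytes;
> >> >     }
> >> >     return total;
> >> > }
> >> >
> >> > int main() {
> >> >     std::vector<clone_sketch> clones = {{13, 4194303, 4194288}};
> >> >     std::cout << tier_clone_bytes(clones, {13}) << "\n";  // present: 15
> >> >     std::cout << tier_clone_bytes(clones, {}) << "\n";    // missing: 0
> >> >     return 0;
> >> > }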
> >> >
> >> >
> >> >> By the way, last month, when trying to fix the bug
> >> >> http://tracker.ceph.com/issues/17445, you asked for a test case in the
> >> >> master branch to confirm that master doesn't have the same bug. I've
> >> >> submitted that test case in the PR
> >> >> https://github.com/ceph/ceph/pull/17017 and cherry-picked it into PR
> >> >> https://github.com/ceph/ceph/pull/16355. I'm very sorry for taking so
> >> >> long; I was very busy during the past month trying to solve the above
> >> >> problems. Could you review it when you are free, please?
> >> >
> >> > Thanks!
> >> > sage
> >> >
> >> >
> >> >>
> >> >> Thanks very much:-)
> >> >>
> >> >> On 24 August 2017 at 00:07, Sage Weil <sage@newdream.net> wrote:
> >> >> > On Wed, 23 Aug 2017, Xuehan Xu wrote:
> >> >> >> Hi, everyone.
> >> >> >>
> >> >> >> In ReplicatedPG::_write_copy_chunk method, I saw the following code:
> >> >> >>
> >> >> >> if (!cop->temp_cursor.attr_complete) {
> >> >> >>     t->touch(cop->results.temp_oid);
> >> >> >>     for (map<string, bufferlist>::iterator p = cop->attrs.begin();
> >> >> >>          p != cop->attrs.end(); ++p) {
> >> >> >>         cop->results.attrs[string("_") + p->first] = p->second;
> >> >> >>         t->setattr(cop->results.temp_oid, string("_") + p->first,
> >> >> >>                    p->second);
> >> >> >>     }
> >> >> >>     cop->attrs.clear();
> >> >> >> }
> >> >> >>
> >> >> >> It seems that user-specified attrs are prefixed with "_", but why add
> >> >> >> "_" here, in ReplicatedPG::_write_copy_chunk? It seems that this
> >> >> >> method is used for copying objects within the RADOS cluster.
> >> >> >
> >> >> > That is by design.  The non-_ attributes are managed locally.
> >> >> >
> >> >> > In this case, the clone_overlap is not preserved because there is no code
> >> >> > to attempt to preserve it across the cache tier.  (And blindly copying the
> >> >> > _ attribute wouldn't work.)
> >> >> >
> >> >> > I'm not sure what the issues would be with passing that metadata across
> >> >> > the promote and flush operations...  I imagine it will be a bit fragile,
> >> >> > though.
> >> >> >
> >> >> > Is this something that is really worth fixing?
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> On 23 August 2017 at 15:40, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> >> >> > It seems that when calling ReplicatedPG::getattrs_maybe_cache in
> >> >> >> > ReplicatedPG::fill_in_copy_get, "user_only" should be false. Is this
> >> >> >> > right?
> >> >> >> >
> >> >> >> > On 23 August 2017 at 15:25, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> >> >> >> I submitted an issue for this:
> >> >> >> >> http://tracker.ceph.com/issues/21072?next_issue_id=21071
> >> >> >> >>
> >> >> >> >> On 23 August 2017 at 15:24, Xuehan Xu <xxhdx1985126@gmail.com> wrote:
> >> >> >> >>> Hi, everyone.
> >> >> >> >>>
> >> >> >> >>> Recently, we did a test as follows:
> >> >> >> >>>
> >> >> >> >>> We enabled cache tier and added a cache pool "vms_back_cache" on top
> >> >> >> >>> of the base pool "vms_back". we first created an object, and then
> >> >> >> >>> created a snap in the base pool and writing to that object again,
> >> >> >> >>> which would make the object be promoted into the cache pool. At this
> >> >> >> >>> time, we used "ceph-objectstore-tool" to dump the object, and the
> >> >> >> >>> result is as follows:
> >> >> >> >>>
> >> >> >> >>> {
> >> >> >> >>>     "id": {
> >> >> >> >>>         "oid": "test.obj.6",
> >> >> >> >>>         "key": "",
> >> >> >> >>>         "snapid": -2,
> >> >> >> >>>         "hash": 750422257,
> >> >> >> >>>         "max": 0,
> >> >> >> >>>         "pool": 11,
> >> >> >> >>>         "namespace": "",
> >> >> >> >>>         "max": 0
> >> >> >> >>>     },
> >> >> >> >>>     "info": {
> >> >> >> >>>         "oid": {
> >> >> >> >>>             "oid": "test.obj.6",
> >> >> >> >>>             "key": "",
> >> >> >> >>>             "snapid": -2,
> >> >> >> >>>             "hash": 750422257,
> >> >> >> >>>             "max": 0,
> >> >> >> >>>             "pool": 11,
> >> >> >> >>>             "namespace": ""
> >> >> >> >>>         },
> >> >> >> >>>         "version": "5010'5",
> >> >> >> >>>         "prior_version": "4991'3",
> >> >> >> >>>         "last_reqid": "client.175338.0:1",
> >> >> >> >>>         "user_version": 5,
> >> >> >> >>>         "size": 4194303,
> >> >> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
> >> >> >> >>>         "local_mtime": "2017-08-23 15:09:03.461111",
> >> >> >> >>>         "lost": 0,
> >> >> >> >>>         "flags": 4,
> >> >> >> >>>         "snaps": [],
> >> >> >> >>>         "truncate_seq": 0,
> >> >> >> >>>         "truncate_size": 0,
> >> >> >> >>>         "data_digest": 4294967295,
> >> >> >> >>>         "omap_digest": 4294967295,
> >> >> >> >>>         "watchers": {}
> >> >> >> >>>     },
> >> >> >> >>>     "stat": {
> >> >> >> >>>         "size": 4194303,
> >> >> >> >>>         "blksize": 4096,
> >> >> >> >>>         "blocks": 8200,
> >> >> >> >>>         "nlink": 1
> >> >> >> >>>     },
> >> >> >> >>>     "SnapSet": {
> >> >> >> >>>         "snap_context": {
> >> >> >> >>>             "seq": 13,
> >> >> >> >>>             "snaps": [
> >> >> >> >>>                 13
> >> >> >> >>>             ]
> >> >> >> >>>         },
> >> >> >> >>>         "head_exists": 1,
> >> >> >> >>>         "clones": [
> >> >> >> >>>             {
> >> >> >> >>>                 "snap": 13,
> >> >> >> >>>                 "size": 4194303,
> >> >> >> >>>                 "overlap": "[0~100,115~4194188]"
> >> >> >> >>>             }
> >> >> >> >>>         ]
> >> >> >> >>>     }
> >> >> >> >>> }
> >> >> >> >>>
> >> >> >> >>> Then we did cache-flush and cache-evict to flush that object down to
> >> >> >> >>> the base pool, and, again, used "ceph-objectstore-tool" to dump the
> >> >> >> >>> object in the base pool:
> >> >> >> >>>
> >> >> >> >>> {
> >> >> >> >>>     "id": {
> >> >> >> >>>         "oid": "test.obj.6",
> >> >> >> >>>         "key": "",
> >> >> >> >>>         "snapid": -2,
> >> >> >> >>>         "hash": 750422257,
> >> >> >> >>>         "max": 0,
> >> >> >> >>>         "pool": 10,
> >> >> >> >>>         "namespace": "",
> >> >> >> >>>         "max": 0
> >> >> >> >>>     },
> >> >> >> >>>     "info": {
> >> >> >> >>>         "oid": {
> >> >> >> >>>             "oid": "test.obj.6",
> >> >> >> >>>             "key": "",
> >> >> >> >>>             "snapid": -2,
> >> >> >> >>>             "hash": 750422257,
> >> >> >> >>>             "max": 0,
> >> >> >> >>>             "pool": 10,
> >> >> >> >>>             "namespace": ""
> >> >> >> >>>         },
> >> >> >> >>>         "version": "5015'4",
> >> >> >> >>>         "prior_version": "4991'2",
> >> >> >> >>>         "last_reqid": "osd.34.5013:1",
> >> >> >> >>>         "user_version": 5,
> >> >> >> >>>         "size": 4194303,
> >> >> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
> >> >> >> >>>         "local_mtime": "2017-08-23 15:10:48.122138",
> >> >> >> >>>         "lost": 0,
> >> >> >> >>>         "flags": 52,
> >> >> >> >>>         "snaps": [],
> >> >> >> >>>         "truncate_seq": 0,
> >> >> >> >>>         "truncate_size": 0,
> >> >> >> >>>         "data_digest": 163942140,
> >> >> >> >>>         "omap_digest": 4294967295,
> >> >> >> >>>         "watchers": {}
> >> >> >> >>>     },
> >> >> >> >>>     "stat": {
> >> >> >> >>>         "size": 4194303,
> >> >> >> >>>         "blksize": 4096,
> >> >> >> >>>         "blocks": 8200,
> >> >> >> >>>         "nlink": 1
> >> >> >> >>>     },
> >> >> >> >>>     "SnapSet": {
> >> >> >> >>>         "snap_context": {
> >> >> >> >>>             "seq": 13,
> >> >> >> >>>             "snaps": [
> >> >> >> >>>                 13
> >> >> >> >>>             ]
> >> >> >> >>>         },
> >> >> >> >>>         "head_exists": 1,
> >> >> >> >>>         "clones": [
> >> >> >> >>>             {
> >> >> >> >>>                 "snap": 13,
> >> >> >> >>>                 "size": 4194303,
> >> >> >> >>>                 "overlap": "[]"
> >> >> >> >>>             }
> >> >> >> >>>         ]
> >> >> >> >>>     }
> >> >> >> >>> }
> >> >> >> >>>
> >> >> >> >>> As is shown, the "overlap" field is empty.
> >> >> >> >>> In the osd log, we found the following records:
> >> >> >> >>>
> >> >> >> >>> 2017-08-23 12:46:36.083014 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
> >> >> >> >>> attrs
> >> >> >> >>> 2017-08-23 12:46:36.083021 7f675c704700 15
> >> >> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0) read
> >> >> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
> >> >> >> >>> 0~8
> >> >> >> >>> 2017-08-23 12:46:36.083398 7f675c704700 10
> >> >> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0)
> >> >> >> >>> FileStore::read
> >> >> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head#
> >> >> >> >>> 0~8/8
> >> >> >> >>> 2017-08-23 12:46:36.083414 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]  got
> >> >> >> >>> data
> >> >> >> >>> 2017-08-23 12:46:36.083444 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >> >> >>> cursor.is_complete=0 0 attrs 8 bytes 0 omap header bytes 0 omap data
> >> >> >> >>> bytes in 0 keys 0 reqids
> >> >> >> >>> 2017-08-23 12:46:36.083457 7f675c704700 10 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >> >> >>> dropping ondisk_read_lock
> >> >> >> >>> 2017-08-23 12:46:36.083467 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >> >> >>> do_osd_op_effects osd.0 con 0x7f67874f0d00
> >> >> >> >>> 2017-08-23 12:46:36.083478 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3(
> >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14)
> >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean]
> >> >> >> >>> log_op_stats osd_op(osd.0.6:2 3.92edb2bb
> >> >> >> >>> test-rados-api-xxh02v.ops.corp
> >> >> >> >>>
> >> >> >> >>> It seems that, when doing "copy-get", no extended attributes are
> >> >> >> >>> copied. We believe it's the following code that leads to this
> >> >> >> >>> result:
> >> >> >> >>>
> >> >> >> >>> int ReplicatedPG::getattrs_maybe_cache(ObjectContextRef obc,
> >> >> >> >>>         map<string, bufferlist> *out,
> >> >> >> >>>         bool user_only) {
> >> >> >> >>>     int r = 0;
> >> >> >> >>>     if (pool.info.require_rollback()) {
> >> >> >> >>>         if (out)
> >> >> >> >>>             *out = obc->attr_cache;
> >> >> >> >>>     } else {
> >> >> >> >>>         r = pgbackend->objects_get_attrs(obc->obs.oi.soid, out);
> >> >> >> >>>     }
> >> >> >> >>>     if (out && user_only) {
> >> >> >> >>>         map<string, bufferlist> tmp;
> >> >> >> >>>         for (map<string, bufferlist>::iterator i = out->begin();
> >> >> >> >>>                 i != out->end(); ++i) {
> >> >> >> >>>             if (i->first.size() > 1 && i->first[0] == '_')
> >> >> >> >>>                 tmp[i->first.substr(1, i->first.size())].claim(i->second);
> >> >> >> >>>         }
> >> >> >> >>>         tmp.swap(*out);
> >> >> >> >>>     }
> >> >> >> >>>     return r;
> >> >> >> >>> }
> >> >> >> >>>
> >> >> >> >>> It seems that when "user_only" is true, extensive attributes without a
> >> >> >> >>> '_' as the starting character in its name would be filtered out. Is it
> >> >> >> >>> supposed to be doing things in this way?
> >> >> >> >>> And we found that there are only two places in the source code that
> >> >> >> >>> invoked ReplicatedPG::getattrs_maybe_cache, in both of which
> >> >> >> >>> "user_only" is true. Why add this parameter?
> >> >> >> >>>
> >> >> >> >>> By the way, we also found that this code was added in commit
> >> >> >> >>> 78d9c0072bfde30917aea4820a811d7fc9f10522, but we don't understand
> >> >> >> >>> its purpose.
> >> >> >> >>>
> >> >> >> >>> Thank you:-)
> >> >> >>
> >> >> >>
> >> >>
> >> >>
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-08-28 13:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-23  7:24 Extensive attributes not getting copied when flushing HEAD objects from cache pool to base pool Xuehan Xu
2017-08-23  7:25 ` Xuehan Xu
2017-08-23  7:40   ` Xuehan Xu
2017-08-23  9:32     ` Xuehan Xu
2017-08-23 16:07       ` Sage Weil
2017-08-23 16:41         ` Xuehan Xu
2017-08-23 16:54           ` Sage Weil
2017-08-24  7:47             ` Xuehan Xu
2017-08-24 13:20               ` Sage Weil
2017-08-28  7:50                 ` Xuehan Xu
2017-08-28 13:12                   ` Sage Weil

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.