* Client io blocked when removing snapshot
@ 2015-12-10  6:52 Wukongming
       [not found] ` <47D132BF400BE64BAE6D71033F7D3D7503DE0DF4-JwQOC20i6vT3cnzPNjVLboSsE/coCuR8pWgKQ6/u3Fg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Wukongming @ 2015-12-10  6:52 UTC (permalink / raw)
  To: ceph-devel, ceph-users; +Cc: &RD-STOR-FIRE

Hi, All

I used an rbd command to create a 6TB image and then created a snapshot of this image. After that, I kept writing to the image (e.g. modifying files), so objects kept being cloned into the snapshot one by one.
Then I performed the following two operations simultaneously:

1. Kept client IO going to this image.
2. Executed an rbd snap rm command to delete the snapshot.

In the end, client IO was blocked for quite a long time. I tested on SATA disks, and it felt as if Ceph gives priority to removing the snapshot.
We also watched the disks with iostat, and they were running at full utilization.

So, shouldn't client IO be given priority over snapshot removal?
---------------------------------------------
wukongming ID: 12019
Tel:0571-86760239
Dept:2014 UIS2 ONEStor



* Re: Client io blocked when removing snapshot
       [not found] ` <47D132BF400BE64BAE6D71033F7D3D7503DE0DF4-JwQOC20i6vT3cnzPNjVLboSsE/coCuR8pWgKQ6/u3Fg@public.gmane.org>
@ 2015-12-10  8:01   ` Florent Manens
  2015-12-10 10:42   ` Jan Schermer
  1 sibling, 0 replies; 6+ messages in thread
From: Florent Manens @ 2015-12-10  8:01 UTC (permalink / raw)
  To: Wukongming
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	ceph-users-idqoXFIVOFJgJs9I8MT0rw, &RD-STOR-FIRE-vVzyEvZLFYE



Hi, 

Can you try modifying osd_snap_trim_sleep? The default value is 0; I have had good results with 0.25 on a Ceph cluster using SATA disks:
ceph tell osd.* injectargs -- --osd_snap_trim_sleep 0.25
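
If you go that route, you can also check what a running OSD actually picked up via its admin socket (run on the host where that OSD lives; osd.0 is just an example ID, and this assumes the stock admin-socket setup):

ceph daemon osd.0 config get osd_snap_trim_sleep   # shows the value the daemon is currently using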

Best regards, 

----- On 10 Dec 15, at 7:52, Wukongming <wu.kongming-vVzyEvZLFYE@public.gmane.org> wrote:

> Hi, All

> I used an rbd command to create a 6TB image and then created a snapshot of
> this image. After that, I kept writing to the image (e.g. modifying files),
> so objects kept being cloned into the snapshot one by one.
> Then I performed the following two operations simultaneously:

> 1. Kept client IO going to this image.
> 2. Executed an rbd snap rm command to delete the snapshot.

> In the end, client IO was blocked for quite a long time. I tested on SATA
> disks, and it felt as if Ceph gives priority to removing the snapshot.
> We also watched the disks with iostat, and they were running at full
> utilization.

> So, shouldn't client IO be given priority over snapshot
> removal?
> ---------------------------------------------
> wukongming ID: 12019
> Tel:0571-86760239
> Dept:2014 UIS2 ONEStor


-- 
Florent Manens 
BeeZim 



* Re: Client io blocked when removing snapshot
       [not found] ` <47D132BF400BE64BAE6D71033F7D3D7503DE0DF4-JwQOC20i6vT3cnzPNjVLboSsE/coCuR8pWgKQ6/u3Fg@public.gmane.org>
  2015-12-10  8:01   ` Florent Manens
@ 2015-12-10 10:42   ` Jan Schermer
  2015-12-10 11:27     ` Re: [ceph-users] " Wukongming
  2015-12-10 14:14     ` Sage Weil
  1 sibling, 2 replies; 6+ messages in thread
From: Jan Schermer @ 2015-12-10 10:42 UTC (permalink / raw)
  To: Wukongming
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	ceph-users-idqoXFIVOFJgJs9I8MT0rw, &RD-STOR-FIRE-vVzyEvZLFYE

Removing a snapshot means looking for every *potential* object the snapshot can have, and this takes a very long time (a 6TB snapshot will consist of roughly 1.5M objects per replica, assuming the default 4MB object size). The same applies to large thin volumes (don't try creating and then dropping a 1 EiB volume, even if you only have 1GB of physical space :)).
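(For scale, that 1.5M is just the image size divided by the object size, nothing Ceph-specific:)
echo $(( 6 * 1024 * 1024 / 4 ))   # 6 TB of 4 MB objects = 1572864, i.e. ~1.5M potential objects per replica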
Doing this is simply expensive and might saturate your OSDs. If you don't have enough RAM to cache the directory structure, then all the "is there a file /var/lib/ceph/...." lookups will go to disk, and that can hurt a lot.
I don't think there's any priority given to this (is there?), so it competes with everything else.

I'm not sure exactly how snapshots are implemented in Ceph, but in a COW filesystem you simply don't dereference the parent's blocks when writing to it after a snapshot, which is cheap. Ceph, however, stores "blocks" in files with computable names and has no pointers to them that could be modified, so creating a snapshot hurts performance a lot: you need to copy the whole 4MB object into the snapshot(s) when you dirty a single byte in it. Though I remember reading that the logic is actually reversed and it is the snapshot that gets the original blocks(??)...
Anyway, if you are removing a snapshot at the same time as writing to the parent, there could potentially be a problem with what gets done first. Is Ceph smart enough not to care about snapshots that are being deleted? I have no idea, but I think it must be, because we use snapshots a lot and haven't had any issues with it.
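
If you want to see the clone behaviour for yourself, here is a rough, self-contained sketch using pool snapshots on a throwaway pool (the pool, snapshot and file names are made up; RBD uses self-managed snaps, so the details differ, but the clone objects are the same idea):

ceph osd pool create snaptest 8
dd if=/dev/zero of=/tmp/snaptest.bin bs=4M count=1
rados -p snaptest put obj1 /tmp/snaptest.bin   # write the object
rados -p snaptest mksnap snap1                 # take a pool snapshot
rados -p snaptest put obj1 /tmp/snaptest.bin   # dirty it again -> a clone is created
rados -p snaptest listsnaps obj1               # shows the head plus the snap1 clone
rados -p snaptest rmsnap snap1                 # snap trimming then has to remove that clone
ceph osd pool delete snaptest snaptest --yes-i-really-really-mean-it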

Jan

> On 10 Dec 2015, at 07:52, Wukongming <wu.kongming@h3c.com> wrote:
> 
> Hi, All
> 
> I used an rbd command to create a 6TB image and then created a snapshot of this image. After that, I kept writing to the image (e.g. modifying files), so objects kept being cloned into the snapshot one by one.
> Then I performed the following two operations simultaneously:
> 
> 1. Kept client IO going to this image.
> 2. Executed an rbd snap rm command to delete the snapshot.
> 
> In the end, client IO was blocked for quite a long time. I tested on SATA disks, and it felt as if Ceph gives priority to removing the snapshot.
> We also watched the disks with iostat, and they were running at full utilization.
> 
> So, shouldn't client IO be given priority over snapshot removal?
> ---------------------------------------------
> wukongming ID: 12019
> Tel:0571-86760239
> Dept:2014 UIS2 ONEStor
> 



* Re: [ceph-users] Client io blocked when removing snapshot
  2015-12-10 10:42   ` Jan Schermer
@ 2015-12-10 11:27     ` Wukongming
  2015-12-10 14:14     ` Sage Weil
  1 sibling, 0 replies; 6+ messages in thread
From: Wukongming @ 2015-12-10 11:27 UTC (permalink / raw)
  To: Jan Schermer; +Cc: ceph-devel, ceph-users, &RD-STOR-FIRE

When I adjusted the third parameter of OPTION(osd_snap_trim_sleep, OPT_FLOAT, 0) from 0 to 1, the issue was fixed. I tried again with the value 0.1, and that did not cause any problem either.
So what is the best choice? Do you have a recommended value?
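
(For what it's worth, you shouldn't need to patch the OPTION() default and rebuild; a sketch of applying and persisting the value, assuming the usual config mechanisms, with 0.1 only as the value from the test above:)

ceph tell osd.* injectargs -- --osd_snap_trim_sleep 0.1   # apply to all running OSDs now
# to keep it across restarts, add to /etc/ceph/ceph.conf on the OSD hosts:
#   [osd]
#   osd snap trim sleep = 0.1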

Thanks!!
              Kongming Wu
---------------------------------------------
wukongming ID: 12019
Tel:0571-86760239
Dept:2014 UIS2 ONEStor

-----Original Message-----
From: Jan Schermer [mailto:jan@schermer.cz]
Sent: 10 December 2015 18:43
To: wukongming 12019 (RD)
Cc: ceph-devel@vger.kernel.org; ceph-users@lists.ceph.com; &RD-STOR-FIRE
Subject: Re: [ceph-users] Client io blocked when removing snapshot

Removing a snapshot means looking for every *potential* object the snapshot can have, and this takes a very long time (a 6TB snapshot will consist of roughly 1.5M objects per replica, assuming the default 4MB object size). The same applies to large thin volumes (don't try creating and then dropping a 1 EiB volume, even if you only have 1GB of physical space :)).
Doing this is simply expensive and might saturate your OSDs. If you don't have enough RAM to cache the directory structure, then all the "is there a file /var/lib/ceph/...." lookups will go to disk, and that can hurt a lot.
I don't think there's any priority given to this (is there?), so it competes with everything else.

I'm not sure exactly how snapshots are implemented in Ceph, but in a COW filesystem you simply don't dereference the parent's blocks when writing to it after a snapshot, which is cheap. Ceph, however, stores "blocks" in files with computable names and has no pointers to them that could be modified, so creating a snapshot hurts performance a lot: you need to copy the whole 4MB object into the snapshot(s) when you dirty a single byte in it. Though I remember reading that the logic is actually reversed and it is the snapshot that gets the original blocks(??)...
Anyway, if you are removing a snapshot at the same time as writing to the parent, there could potentially be a problem with what gets done first. Is Ceph smart enough not to care about snapshots that are being deleted? I have no idea, but I think it must be, because we use snapshots a lot and haven't had any issues with it.

Jan

> On 10 Dec 2015, at 07:52, Wukongming <wu.kongming@h3c.com> wrote:
> 
> Hi, All
> 
> I used an rbd command to create a 6TB image and then created a snapshot of this image. After that, I kept writing to the image (e.g. modifying files), so objects kept being cloned into the snapshot one by one.
> Then I performed the following two operations simultaneously:
> 
> 1. Kept client IO going to this image.
> 2. Executed an rbd snap rm command to delete the snapshot.
> 
> In the end, client IO was blocked for quite a long time. I tested on SATA disks, and it felt as if Ceph gives priority to removing the snapshot.
> We also watched the disks with iostat, and they were running at full utilization.
> 
> So, shouldn't client IO be given priority over snapshot removal?
> ---------------------------------------------
> wukongming ID: 12019
> Tel:0571-86760239
> Dept:2014 UIS2 ONEStor
> 



* Re: [ceph-users] Client io blocked when removing snapshot
  2015-12-10 10:42   ` Jan Schermer
  2015-12-10 11:27     ` Re: [ceph-users] " Wukongming
@ 2015-12-10 14:14     ` Sage Weil
       [not found]       ` <alpine.DEB.2.00.1512100613120.19170-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Sage Weil @ 2015-12-10 14:14 UTC (permalink / raw)
  To: Jan Schermer; +Cc: Wukongming, ceph-devel, ceph-users, &RD-STOR-FIRE

On Thu, 10 Dec 2015, Jan Schermer wrote:
> Removing a snapshot means looking for every *potential* object the snapshot can have, and this takes a very long time (a 6TB snapshot will consist of roughly 1.5M objects per replica, assuming the default 4MB object size). The same applies to large thin volumes (don't try creating and then dropping a 1 EiB volume, even if you only have 1GB of physical space :)).
> Doing this is simply expensive and might saturate your OSDs. If you don't have enough RAM to cache the directory structure, then all the "is there a file /var/lib/ceph/...." lookups will go to disk, and that can hurt a lot.
> I don't think there's any priority given to this (is there?), so it competes with everything else.
> 
> I'm not sure exactly how snapshots are implemented in Ceph, but in a COW filesystem you simply don't dereference the parent's blocks when writing to it after a snapshot, which is cheap. Ceph, however, stores "blocks" in files with computable names and has no pointers to them that could be modified, so creating a snapshot hurts performance a lot: you need to copy the whole 4MB object into the snapshot(s) when you dirty a single byte in it. Though I remember reading that the logic is actually reversed and it is the snapshot that gets the original blocks(??)...
> Anyway, if you are removing a snapshot at the same time as writing to the parent, there could potentially be a problem with what gets done first. Is Ceph smart enough not to care about snapshots that are being deleted? I have no idea, but I think it must be, because we use snapshots a lot and haven't had any issues with it.

It's not quite so bad... the OSD maintains a map (in leveldb) of the 
objects that are referenced by a snapshot, so the amount of work is 
proportional to the number of objects that were cloned for that snapshot.
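
One rough way to get a feel for that number (hedged: the pool, image and snapshot names below are examples only, and the column counts clones for all snapshots in the pool, so it is only an upper bound for one snapshot):

rados df                          # note the CLONES column for the pool holding the image
rbd snap rm rbd/bigimage@snap1    # then watch the clone count fall as trimming proceeds
watch -n 10 rados df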

There is certainly room for improvement in terms of the impact on client 
IO, though.  :)

sage

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Client io blocked when removing snapshot
       [not found]       ` <alpine.DEB.2.00.1512100613120.19170-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-12-10 14:21         ` Jan Schermer
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Schermer @ 2015-12-10 14:21 UTC (permalink / raw)
  To: Sage Weil
  Cc: &RD-STOR-FIRE-vVzyEvZLFYE, ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	ceph-users-idqoXFIVOFJgJs9I8MT0rw, Wukongming


> On 10 Dec 2015, at 15:14, Sage Weil <sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org> wrote:
> 
> On Thu, 10 Dec 2015, Jan Schermer wrote:
>> Removing a snapshot means looking for every *potential* object the snapshot can have, and this takes a very long time (a 6TB snapshot will consist of roughly 1.5M objects per replica, assuming the default 4MB object size). The same applies to large thin volumes (don't try creating and then dropping a 1 EiB volume, even if you only have 1GB of physical space :)).
>> Doing this is simply expensive and might saturate your OSDs. If you don't have enough RAM to cache the directory structure, then all the "is there a file /var/lib/ceph/...." lookups will go to disk, and that can hurt a lot.
>> I don't think there's any priority given to this (is there?), so it competes with everything else.
>> 
>> I'm not sure exactly how snapshots are implemented in Ceph, but in a COW filesystem you simply don't dereference the parent's blocks when writing to it after a snapshot, which is cheap. Ceph, however, stores "blocks" in files with computable names and has no pointers to them that could be modified, so creating a snapshot hurts performance a lot: you need to copy the whole 4MB object into the snapshot(s) when you dirty a single byte in it. Though I remember reading that the logic is actually reversed and it is the snapshot that gets the original blocks(??)...
>> Anyway, if you are removing a snapshot at the same time as writing to the parent, there could potentially be a problem with what gets done first. Is Ceph smart enough not to care about snapshots that are being deleted? I have no idea, but I think it must be, because we use snapshots a lot and haven't had any issues with it.
> 
> It's not quite so bad... the OSD maintains a map (in leveldb) of the 
> objects that are referenced by a snapshot, so the amount of work is 
> proportional to the number of objects that were cloned for that snapshot.
> 


Nice. I saw a blueprint somewhere earlier this year, so that's a pretty new thing (Hammer or Infernalis?).
And is it a map (with pointers to objects) or just a bitmap of the overlay?

Jan

> There is certainly room for improvement in terms of the impact on client 
> IO, though.  :)
> 
> sage


end of thread

Thread overview: 6+ messages
2015-12-10  6:52 Client io blocked when removing snapshot Wukongming
     [not found] ` <47D132BF400BE64BAE6D71033F7D3D7503DE0DF4-JwQOC20i6vT3cnzPNjVLboSsE/coCuR8pWgKQ6/u3Fg@public.gmane.org>
2015-12-10  8:01   ` Florent Manens
2015-12-10 10:42   ` Jan Schermer
2015-12-10 11:27     ` Re: [ceph-users] " Wukongming
2015-12-10 14:14     ` Sage Weil
     [not found]       ` <alpine.DEB.2.00.1512100613120.19170-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-12-10 14:21         ` Jan Schermer
