From: Sage Weil <sage@newdream.net>
To: Jan Schermer <jan@schermer.cz>
Cc: Wukongming <wu.kongming@h3c.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>,
	"&RD-STOR-FIRE@h3c.com" <&RD-STOR-FIRE@h3c.com>
Subject: Re: [ceph-users] Client io blocked when removing snapshot
Date: Thu, 10 Dec 2015 06:14:24 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1512100613120.19170@cobra.newdream.net>
In-Reply-To: <34168A89-37E0-4FCE-96EC-EBA0EC6CA904@schermer.cz>

On Thu, 10 Dec 2015, Jan Schermer wrote:
> Removing snapshot means looking for every *potential* object the snapshot can have, and this takes a very long time (6TB snapshot will consist of 1.5M objects (in one replica) assuming the default 4MB object size). The same applies to large thin volumes (don't try creating and then dropping a 1 EiB volume, even if you only have 1GB of physical space :)).
> Doing this is simply expensive and might saturate your OSDs. If you don't have enough RAM to cache the structure then all the "is there a file /var/lib/ceph/...." will go to disk and that can hurt a lot.
> I don't think there's any priority to this (is there?), so it competes with everything else.
> 
> I'm not sure exactly how snapshots are coded in Ceph, but in a COW filesystem you simply don't dereference blocks of the parent of the snapshot when writing to it, and that's cheap. Ceph, however, stores "blocks" in files with computable names and has no pointers to them that could be modified, so by creating a snapshot you hurt performance a lot (you need to copy the 4MB object into the snapshot(s) when you dirty a byte in it). Though I remember reading that the logic is actually reversed and it is the snapshot that gets the original blocks(??)...
> Anyway, if you are removing a snapshot at the same time as writing to the parent, there could potentially be a problem with what gets done first. Is Ceph smart enough to not care about snapshots that are being deleted? I have no idea, but I think it must be, because we use snapshots a lot and haven't had any issues with them.

It's not quite so bad... the OSD maintains a map (in leveldb) of the 
objects that are referenced by a snapshot, so the amount of work is 
proportional to the number of objects that were cloned for that snapshot.

There is certainly room for improvement in terms of the impact on client 
IO, though.  :)

sage
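To make the difference between Jan's estimate and Sage's correction concrete, here is a minimal Python sketch under stated assumptions: objects are identified only by an integer index, clones by (index, snap_id) pairs, "6TB" is taken as 6 TiB, and `potential_objects`, `trim_by_probing` and `SnapIndex` are hypothetical names, not Ceph APIs. The in-memory dict merely stands in for the leveldb-backed map Sage describes; it is not how the OSD actually stores or trims snapshots.

```python
from collections import defaultdict

MiB = 1024 ** 2
TiB = 1024 ** 4
OBJECT_SIZE = 4 * MiB  # default 4MB object size mentioned in the thread


def potential_objects(image_size, object_size=OBJECT_SIZE):
    """How many objects an image of image_size bytes *could* be split into."""
    return -(-image_size // object_size)  # ceiling division


def trim_by_probing(image_size, clones, snap_id, object_size=OBJECT_SIZE):
    """Naive trim (hypothetical): check every potential object for a clone.

    `clones` is a set of (object_index, snap_id) pairs.
    Returns (removed_indices, number_of_probes).
    """
    removed, probes = [], 0
    for idx in range(potential_objects(image_size, object_size)):
        probes += 1
        if (idx, snap_id) in clones:
            removed.append(idx)
    return removed, probes


class SnapIndex:
    """Toy snap_id -> cloned-objects map standing in for the per-OSD index."""

    def __init__(self):
        self._by_snap = defaultdict(set)

    def record_clone(self, snap_id, idx):
        # Called when a write after the snapshot forces a copy-on-write clone.
        self._by_snap[snap_id].add(idx)

    def trim(self, snap_id):
        """Index-driven trim: visit only objects actually cloned for snap_id."""
        objs = sorted(self._by_snap.pop(snap_id, ()))
        return objs, len(objs)


if __name__ == "__main__":
    print(potential_objects(6 * TiB))  # 1572864 objects to probe naively

    image = 64 * MiB  # 16 potential objects
    index, clones = SnapIndex(), set()
    for idx in (2, 5, 11):  # objects dirtied after snapshot 7 was taken
        index.record_clone(7, idx)
        clones.add((idx, 7))
    print(trim_by_probing(image, clones, 7))  # ([2, 5, 11], 16)
    print(index.trim(7))                      # ([2, 5, 11], 3)
```

In this model the naive loop's cost grows with the image's provisioned size, while the index-driven trim touches only the three objects that were actually cloned, which is the proportionality Sage points out.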


Thread overview: 6+ messages
2015-12-10  6:52 Client io blocked when removing snapshot Wukongming
2015-12-10  8:01   ` Florent Manens
2015-12-10 10:42   ` Jan Schermer
2015-12-10 11:27     ` Reply: [ceph-users] " Wukongming
2015-12-10 14:14     ` Sage Weil [this message]
2015-12-10 14:21         ` Jan Schermer
