From: Sage Weil <sage@newdream.net>
To: "Piotr Dałek" <piotr.dalek@corp.ovh.com>
Cc: "Paweł Sadowski" <ceph@sadziu.pl>,
	ceph-devel <ceph-devel@vger.kernel.org>,
	ceph-users <ceph-users@ceph.com>
Subject: Re: Sparse file info in filestore not propagated to other OSDs
Date: Wed, 21 Jun 2017 13:24:55 +0000 (UTC)
Message-ID: <alpine.DEB.2.11.1706211322360.32269@piezo.novalocal>
In-Reply-To: <e89cc0ed-14f0-b092-784f-784c8c967e9d@corp.ovh.com>

On Wed, 21 Jun 2017, Piotr Dałek wrote:
> On 17-06-14 03:44 PM, Sage Weil wrote:
> > On Wed, 14 Jun 2017, Paweł Sadowski wrote:
> > > On 04/13/2017 04:23 PM, Piotr Dałek wrote:
> > > > On 04/06/2017 03:25 PM, Sage Weil wrote:
> > > > > On Thu, 6 Apr 2017, Piotr Dałek wrote:
> > > > > > [snip]
> > > > > 
> > > > > I think the solution here is to use sparse_read during recovery.  The
> > > > > PushOp data representation already supports it; it's just a matter of
> > > > > skipping the zeros.  The recovery code could also have an option to
> > > > > check for fully-zero regions of the data and turn those into holes as
> > > > > well.  For ReplicatedBackend, see build_push_op().
> > > > 
> > > > So far it turns out that there's even easier solution, we just enabled
> > > > "filestore seek hole" on some test cluster and that seems to fix the
> > > > problem for us. We'll see if fiemap works too.
> > > > 
> > > 
> > > Is it safe to enable "filestore seek hole"? Are there any tests that
> > > verify that everything related to RBD works fine with this enabled?
> > > Can we make this enabled by default?
> > 
> > We would need to enable it in the qa environment first.  The risk here is
> > that users run a broad range of kernels and we are exposing ourselves to
> > any bugs in any kernel version they may run.  I'd prefer to leave it off
> > by default.
> 
> That's a common regression? If not, we could blacklist particular kernels and
> call it a day.
>
> > We can enable it in the qa suite, though, which covers
> > centos7 (latest kernel) and ubuntu xenial and trusty.
> 
> +1. Do you need some particular PR for that?

Sure.  How about a patch that adds the config option to several of the 
files in qa/suites/rados/thrash/thrashers?

> > > I tested on few of our production images and it seems that about 30% is
> > > sparse. This will be lost on any cluster wide event (add/remove nodes,
> > > PG grow, recovery).
> > > 
> > > How is this (or will this be) handled in BlueStore?
> > 
> > BlueStore exposes the same sparseness metadata that enabling the
> > filestore seek hole or fiemap options does, so it won't be a problem
> > there.
> > 
> > I think the only thing that we could potentially add is zero detection
> > on writes (so that explicitly writing zeros consumes no space).  We'd
> > have to be a bit careful measuring the performance impact of that check on
> > non-zero writes.
> 
> I saw that RBD (librbd) does that - replacing writes with discards when buffer
> contains only zeros. Some code that does the same in librados could be added
> and it shouldn't impact performance much, current implementation of
> mem_is_zero is fast and shouldn't be a big problem.

I'd rather not have librados silently translating requests; I think it 
makes more sense to do any zero checking in bluestore.  _do_write_small 
and _do_write_big already break writes into (aligned) chunks; that would 
be an easy place to add the check.

sage
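
The per-chunk zero check suggested above can be sketched roughly as
follows.  This is only an illustration, not BlueStore code: the Extent
struct and the functions chunk_is_zero() and classify_chunks() are
hypothetical names invented for this example, and the chunk size is an
arbitrary parameter rather than BlueStore's real allocation unit.  The
sketch splits a write buffer into fixed-size chunks, tests each chunk
for all-zero content, and merges neighbouring chunks of the same kind,
so that a caller could skip allocating space for the zero extents.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical extent descriptor: a byte range of the write buffer and
// whether every byte in that range is zero.
struct Extent {
  std::size_t offset;
  std::size_t length;
  bool zero;
};

// Returns true if all bytes in [p, p + len) are zero.  A real
// implementation would likely compare word-sized units for speed.
bool chunk_is_zero(const unsigned char* p, std::size_t len) {
  return std::all_of(p, p + len, [](unsigned char c) { return c == 0; });
}

// Walks the buffer in chunk_size steps, classifies each chunk as zero
// or non-zero, and merges adjacent chunks with the same classification.
// The final chunk may be shorter than chunk_size.
std::vector<Extent> classify_chunks(const std::vector<unsigned char>& buf,
                                    std::size_t chunk_size) {
  std::vector<Extent> out;
  if (chunk_size == 0) {
    return out;  // nothing sensible to do with a zero-sized chunk
  }
  for (std::size_t off = 0; off < buf.size(); off += chunk_size) {
    const std::size_t len = std::min(chunk_size, buf.size() - off);
    const bool z = chunk_is_zero(buf.data() + off, len);
    if (!out.empty() && out.back().zero == z) {
      out.back().length += len;  // extend the previous extent
    } else {
      out.push_back({off, len, z});
    }
  }
  return out;
}
```

Because the check runs on every chunk, including chunks that contain
data, its cost on non-zero writes is the overhead that would need to be
measured before enabling anything like this.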
