All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vikas Sinha-SSI <v.sinha@ssi.samsung.com>
To: Igor Fedotov <ifedotov@mirantis.com>, Sage Weil <sage@newdream.net>
Cc: Allen Samuels <Allen.Samuels@sandisk.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: RE: Adding compression support for bluestore.
Date: Fri, 18 Mar 2016 17:17:38 +0000	[thread overview]
Message-ID: <EC1BE61F68296D4A8CE45AD3DEDB99E21AEF6926@SSIEXCH-MB3.ssi.samsung.com> (raw)
In-Reply-To: <56EC248E.3060502@mirantis.com>

Hi Igor,
Thanks a lot for this. Do you also consider supporting offline compression (via a background task, or
at least something not in the main IO path)? Will the current proposal allow this, and do you consider
this to be a useful option at all? My concern is with the performance impact of compression, and obviously I
don't know whether it will be significant. Obviously I'm also concerned about adding more complexity.
I would love to know your thoughts on this.
Thanks,
Vikas


> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Igor Fedotov
> Sent: Friday, March 18, 2016 8:54 AM
> To: Sage Weil
> Cc: Allen Samuels; ceph-devel
> Subject: Re: Adding compression support for bluestore.
> 
> 
> 
> On 17.03.2016 18:33, Sage Weil wrote:
> > I'd say "maybe". It's easy to say we should focus on read performance
> > now, but as soon as we have "support for compression" everybody is
> > going to want to turn it on on all of their clusters to spend less
> > money on hard disks. That will definitely include RBD users, where
> > write latency is very important. I'm hesitant to take an architectural
> > direction that locks us in. With something layered over BlueStore I
> > think we're forced to do it all in the initial phase; with the
> > monolithic approach that integrates it into BlueStore's write path we
> > have the option to do either one--perhaps based on the particular
> > request or hints or whatever.
> >>>>> What do you think?
> >>>>>
> >>>>> It would be nice to choose a simpler strategy for the first pass that
> >>>>> handles a subset of write patterns (i.e., sequential writes, possibly
> >>>>> unaligned) that is still a step in the direction of the more robust
> >>>>> strategy we expect to implement after that.
> >>>>>
> >>>> I'd probably agree but.... I don't see a good way how one can
> implement
> >>>> compression for specific write patterns only.
> >>>> We need to either ensure that these patterns are used exclusively
> ( append
> >>>> only / sequential only flags? ) or provide some means to fall back to
> >>>> regular
> >>>> mode when inappropriate write occurs.
> >>>> Don't think both are good and/or easy enough.
> >>> Well, if we simply don't implement a garbage collector, then for
> >>> sequential+aligned writes we don't end up with stuff that needs garbage
> >>> collection.  Even the sequential case might be doable if we make it
> >>> possible to fill the extent with a sequence of compressed strings (as long
> >>> as we haven't reached the compressed length, try to restart the
> >>> decompression stream).
> >> It's still unclear to me if such specific patterns should be exclusively
> >> applied to the object. E.g. by using specific object creation mode mode.
> >> Or we should detect them automatically and be able to fall back to regular
> >> write ( i.e. disable compression )  when write doesn't conform to the
> >> supported pattern.
> > I think initially supporting only the append workload is a simple check
> > for whether the offset == the object size (and maybe whether it is
> > aligned).  No persistent flags or hints needed there.
> Well, but issues appear immediately after some overwrite request takes
> place.
> How to handle overwrites? To do compression for the overwritten or not?
> If not - we need some way to be able to merge compressed and
> uncompressed blocks. And so on and so forth
> IMO it's hard (or even impossible) to apply compression for specific
> write patterns only unless you prohibit other ones.
> We can support a subset of compression policies ( i.e. ways how we
> resolve compression issues: RMW at init phase, lazy overwrite, WAL use,
> etc ) but not a subset of write patterns.
> 
> >> And I'm not following the idea about "a sequence of compressed strings".
> Could
> >> you please elaborate?
> > Let's say we have 32KB compressed_blocks, and the client is doing 1000
> > byte appends.  We will allocate a 32 chunk on disk, and only fill it with
> > say ~500 bytes of compressed data.  When the next write comes around,
> we
> > could compress it too and append it to the block without decompressing
> the
> > previous string.
> >
> > By string I mean that each compression cycle looks something like
> >
> >   start(...)
> >   while (more data)
> >     compress_some_stuff(...)
> >   finish(...)
> >
> > i.e., there's a header and maybe a footer in the compressed string.  If we
> > are decompressing and the decompressor says "done" but there is more
> data
> > in our compressed block, we could repeat the process until we get to the
> > end of the compressed data.
> Got it, thanks for clarification
> > But it might not matter or be worth it.  If the compressed blocks are
> > smallish then decompressing, appending, and recompressing isn't going to
> > be that expensive anyway.  I'm mostly worried about small appends, e.g.
> by
> > rbd mirroring (imaging 4 KB writes + some metadata) or the MDS journal.
> That's mainly about small appends not small writes, right?
> 
> At this point I agree with Allen that we need variable policies to
> handle compression. Most probably we wouldn't be able to create single
> one that fits perfect for any write pattern.
> The only concern about that is the complexity of such a task...
> > sage
> Thanks,
> Igor
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2016-03-18 17:27 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-15 16:29 Adding compression support for bluestore Igor Fedotov
2016-02-16  2:06 ` Haomai Wang
2016-02-17  0:11   ` Igor Fedotov
2016-02-19 23:13     ` Allen Samuels
2016-02-22 12:25       ` Sage Weil
2016-02-24 18:18         ` Igor Fedotov
2016-02-24 18:43           ` Allen Samuels
2016-02-26 17:41             ` Igor Fedotov
2016-03-15 17:12               ` Sage Weil
2016-03-16  1:06                 ` Allen Samuels
2016-03-16 18:34                 ` Igor Fedotov
2016-03-16 19:02                   ` Allen Samuels
2016-03-16 19:15                     ` Sage Weil
2016-03-16 19:20                       ` Allen Samuels
2016-03-16 19:29                         ` Sage Weil
2016-03-16 19:36                           ` Allen Samuels
2016-03-17 14:55                     ` Igor Fedotov
2016-03-17 15:28                       ` Allen Samuels
2016-03-18 13:00                         ` Igor Fedotov
2016-03-16 19:27                   ` Sage Weil
2016-03-16 19:41                     ` Allen Samuels
     [not found]                       ` <CA+z5DsxA9_LLozFrDOtnVRc7FcvN7S8OF12zswQZ4q4ysK_0BA@mail.gmail.com>
2016-03-16 22:56                         ` Blair Bethwaite
2016-03-17  3:21                           ` Allen Samuels
2016-03-17 10:01                             ` Willem Jan Withagen
2016-03-17 17:29                               ` Howard Chu
2016-03-17 15:21                             ` Igor Fedotov
2016-03-17 15:18                     ` Igor Fedotov
2016-03-17 15:33                       ` Sage Weil
2016-03-17 18:53                         ` Allen Samuels
2016-03-18 14:58                           ` Igor Fedotov
2016-03-18 15:53                         ` Igor Fedotov
2016-03-18 17:17                           ` Vikas Sinha-SSI [this message]
2016-03-19  3:14                             ` Allen Samuels
2016-03-21 14:19                             ` Igor Fedotov
2016-03-19  3:14                           ` Allen Samuels
2016-03-21 14:07                             ` Igor Fedotov
2016-03-21 15:14                               ` Allen Samuels
2016-03-21 16:35                                 ` Igor Fedotov
2016-03-21 17:14                                   ` Allen Samuels
2016-03-21 18:31                                     ` Igor Fedotov
2016-03-21 21:14                                       ` Allen Samuels
2016-03-21 15:32                             ` Igor Fedotov
2016-03-21 15:50                               ` Sage Weil
2016-03-21 18:01                                 ` Igor Fedotov
2016-03-24 12:45                                 ` Igor Fedotov
2016-03-24 22:29                                   ` Allen Samuels
2016-03-29 20:19                                   ` Sage Weil
2016-03-29 20:45                                     ` Allen Samuels
2016-03-30 12:32                                       ` Igor Fedotov
2016-03-30 12:28                                     ` Igor Fedotov
2016-03-30 12:47                                       ` Sage Weil
2016-03-31 21:56                                   ` Sage Weil
2016-04-01 18:54                                     ` Allen Samuels
2016-04-04 12:31                                     ` Igor Fedotov
2016-04-04 12:38                                     ` Igor Fedotov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=EC1BE61F68296D4A8CE45AD3DEDB99E21AEF6926@SSIEXCH-MB3.ssi.samsung.com \
    --to=v.sinha@ssi.samsung.com \
    --cc=Allen.Samuels@sandisk.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=ifedotov@mirantis.com \
    --cc=sage@newdream.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.