From: Sage Weil <sage@newdream.net>
To: Igor Fedotov <ifedotov@mirantis.com>
Cc: Allen Samuels <Allen.Samuels@sandisk.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Adding compression support for bluestore.
Date: Wed, 30 Mar 2016 08:47:01 -0400 (EDT)
Message-ID: <alpine.DEB.2.11.1603300840560.22014@cpach.fuggernut.com>
In-Reply-To: <56FBC684.6090700@mirantis.com>

On Wed, 30 Mar 2016, Igor Fedotov wrote:
> On 29.03.2016 23:19, Sage Weil wrote:
> > On Thu, 24 Mar 2016, Igor Fedotov wrote:
> > > Sage, Allen et al.
> > > 
> > > Please find some follow-up on our discussion below.
> > > 
> > > Your past and future comments are highly appreciated.
> > > 
> > > WRITE/COMPRESSION POLICY and INTERNAL BLUESTORE STRUCTURES OVERVIEW.
> > > 
> > > Used terminology:
> > > Extent - basic allocation unit. Variable in size; maximum size is limited
> > > by lblock length (see below). Alignment: min_alloc_unit param
> > > (configurable, expected range: 4-64 KB).
> > > Logical Block (lblock) - standalone traceable data unit. Min size
> > > unspecified. Alignment unspecified. Max size limited by the
> > > max_logical_unit param (configurable, expected range: 128-512 KB).
> > > 
> > > Compression is to be applied on a per-extent basis.
> > > Multiple lblocks can refer to a specific region within a single extent.
> > This (and what's below) sounds right to me.  My main concern is around
> > naming.  I don't much like "extent" vs "lblock" (which is which?).  Maybe
> > extent and extent_ref?
> > 
> > Also, I don't think we need the size limits you mention above.  When
> > compression is enabled, we'll limit the size of the disk extents by
> > policy, but the structures themselves needn't enforce that.  Similarly, I
> > don't think the lblocks (extent refs?  logical extents?) need a max size
> > either.
> Actually, the structures themselves don't have explicit limits apart from the
> width of the length fields. But I'd prefer to enforce such a limit in the code
> (add a policy?) that handles writes (or performs merges), to avoid huge
> l(p)extents in both the compressed and uncompressed cases.
> The rationale is potentially inefficient space usage. Partially overlapping
> writes occlude previous extents, so the larger the extents are, the more
> likely such occlusion is and the more space is wasted. Moreover, IMHO,
> leaving control over extent granularity to the user's write pattern (which is
> what happens if we don't enforce any limit) isn't a good idea in any case.

I'm thinking of the uncompressed case, where we can deallocate whatever 
min_alloc_size-aligned portion of the pextent we overwrite.  Similarly, in 
the checksum case, the size of the piece we have to r/m/w will depend on 
the checksum granularity.  Right now that code assumes it's always a 
single block, but I think it will become a function of the pextent 
properties (what size portion of the pextent can be modified?  
block-aligned, or checksum-block aligned, or is the entire pextent a 
single unit?).
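
To make the "function of the pextent properties" idea concrete, here is a
minimal sketch in Python (not BlueStore's C++, and not actual BlueStore code).
The names PExtent, modify_unit, rmw_range and releasable_range, the 4096-byte
default block size, and the rule that a compressed pextent is one indivisible
unit are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PExtent:
    """Hypothetical on-disk extent; field names are illustrative only."""
    length: int                            # bytes occupied on disk
    block_size: int = 4096                 # device block size (assumed)
    csum_block_size: Optional[int] = None  # None means no checksums
    compressed: bool = False               # assumed: compressed => one unit


def modify_unit(p: PExtent) -> int:
    """Smallest piece of the pextent that can be rewritten on its own."""
    if p.compressed:
        return p.length                    # entire pextent is a single unit
    if p.csum_block_size is not None:
        return max(p.block_size, p.csum_block_size)
    return p.block_size                    # plain block-aligned case


def rmw_range(p: PExtent, offset: int, length: int) -> Tuple[int, int]:
    """(start, length) of the region to read/modify/write for an overwrite."""
    unit = modify_unit(p)
    start = (offset // unit) * unit
    end = min(p.length, -(-(offset + length) // unit) * unit)  # round up
    return start, end - start


def releasable_range(offset: int, length: int,
                     min_alloc_size: int) -> Optional[Tuple[int, int]]:
    """Uncompressed case: the min_alloc_size-aligned portion fully covered
    by the overwrite, i.e. the part that could be deallocated."""
    start = -(-offset // min_alloc_size) * min_alloc_size      # round up
    end = ((offset + length) // min_alloc_size) * min_alloc_size
    return (start, end - start) if end > start else None
```

With these assumptions, a 100-byte overwrite at offset 5000 touches one 4 KB
block in the plain case, a whole 16 KB checksum block when csum_block_size is
16384, and the whole pextent when it is compressed.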

> Would you like to have the new data structures completely ready at this stage,
> with all checksum/compression/flag fields present?
> As for me, I'd prefer to add them incrementally as each specific feature
> (compression, checksum verification, etc.) is implemented.
> It might be hard to design all of them at once, and doing so would probably
> block the implementation until all the discussions are complete.

Just placeholder fields are fine. The main thing we don't want to forget is 
that the pextent may be big (due to checksums), but we've already settled 
on a pextent/lextent approach that addresses that issue.  The other thing 
is that the checksum granularity might vary, making the overwrite/update 
unit a function of the pextent, as I mentioned above.

Just a lot of considerations to juggle, and even a placeholder will help 
remind us. :)
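
For illustration only, placeholder fields on a pextent/lextent pair might look
roughly like the Python sketch below. Every field and function name here
(PExtent, LExtent, csum_type, compression_alg, disk_range, and so on) is
hypothetical; the thread has not settled on names or layouts, and the real
structures would be C++.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PExtent:
    """Physical extent: a region on disk. Placeholder fields default to
    'feature not in use' so they can be filled in incrementally."""
    offset: int                            # byte offset on the device
    length: int                            # bytes on disk
    flags: int = 0                         # placeholder
    csum_type: Optional[str] = None        # placeholder: no checksum yet
    csum_block_size: Optional[int] = None  # placeholder: granularity may vary
    compression_alg: Optional[str] = None  # placeholder: no compression yet


@dataclass
class LExtent:
    """Logical extent: maps a logical range onto part of one pextent."""
    logical_offset: int
    length: int
    pextent_id: int
    pextent_offset: int = 0                # where the data starts in the pextent


def referencing(pextent_id: int, lextents: List[LExtent]) -> List[LExtent]:
    """All lextents that point into the given pextent (several may)."""
    return [l for l in lextents if l.pextent_id == pextent_id]


def disk_range(l: LExtent, p: PExtent) -> Tuple[int, int]:
    """(device offset, length) backing an lextent; valid only for an
    uncompressed pextent, where logical and on-disk bytes map 1:1."""
    if p.compression_alg is not None:
        raise ValueError("compressed pextent: must read and decompress it")
    return p.offset + l.pextent_offset, l.length
```

The point of the defaults is that code written before checksums or
compression exist keeps working once those fields start being populated.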

sage
