From: Igor Fedotov <ifedotov@mirantis.com>
To: Allen Samuels <Allen.Samuels@sandisk.com>, Sage Weil <sage@newdream.net>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Adding compression support for bluestore.
Date: Thu, 17 Mar 2016 17:55:53 +0300
Message-ID: <56EAC579.9060208@mirantis.com>
In-Reply-To: <BLUPR0201MB1890A352BFE71C1D74634ED2E88A0@BLUPR0201MB1890.namprd02.prod.outlook.com>

Allen,

On 16.03.2016 22:02, Allen Samuels wrote:
>
>> Compression support approach:
>> The aim is to provide generic compression support allowing random
>> object read/write.
>> To do that compression engine to be placed (logically - actual
>> implementation may be discussed later) on top of bluestore to
>> "intercept" read-write requests and modify them as needed.
>>> I think it is going to make the most sense to do the compression and
>>> decompression in _do_write and _do_read (or helpers), within
>>> bluestore--not in some layer that sits above it but communicates
>>> metadata down to it.
>> My original intention was to minimize bluestore modifications needed to add
>> compression support. Particularly this helps to avoid additional bluestore
>> complication.
>> Another point for a segregation is a potential ability to move compression
>> engine out of store level to a pool one in the future.
>> Remember we still have 200% CPU utilization overhead for current approach
>> with replicated pools as each replica is compressed independently.
> One advantage of the current scheme is that you can use the same basic flow for EC and replicated pools. The scheme that you propose means that EC chunking boundaries become fluid and data-sensitive -- destroying the "seek" capability (i.e., you no longer know which node has any given logical address within the object). Essentially you'll need an entirely different backend flow for EC pools (at this level) with a complicated metadata mapping scheme. That seems MUCH more complicated and run-time expensive to me.
I wouldn't agree with this statement. Perhaps I presented my ideas
poorly, or I am missing something...
IMHO the current EC pool write pattern is just a regular append-only
mode, and the read pattern is partially random: EC reads data in
arbitrary order, but at specific offsets only. As long as some layer is
able to handle such patterns, it's probably OK for an EC pool. I don't
see any reason why a compression layer would be unable to do that, or
how this differs from replicated pools.
Actually, my idea about segregation was mainly about reusing the
existing bluestore rather than modifying it. The compression engine
should somehow (e.g. by inheriting from bluestore and overriding the
_do_write/_do_read methods) intercept write/read requests and maintain
its OWN block management, independent of bluestore's. Bluestore is left
untouched and exposes its functionality (via Read/Write handlers) AS-IS
to the compression layer instead of to the pools. The key thing is that
the compressed block map and the bluestore extent map use the same
logical offsets, i.e. if some compressed block starts at offset X, it is
written to bluestore at offset X too. But the written block is shorter
than the original one, and thus store space is saved.
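A minimal sketch of the offset-preserving idea described above, not
actual bluestore code. Everything in it is an assumption made for
illustration: the class name, the fixed block size, the dict standing in
for bluestore, the use of zlib, and the fallback to raw storage when
compression does not shrink a block.

```python
import zlib


class OffsetPreservingCompressor:
    """Hypothetical layer sitting on top of an untouched store."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed fixed compression block size
        self.store = {}               # stand-in for bluestore: offset -> bytes
        self.block_map = {}           # layer's OWN map: offset -> (stored_len, is_compressed)

    def write(self, offset, data):
        # This sketch only handles block-aligned writes (e.g. appends).
        if offset % self.block_size:
            raise ValueError("unaligned write is not handled in this sketch")
        for i in range(0, len(data), self.block_size):
            chunk = data[i:i + self.block_size]
            packed = zlib.compress(chunk)
            # Keep the raw chunk if compression does not help, so a stored
            # block never grows past its slot into the next block's offset.
            compressed = len(packed) < len(chunk)
            stored = packed if compressed else chunk
            # The block lands at the SAME logical offset X as the original.
            self.store[offset + i] = stored
            self.block_map[offset + i] = (len(stored), compressed)

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        pos = offset - offset % self.block_size  # start of the containing block
        # Stops at the first unwritten block; holes are not handled here.
        while pos < end and pos in self.block_map:
            _, compressed = self.block_map[pos]
            chunk = self.store[pos]
            if compressed:
                chunk = zlib.decompress(chunk)
            out += chunk[max(offset, pos) - pos:min(end, pos + len(chunk)) - pos]
            pos += self.block_size
        return bytes(out)

    def stored_bytes(self):
        return sum(stored_len for stored_len, _ in self.block_map.values())


layer = OffsetPreservingCompressor()
payload = b"x" * 8192 + b"tail"
layer.write(0, payload)
print(layer.read(4090, 12) == payload[4090:4102], layer.stored_bytes())
```

A read at an arbitrary offset only needs the block map entry for the
containing block, which is the access pattern attributed to EC pools
above.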
I would agree with the comment that this probably complicates metadata
handling: the compression layer's metadata has to be handled similarly
to bluestore's (proper sync, WAL, transactions, etc.). But I don't see
any issues specific to EC here...
Have I missed something?

PS. This is a rather academic question, meant to better understand the
difference in our POVs. Please ignore it if you find it obtrusive or
don't have enough time for a detailed explanation. It looks like we
won't go this way in any case.

Thanks,
Igor
