From: Igor Fedotov <ifedotov@mirantis.com>
To: Allen Samuels <Allen.Samuels@sandisk.com>, Sage Weil <sage@newdream.net>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Adding compression support for bluestore.
Date: Fri, 18 Mar 2016 16:00:57 +0300
Message-ID: <56EBFC09.1060008@mirantis.com>
In-Reply-To: <CY1PR0201MB18974651A4981E5F6889AFF7E88B0@CY1PR0201MB1897.namprd02.prod.outlook.com>



On 17.03.2016 18:28, Allen Samuels wrote:
>>> flow for EC and replicated pools. The scheme that you propose means that
>>> EC chunking boundaries become fluid and data-sensitive -- destroying the
>>> "seek" capability (i.e., you no longer know which node has any given logical
>>> address within the object). Essentially you'll need an entirely different
>>> backend flow for EC pools (at this level) with a complicated metadata
>>> mapping scheme. That seems MUCH more complicated and run-time
>>> expensive to me.
>> I wouldn't agree with this statement. Perhaps I presented my ideas
>> improperly or am missing something...
>> IMHO the current EC pool write pattern is just a regular append-only mode.
> This is where we diverge. Sam and I worked out a blueprint for doing non append-only writes into EC pools (i.e., partial and/or complete overwrites).
>
> See https://github.com/athanatos/ceph/blob/wip-ec-overwrites/doc/dev/osd_internals/ec_overwrites.rst
>
> This allows all of the current restrictions in the usages of EC Pools to be eliminated, enabling all protocols to directly utilize EC pools.
>
> If you do compression BEFORE you do EC, then you have a real problem with landing your data across the different nodes of an EC stripe in the non-append case.
>
> BTW, it's BlueStore itself that enables this new capability to be implemented efficiently; it's very expensive to do this with FileStore (an additional full copy of the data is required, i.e., 3x write-amp on FileStore).
>   
Got it. That's where we diverge.
When I spoke of segregating the compression layer in this thread, I
meant placing it AFTER the EC/replicated pools and BEFORE BlueStore, not
BEFORE the pools. IMO, in that arrangement the layer is identical for
both replicated and EC pools, since EC write patterns are always (in both
append-only and overwrite modes) a subset of the replicated pool's.
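
To make the layering concrete, here is a minimal sketch of the arrangement
described above and in the quoted text below: a compression layer that sits
between the pool logic and the store, keeps its own block map, and writes
each compressed block at the same logical offset as the original. It is an
illustration only, not BlueStore code. PlainStore, CompressingLayer, and
their methods are hypothetical names, zlib stands in for whatever
compressor would actually be used, and overwrites, partial reads, and
metadata durability are ignored.

```python
import zlib


class PlainStore:
    """Hypothetical stand-in for the underlying store (not the BlueStore API)."""

    def __init__(self):
        self.extents = {}  # logical offset -> bytes actually stored

    def write(self, offset, data):
        self.extents[offset] = bytes(data)

    def read(self, offset):
        return self.extents[offset]

    def stored_bytes(self):
        return sum(len(b) for b in self.extents.values())


class CompressingLayer:
    """Hypothetical layer placed after the pool logic and before the store."""

    def __init__(self, store):
        self.store = store
        # The layer's OWN block map: logical offset -> (original length, compressed?)
        self.block_map = {}

    def write(self, offset, data):
        packed = zlib.compress(data)
        compressed = len(packed) < len(data)  # keep raw data if compression does not help
        # Same logical offset as the caller used; only the stored length shrinks.
        self.store.write(offset, packed if compressed else data)
        self.block_map[offset] = (len(data), compressed)

    def read(self, offset):
        length, compressed = self.block_map[offset]
        raw = self.store.read(offset)
        data = zlib.decompress(raw) if compressed else raw
        assert len(data) == length
        return data


# The caller (replicated or EC pool logic) only ever sees logical offsets.
store = PlainStore()
layer = CompressingLayer(store)
block = b"A" * 4096
layer.write(8192, block)
restored = layer.read(8192)
```

The caller's view is the same whether it is replicated or EC pool logic: it
issues writes and reads at logical offsets, and only the layer knows that
the stored block is shorter than the original.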

Anyway, thanks a lot for the clarifications. I highly appreciate your
help.

>
>> And the read pattern is partially random: EC reads data in arbitrary order at
>> specific offsets only. As long as some layer is able to handle such patterns, it's
>> probably OK for an EC pool. I don't see any reason why a compression layer
>> would be unable to do that, or how this differs from replicated pools.
>> Actually, my idea about segregation was mainly about reusing the existing
>> BlueStore rather than modifying it. The compression engine should somehow
>> (e.g. by inheriting from BlueStore and overriding the _do_write/_do_read
>> methods) intercept write/read requests and maintain its OWN block
>> management, independent of BlueStore's. BlueStore is left untouched
>> and exposes its functionality (via Read/Write handlers) AS-IS to the
>> compression layer instead of to the pools. The key point is that the compressed-block
>> map and the BlueStore extent map use the same logical offsets, i.e.
>> if some compressed block starts at offset X, it's written to BlueStore at offset
>> X too. But the written block is shorter than the original one, and thus store space is
>> saved.
>> I would agree with the comment that this probably complicates metadata
>> handling: compression layer metadata has to be handled similarly to BlueStore's
>> (proper sync, WAL, transactions, etc.). But I don't see any issues specific
>> to EC here...
>> Have I missed something?
> No doubt the metadata gets more complicated in the presence of compression. Especially if we want to enable the sort of lazy partial-overwrite with background cleanup that seems to be the most desirable.
Agreed.

Thanks,
Igor

