From: Allen Samuels
Subject: RE: Adding compression support for bluestore.
Date: Wed, 16 Mar 2016 19:36:34 +0000
To: Sage Weil
Cc: Igor Fedotov, ceph-devel
References: <56C1FCF3.4030505@mirantis.com> <56C3BAA3.3070804@mirantis.com> <56CDF40C.9060405@mirantis.com> <56D08E30.20308@mirantis.com> <56E9A727.1030400@mirantis.com>

> -----Original Message-----
> From: Sage Weil [mailto:sage@newdream.net]
> Sent: Wednesday, March 16, 2016 2:30 PM
> To: Allen Samuels
> Cc: Igor Fedotov; ceph-devel
> Subject: RE: Adding compression support for bluestore.
>
> On Wed, 16 Mar 2016, Allen Samuels wrote:
> > As described earlier, we can easily afford the cost of setting
> > min_alloc_size to 4KB. I don't see any advantage in handling the
> > larger allocation sizes -- only disadvantages.
>
> That too. The original motivation was driven by HDD behavior: if we have a
> 4KB overwrite we're better off doing a WAL record and async overwrite than
> allocating a new 4KB extent and overfragmenting the object. But the same
> thing can be accomplished as policy in _do_write without restricting the
> size of allocations.

Agreed. But the size of allocations affects the compression ratio too.
Effectively you're rounding up to the min_alloc_size for all of your
allocations. Making the compression block size bigger tends to compensate
for this -- but you pay for that in the WAL/RMW stuff.
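
To make the rounding effect concrete, here is a minimal sketch. It is
illustrative only and not BlueStore code: the function names and the example
sizes (a 64KB block compressing to 20KB, a 256KB block compressing to 80KB)
are assumptions picked for the arithmetic.

```python
KB = 1024


def allocated_bytes(compressed_len: int, min_alloc_size: int) -> int:
    """Round a compressed length up to a whole number of allocation units."""
    if compressed_len <= 0:
        return 0
    units = -(-compressed_len // min_alloc_size)  # ceiling division
    return units * min_alloc_size


def effective_ratio(block_size: int, compressed_len: int,
                    min_alloc_size: int) -> float:
    """On-disk bytes divided by logical bytes for one compression block."""
    return allocated_bytes(compressed_len, min_alloc_size) / block_size


if __name__ == "__main__":
    # Hypothetical data that compresses to 31.25% of its logical size.
    cases = [
        (64 * KB, 20 * KB, 4 * KB),    # small block, small allocation unit
        (64 * KB, 20 * KB, 64 * KB),   # small block, large allocation unit
        (256 * KB, 80 * KB, 64 * KB),  # bigger block, large allocation unit
    ]
    for block, clen, min_alloc in cases:
        ratio = effective_ratio(block, clen, min_alloc)
        print(f"block={block // KB}KB min_alloc={min_alloc // KB}KB "
              f"on-disk/logical={ratio:.4f}")
```

Under these assumed numbers, a 4KB allocation unit keeps the full savings
(0.3125), a 64KB unit with a 64KB block loses all of it (1.0), and growing
the block to 256KB recovers part of it (0.5).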

> This is all assuming we get the allocator/freelist memory under control,
> which we need to do anyway.

Yes, see my previous e-mails. I believe they describe one solution (I'm sure
there are others). I'm trying to hack some of that code together now, just to
make sure I haven't missed anything. Assuming that my outlined solution is
essentially correct, min_alloc_size can be fixed at 4KB with no downsides.
This makes the selection of the compression block size much easier (as you
limit the interaction of parameters).

> sage
>
> > Allen Samuels
> > Software Architect, Fellow, Systems and Software Solutions
> >
> > 2880 Junction Avenue, San Jose, CA 95134
> > T: +1 408 801 7030| M: +1 408 780 6416 allen.samuels@SanDisk.com
> >
> > > -----Original Message-----
> > > From: Sage Weil [mailto:sage@newdream.net]
> > > Sent: Wednesday, March 16, 2016 2:15 PM
> > > To: Allen Samuels
> > > Cc: Igor Fedotov; ceph-devel
> > > Subject: RE: Adding compression support for bluestore.
> > >
> > > On Wed, 16 Mar 2016, Allen Samuels wrote:
> > > > > A potential issue with using WAL for compressed block overwrites
> > > > > is a significant increase in WAL data volume. IIUC currently a WAL
> > > > > record can have up to 2*bluestore_min_alloc_size (i.e. 128K) of
> > > > > client data per single write request - overlapped head and tail.
> > > > > In case of compressed blocks this will be up to
> > > > > 2*bluestore_max_compressed_block (i.e. 8MB) as you can't
> > > > > simply overwrite fully overlapped extents - one should operate
> > > > > on compression blocks now...
> > > > >
> > > > > Seems attractive otherwise...
> > > >
> > > > This is one of the fundamental tradeoffs with compression. When your
> > > > compression block size exceeds the minimum I/O size you either have
> > > > to consume time (RMW + uncompress/recompress) or you have to consume
> > > > space (overlapping extents).
> > > > Sage's current code essentially starts out by consuming space and
> > > > then assumes that he'll consume time in the background to recover
> > > > the space.
> > > > Of course if you set the compression block size equal to or smaller
> > > > than the minimum I/O size you can avoid these problems -- but you
> > > > create others (including poor compression, needing to track very
> > > > small chunks of space, etc.) and nobody seriously believes that this
> > > > is a viable alternative.
> > >
> > > My inclination would be to set min_alloc_size to something smallish
> > > (if not 64KB, then 32KB perhaps) and the compression_block to
> > > something also reasonable (256KB or 512KB at most). That means you
> > > lose some of the savings (on average, 1/2 of min_alloc_size) which
> > > is more significant if compression_block is not >> min_alloc_size,
> > > but it avoids the expensive r/m/w cases and big read + decompress for
> > > a small read request...
> > >
> > > sage
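
The two costs traded off in the quoted thread can be put side by side with a
rough sketch. It is illustrative only and not BlueStore code: the function
names are hypothetical, the "1/2 of min_alloc_size" average is modelled by
assuming the compressed length falls uniformly within an allocation unit, and
the 4MB maximum compressed block is inferred from the "2*... (i.e. 8MB)"
figure above.

```python
KB = 1024
MB = 1024 * KB


def loss_fraction(min_alloc_size: int, compression_block: int) -> float:
    """Expected rounding loss per block as a fraction of the block size.

    Assumes the average loss is half an allocation unit, as in the thread.
    """
    return (min_alloc_size / 2) / compression_block


def worst_case_wal_bytes(unit_size: int) -> int:
    """Client data in one WAL record: a partially overlapped head plus a
    partially overlapped tail, each up to one unit."""
    return 2 * unit_size


if __name__ == "__main__":
    # (min_alloc_size, compression_block) pairs discussed in the thread.
    for min_alloc, block in [(64 * KB, 256 * KB),
                             (32 * KB, 512 * KB),
                             (4 * KB, 256 * KB)]:
        pct = 100 * loss_fraction(min_alloc, block)
        print(f"min_alloc={min_alloc // KB}KB block={block // KB}KB "
              f"expected loss={pct:.2f}% of block")

    # WAL volume per write: uncompressed extents vs. compression blocks.
    print(worst_case_wal_bytes(64 * KB) // KB, "KB uncompressed")
    print(worst_case_wal_bytes(4 * MB) // MB, "MB compressed")
```

Under these assumptions, 64KB/256KB gives up about 12.5% of each block to
rounding, 32KB/512KB about 3.1%, and 4KB/256KB under 1%, while the worst-case
WAL payload grows from 128KB to 8MB once overwrites have to operate on whole
compression blocks.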