From mboxrd@z Thu Jan 1 00:00:00 1970
From: Igor Fedotov
Subject: Re: Adding compression support for bluestore.
Date: Mon, 4 Apr 2016 15:31:35 +0300
Message-ID: <57025EA7.5040601@mirantis.com>
References: <56C1FCF3.4030505@mirantis.com> <56C3BAA3.3070804@mirantis.com> <56CDF40C.9060405@mirantis.com> <56D08E30.20308@mirantis.com> <56E9A727.1030400@mirantis.com> <56EACAAD.90002@mirantis.com> <56EC248E.3060502@mirantis.com> <56F013FB.4040002@mirantis.com> <56F3E157.2090004@mirantis.com>
Sender: ceph-devel-owner@vger.kernel.org
To: Sage Weil
Cc: Allen Samuels, ceph-devel

On 01.04.2016 0:56, Sage Weil wrote:
> How about this:
>
> // in the onode:
> map data_map;
> map blob_map;
>
> // in the enode
> map blob_map;
>
> struct bluestore_lextent_t {
>   enum {
>     FLAG_SHARED = 1,  ///< pextent lives in enode
>   };
>
>   uint64_t logical_length;  ///< length of logical bytes we represent

No need for that field - x_len is exactly the same.

>   uint32_t pextent_id;      ///< id of pextent in onode or enode

This should be blob_id, as Allen already mentioned.
>   uint32_t x_off, x_len;    ///< relative portion of pextent with our data
>   uint32_t flags;           ///< FLAG_*
> };
>
> struct bluestore_pextent_t {
>   uint64_t offset;  ///< offset on disk
>   uint64_t length;  ///< length on disk
> };
>
> struct bluestore_blob_t {
>   enum {
>     CSUM_XXHASH32 = 1,
>     CSUM_XXHASH64 = 2,
>     CSUM_CRC32C = 3,
>     CSUM_CRC16 = 4,
>   };
>   enum {
>     FLAG_IMMUTABLE = 1,   ///< no overwrites allowed
>     FLAG_COMPRESSED = 2,  ///< extent is compressed; alg is in first byte of data
>   };
>   enum {
>     COMP_ZLIB = 1,
>     COMP_SNAPPY = 2,
>     COMP_LZO = 3,
>   };
>
>   vector extents;  ///< extents on disk

The major reasons to have a set of pextents instead of a single one are
as follows, right?
1) To be able to handle the case Allen pointed out, when we are unable
to allocate a contiguous region during the write.
2) To be able to deallocate the blob partially, e.g. when it's partially
occluded.

>   uint32_t logical_length;   ///< uncompressed length
>   uint32_t flags;            ///< FLAG_*
>   uint8_t csum_type;         ///< CSUM_*
>   uint8_t csum_block_order;
>   uint16_t num_refs;         ///< reference count (always 1 when in onode)
>   vector csum_data;          ///< opaque vector of csum data
>
>   uint32_t get_ondisk_length() const {
>     uint32_t len = 0;
>     for (auto &p : extents) {
>       len += p.length;
>     }
>     return len;
>   }
>
>   uint32_t get_csum_block_size() const {
>     return 1 << csum_block_order;
>   }

Shouldn't we try to keep the size of each pextent equal (or aligned) to
this one?

>   size_t get_csum_value_size() const {
>     switch (csum_type) {
>     case CSUM_XXHASH32: return 4;
>     case CSUM_XXHASH64: return 8;
>     case CSUM_CRC32C: return 4;
>     case CSUM_CRC16: return 2;
>     default: return 0;
>     }
>   }
>
>   // assert (ondisk_length / csum_block_size) * csum_value_size ==
>   //        csum_data.length()
> };
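
To make the two questions above concrete, here is a minimal sketch of what a multi-pextent blob implies for offset translation and for the checksum invariant in the trailing comment. It is only an illustration under assumptions, not the proposed implementation: pextent_t and blob_t are cut-down stand-ins for the quoted structs, the vector element types are guesses (the quoted text does not show them), and csum_data_consistent(), pextents_csum_aligned() and map_offset() are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Cut-down stand-in for bluestore_pextent_t from the quoted proposal.
struct pextent_t {
  uint64_t offset;  // offset on disk
  uint64_t length;  // length on disk
};

enum { CSUM_XXHASH32 = 1, CSUM_XXHASH64 = 2, CSUM_CRC32C = 3, CSUM_CRC16 = 4 };

// Cut-down stand-in for bluestore_blob_t. Element types are assumptions.
struct blob_t {
  std::vector<pextent_t> extents;  // assumed element type
  uint8_t csum_type = 0;
  uint8_t csum_block_order = 0;
  std::vector<char> csum_data;     // assumed element type

  // Summed in 64 bits here because pextent lengths are 64-bit.
  uint64_t get_ondisk_length() const {
    uint64_t len = 0;
    for (const auto& p : extents) len += p.length;
    return len;
  }

  uint32_t get_csum_block_size() const {
    return uint32_t(1) << csum_block_order;
  }

  size_t get_csum_value_size() const {
    switch (csum_type) {
      case CSUM_XXHASH32: return 4;
      case CSUM_XXHASH64: return 8;
      case CSUM_CRC32C:   return 4;
      case CSUM_CRC16:    return 2;
      default:            return 0;
    }
  }

  // Hypothetical helper: the invariant from the comment at the end of the
  // quoted struct, (ondisk_length / csum_block_size) * csum_value_size.
  bool csum_data_consistent() const {
    return (get_ondisk_length() / get_csum_block_size()) *
               get_csum_value_size() == csum_data.size();
  }

  // Hypothetical helper: true if every pextent covers a whole number of
  // checksum blocks, i.e. the alignment asked about above.
  bool pextents_csum_aligned() const {
    const uint64_t bs = get_csum_block_size();
    for (const auto& p : extents) {
      if (p.length % bs != 0) return false;
    }
    return true;
  }

  // Hypothetical helper: translate a blob-relative offset to a disk offset
  // by walking the pextents in order. Returns false if out of range.
  bool map_offset(uint64_t blob_off, uint64_t* disk_off) const {
    for (const auto& p : extents) {
      if (blob_off < p.length) {
        *disk_off = p.offset + blob_off;
        return true;
      }
      blob_off -= p.length;
    }
    return false;
  }
};
```

If each pextent length is kept a multiple of the csum block size, no checksum block straddles two pextents, which seems like it would simplify releasing a single pextent of a partially occluded blob. That is the motivation behind the alignment question, not something the proposal states.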