From mboxrd@z Thu Jan 1 00:00:00 1970
From: Haomai Wang
Subject: Re: ceph zstd not for bluestor due to performance reasons
Date: Thu, 26 Oct 2017 06:44:25 +0000
To: Sage Weil, Stefan Priebe - Profihost AG
Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
List-Id: ceph-devel.vger.kernel.org

Stefan Priebe - Profihost AG wrote on Thu, 26 Oct 2017 at 17:06:

> Hi Sage,
>
> On 25.10.2017 at 21:54, Sage Weil wrote:
> > On Wed, 25 Oct 2017, Stefan Priebe - Profihost AG wrote:
> >> Hello,
> >>
> >> The Luminous release notes state that zstd is not supported by
> >> BlueStore for performance reasons. I'm wondering why btrfs, in contrast,
> >> states that zstd is as fast as lz4 but compresses as well as zlib.
> >>
> >> Why is zlib then supported by BlueStore? And why do btrfs / Facebook
> >> behave differently?
> >>
> >> "BlueStore supports inline compression using zlib, snappy, or LZ4. (Ceph
> >> also supports zstd for RGW compression but zstd is not recommended for
> >> BlueStore for performance reasons.)"
> >
> > zstd will work, but in our testing the performance wasn't great for
> > BlueStore in particular. The problem was that for each compression run
> > there is a relatively high start-up cost initializing the zstd
> > context/state (IIRC a memset of a huge memory buffer) that dominated the
> > execution time... primarily because BlueStore is generally compressing
> > pretty small chunks of data at a time, not big buffers or streams.
> >
> > Take a look at unittest_compression timings on compressing 16KB buffers
> > (smaller than BlueStore usually needs, but illustrative of the problem):
> >
> > [ RUN      ] Compressor/CompressorTest.compress_16384/0
> > [plugin zlib (zlib/isal)]
> > [       OK ] Compressor/CompressorTest.compress_16384/0 (294 ms)
> > [ RUN      ] Compressor/CompressorTest.compress_16384/1
> > [plugin zlib (zlib/noisal)]
> > [       OK ] Compressor/CompressorTest.compress_16384/1 (1755 ms)
> > [ RUN      ] Compressor/CompressorTest.compress_16384/2
> > [plugin snappy (snappy)]
> > [       OK ] Compressor/CompressorTest.compress_16384/2 (169 ms)
> > [ RUN      ] Compressor/CompressorTest.compress_16384/3
> > [plugin zstd (zstd)]
> > [       OK ] Compressor/CompressorTest.compress_16384/3 (4528 ms)
> >
> > It's an order of magnitude slower than zlib or snappy, which probably
> > isn't acceptable--even if it is a bit smaller.
> >
> > We just updated to a newer zstd the other day but I haven't been paying
> > attention to the zstd code changes. When I was working on this the plugin
> > was initially also misusing the zstd API, but it was also pointed out
> > that the size of the memset is dependent on the compression level.
> > Maybe a different (default) choice there would help.
> >
> > https://github.com/facebook/zstd/issues/408#issuecomment-252163241
>
> Thanks for the fast reply. Btrfs uses a default compression level of 3,
> but I think this is the default anyway.
>
> Does the zstd plugin of Ceph already use the mentioned
> ZSTD_resetCStream instead of creating and initializing a new one every
> time?
>
> So if performance matters, would Ceph recommend snappy?

In our tests, lz4 is better than snappy.

> Greets,
> Stefan
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

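For reference, the slowdown figures implied by the unittest_compression timings quoted in this thread can be worked out directly. The snippet below is only a small arithmetic sketch: the four millisecond values are copied from the quoted test output, while the dictionary and the `slowdown_vs` helper are names made up here for illustration.

```python
# Wall-clock times (ms) copied from the unittest_compression output quoted
# in the thread, for the 16 KB buffer test.
timings_ms = {
    "zlib/isal": 294,
    "zlib/noisal": 1755,
    "snappy": 169,
    "zstd": 4528,
}


def slowdown_vs(baseline: str, candidate: str = "zstd") -> float:
    """Return how many times longer `candidate` took than `baseline`."""
    return timings_ms[candidate] / timings_ms[baseline]


for name in ("zlib/isal", "zlib/noisal", "snappy"):
    print(f"zstd vs {name}: {slowdown_vs(name):.1f}x slower")
```

In this particular run, the order-of-magnitude gap holds against snappy and the ISA-L zlib build, while against plain zlib the gap is closer to 2.6x.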
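The start-up cost described in the thread (a large buffer cleared for every compression run, while each run only handles a small chunk) can be illustrated with a toy model. This is a sketch under loud assumptions, not zstd's API or Ceph's plugin code: `ToyContext`, the 1 MiB workspace size, and the use of Python's `zlib` as a stand-in codec are all hypothetical, and the model ignores whatever reset work a real context needs between uses.

```python
import zlib

WORKSPACE_BYTES = 1024 * 1024  # hypothetical workspace size, not zstd's real figure
CHUNK_BYTES = 16 * 1024        # same chunk size as the quoted unit test


class ToyContext:
    """Stand-in for a compression context with a costly set-up step."""

    bytes_cleared = 0  # class-wide counter of set-up work performed

    def __init__(self) -> None:
        # Models the "memset of a huge memory buffer" at context creation.
        self.workspace = bytearray(WORKSPACE_BYTES)
        ToyContext.bytes_cleared += WORKSPACE_BYTES

    def compress(self, chunk: bytes) -> bytes:
        # zlib stands in for the real codec; the workspace is not used here.
        return zlib.compress(chunk)


def fresh_context_per_chunk(chunks):
    """Create and initialize a new context for every chunk."""
    return [ToyContext().compress(c) for c in chunks]


def one_shared_context(chunks):
    """Initialize one context and reuse it for all chunks."""
    ctx = ToyContext()
    return [ctx.compress(c) for c in chunks]


chunks = [bytes([i]) * CHUNK_BYTES for i in range(100)]

ToyContext.bytes_cleared = 0
out_fresh = fresh_context_per_chunk(chunks)
cleared_fresh = ToyContext.bytes_cleared

ToyContext.bytes_cleared = 0
out_shared = one_shared_context(chunks)
cleared_shared = ToyContext.bytes_cleared

print(f"payload compressed:      {len(chunks) * CHUNK_BYTES} bytes")
print(f"cleared, fresh contexts: {cleared_fresh} bytes")
print(f"cleared, shared context: {cleared_shared} bytes")
```

With these made-up sizes, the per-chunk variant clears 64 times more memory than it compresses, which is the shape of the problem described for small BlueStore chunks; reusing one context pays the set-up cost once.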