From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f65.google.com ([209.85.214.65]:52894 "EHLO mail-it0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934379AbeCSV7v (ORCPT ); Mon, 19 Mar 2018 17:59:51 -0400 Received: by mail-it0-f65.google.com with SMTP id k135-v6so12403453ite.2 for ; Mon, 19 Mar 2018 14:59:50 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20180319203021.GA59118@gmail.com> References: <1520705944-6723-1-git-send-email-jix024@eng.ucsd.edu> <1520705944-6723-6-git-send-email-jix024@eng.ucsd.edu> <0924a2b3-6f21-4aaf-224d-2f5accc21d10@gmail.com> <20180311192256.GA630@zzz.localdomain> <20180319203021.GA59118@gmail.com> From: Andiry Xu Date: Mon, 19 Mar 2018 14:59:49 -0700 Message-ID: Subject: Re: [RFC v2 05/83] Add NOVA filesystem definitions and useful helper routines. To: Eric Biggers Cc: Nikolay Borisov , Linux FS Devel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" , Dan Williams , "Rudoff, Andy" , coughlan@redhat.com, Steven Swanson , Dave Chinner , Jan Kara , swhiteho@redhat.com, miklos@szeredi.hu, Jian Xu , Andiry Xu , Herbert Xu Content-Type: text/plain; charset="UTF-8" Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Mon, Mar 19, 2018 at 1:30 PM, Eric Biggers wrote: > On Mon, Mar 19, 2018 at 12:39:55PM -0700, Andiry Xu wrote: >> On Sun, Mar 11, 2018 at 12:22 PM, Eric Biggers wrote: >> > On Sun, Mar 11, 2018 at 02:00:13PM +0200, Nikolay Borisov wrote: >> >> [Adding Herbert Xu to CC since he is the maintainer of the crypto subsys >> >> maintainer] >> >> >> >> On 10.03.2018 20:17, Andiry Xu wrote: >> >> >> >> >> >> > +static inline u32 nova_crc32c(u32 crc, const u8 *data, size_t len) >> >> > +{ >> >> > + u8 *ptr = (u8 *) data; >> >> > + u64 acc = crc; /* accumulator, crc32c value in lower 32b */ >> >> > + u32 csum; >> >> > + >> >> > + /* x86 instruction crc32 is part of SSE-4.2 */ >> >> > + if (static_cpu_has(X86_FEATURE_XMM4_2)) { >> >> > + /* This inline assembly implementation should be equivalent >> >> > + * to the kernel's crc32c_intel_le_hw() function used by >> >> > + * crc32c(), but this performs better on test machines. >> >> > + */ >> >> > + while (len > 8) { >> >> > + asm volatile(/* 64b quad words */ >> >> > + "crc32q (%1), %0" >> >> > + : "=r" (acc) >> >> > + : "r" (ptr), "0" (acc) >> >> > + ); >> >> > + ptr += 8; >> >> > + len -= 8; >> >> > + } >> >> > + >> >> > + while (len > 0) { >> >> > + asm volatile(/* trailing bytes */ >> >> > + "crc32b (%1), %0" >> >> > + : "=r" (acc) >> >> > + : "r" (ptr), "0" (acc) >> >> > + ); >> >> > + ptr++; >> >> > + len--; >> >> > + } >> >> > + >> >> > + csum = (u32) acc; >> >> > + } else { >> >> > + /* The kernel's crc32c() function should also detect and use the >> >> > + * crc32 instruction of SSE-4.2. But calling in to this function >> >> > + * is about 3x to 5x slower than the inline assembly version on >> >> > + * some test machines. >> >> >> >> That is really odd. Did you try to characterize why this is the case? Is >> >> it purely the overhead of dispatching to the correct backend function? >> >> That's a rather big performance hit. >> >> >> >> > + */ >> >> > + csum = crc32c(crc, data, len); >> >> > + } >> >> > + >> >> > + return csum; >> >> > +} >> >> > + >> > >> > Are you sure that CONFIG_CRYPTO_CRC32C_INTEL was enabled during your tests and >> > that the accelerated version was being called? Or, perhaps CRC32C_PCL_BREAKEVEN >> > (defined in arch/x86/crypto/crc32c-intel_glue.c) needs to be adjusted. Please >> > don't hack around performance problems like this; if they exist, they need to be >> > fixed for everyone. >> > >> >> I have performed the crc32c test on a Xeon X5647 at 2.93GHz, 14G DDR3 >> memory at 1066MHz platform. >> You are right that enabling CONFIG_CRYPTO_CRC32C_INTEL improves the >> performance significantly. nova_crc32c() is still slightly faster than >> crc32c() with the flag enabled. >> >> Result numbers are follows: data size in bytes, latency in ns, column >> 3 is crc32c() with CONFIG_CRYPTO_CRC32C_INTEL enabled and column 4 >> disabled. >> >> data size (bytes) nova_crc32c() crc32c() -enabled >> crc32c() -disabled >> 64 19 21 56 >> 128 28 29 99 >> 256 46 43 182 >> 512 82 149 354 >> 1024 157 232 728 >> 2048 305 415 1440 >> 4096 603 725 2869 >> > > Probably CRC32C_PCL_BREAKEVEN needs to be adjusted for that CPU, as I suggested > may be the case; notice that your measured speeds are about the same before 512 > (CRC32C_PCL_BREAKEVEN) bytes, but the crypto API version is slower at >= 512 > bytes. It would be possible to set the breakeven point in > crc32c_intel_mod_init() depending on the CPU. Again, if the performance is not > good enough you need to fix it for everyone, not hack around it. > We verify that by setting CRC32C_PCL_BREAKEVEN to 8192, the performance difference between nova_crc32c() and kernel's crc32c() is negligible. Thanks for the comments, and I will use kernel's crc32c() in the next version. Thanks, Andiry