linux-kernel.vger.kernel.org archive mirror
From: Jeffrey Lien <Jeff.Lien@wdc.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"herbert@gondor.apana.org.au" <herbert@gondor.apana.org.au>,
	"tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	David Darrington <david.darrington@wdc.com>,
	Jeff Furlong <jeff.furlong@wdc.com>
Subject: RE: [PATCH] Performance Improvement in CRC16 Calculations.
Date: Thu, 16 Aug 2018 14:02:41 +0000
Message-ID: <SN1PR04MB1824A5944E0428AC84B62886EA3E0@SN1PR04MB1824.namprd04.prod.outlook.com>
In-Reply-To: <20180810201601.GA80850@gmail.com>

Eric,
We did not test the slice-by-4 or slice-by-8 tables.  I'm not sure there is much value in doing that, since slice-by-16 will provide the best performance gain.  If I'm missing anything here, please let me know.

I'm working on a new version of the patch based on the feedback from others.  In that version I will also rename the pointer variables to start with 'p' and fix the indentation you pointed out below.

Thanks

Jeff Lien

-----Original Message-----
From: Eric Biggers [mailto:ebiggers@kernel.org] 
Sent: Friday, August 10, 2018 3:16 PM
To: Jeffrey Lien <Jeff.Lien@wdc.com>
Cc: linux-kernel@vger.kernel.org; linux-crypto@vger.kernel.org; linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; herbert@gondor.apana.org.au; tim.c.chen@linux.intel.com; martin.petersen@oracle.com; David Darrington <david.darrington@wdc.com>; Jeff Furlong <jeff.furlong@wdc.com>
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 
> calculations done in read/write workloads using the T10 Type 1/2/3 
> guard field.  For example, today with sequential write workloads (one 
> thread/CPU of IO) we consume 100% of the CPU because of the CRC16 
> computation bottleneck.  Today's block devices are considerably 
> faster, but the CRC16 calculation prevents folks from utilizing the 
> throughput of such devices.  To speed up this calculation and expose 
> the block device throughput, we slice the old single byte for loop
> into a 16 byte for loop, with a larger CRC table to match.  The result
> has shown 5x performance improvements on various big endian and
> little endian systems running the 4.18.0 kernel version.
> 
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> BE Base Kernel:        bw=201.5 MiB/s
> BE Modified CRC Calc:  bw=968.1 MiB/s
> 4.80x performance improvement
> 
> LE Base Kernel:        bw=357 MiB/s
> LE Modified CRC Calc:  bw=1964 MiB/s
> 5.51x performance improvement
> 
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> BE Base Kernel:        bw=611.2 MiB/s
> BE Modified CRC calc:  bw=684.9 MiB/s
> 1.12x performance improvement
> 
> LE Base Kernel:        bw=797 MiB/s
> LE Modified CRC Calc:  bw=2730 MiB/s
> 3.42x performance improvement

Did you also test the slice-by-4 (requires 2048-byte table) and slice-by-8 (requires 4096-byte table) methods?  Your proposal is slice-by-16 (requires 8192-byte table); the original was slice-by-1 (requires 512-byte table).
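
For reference, the table sizes above follow from 256 entries of 2 bytes each per slice.  Below is a minimal userspace sketch of how such tables could be generated, assuming the T10-DIF polynomial 0x8BB7 with MSB-first processing.  The names slice_table, slice_table_bytes() and build_slice_tables() are hypothetical and do not come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u	/* CRC-16/T10-DIF generator polynomial */
#define MAX_SLICES  16

/* slice_table[k][b] is the CRC of byte b followed by k zero bytes. */
static uint16_t slice_table[MAX_SLICES][256];

/* Lookup-table footprint of a slice-by-n CRC16: n * 256 entries * 2 bytes. */
static size_t slice_table_bytes(unsigned int n)
{
	return (size_t)n * 256 * sizeof(uint16_t);
}

static void build_slice_tables(void)
{
	unsigned int b, k;
	int bit;

	/* Table 0 is the classic byte-at-a-time table. */
	for (b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ?
				(uint16_t)((crc << 1) ^ T10DIF_POLY) :
				(uint16_t)(crc << 1);
		slice_table[0][b] = crc;
	}

	/* Table k extends table k-1 by one more trailing zero byte. */
	for (k = 1; k < MAX_SLICES; k++) {
		for (b = 0; b < 256; b++) {
			uint16_t prev = slice_table[k - 1][b];

			slice_table[k][b] = (uint16_t)((prev << 8) ^
						       slice_table[0][prev >> 8]);
		}
	}
}
```

With these definitions, slice_table_bytes() gives 512, 2048, 4096 and 8192 for n = 1, 4, 8 and 16.  A slice-by-4 or slice-by-8 variant would use only the first 4 or 8 rows.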

>  __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
>  {
> -	unsigned int i;
> +	const __u8 *i = (const __u8 *)buffer;
> +	const __u8 *i_end = i + len;
> +	const __u8 *i_last16 = i + (len / 16 * 16);

'i' is normally a loop counter, not a pointer.
Use 'p', 'p_end', and 'p_last16'.

>  
> -	for (i = 0 ; i < len ; i++)
> -		crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> +	for (; i < i_last16; i += 16) {
> +		crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
> +		t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
> +		t10_dif_crc_table[13][i[2]] ^
> +		t10_dif_crc_table[12][i[3]] ^
> +		t10_dif_crc_table[11][i[4]] ^
> +		t10_dif_crc_table[10][i[5]] ^
> +		t10_dif_crc_table[9][i[6]] ^
> +		t10_dif_crc_table[8][i[7]] ^
> +		t10_dif_crc_table[7][i[8]] ^
> +		t10_dif_crc_table[6][i[9]] ^
> +		t10_dif_crc_table[5][i[10]] ^
> +		t10_dif_crc_table[4][i[11]] ^
> +		t10_dif_crc_table[3][i[12]] ^
> +		t10_dif_crc_table[2][i[13]] ^
> +		t10_dif_crc_table[1][i[14]] ^
> +		t10_dif_crc_table[0][i[15]];
> +	}

Please indent this properly.

		crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
		      t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
		      t10_dif_crc_table[13][i[2]] ^
		      t10_dif_crc_table[12][i[3]] ^
		      t10_dif_crc_table[11][i[4]] ^
		      ...

- Eric
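
To make the review comments concrete, here is a rough userspace sketch of the slice-by-16 loop with the suggested 'p', 'p_end' and 'p_last16' names and aligned continuation lines.  It is not the patch itself: crc_table, crc_table_init(), crc_t10dif_bytewise() and crc_t10dif_slice16() are hypothetical names, the tables are generated at run time rather than stored as static data, and the tail loop for the last len % 16 bytes is an assumption, since the quoted hunk ends before that part.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t crc_table[16][256];

/* Fill all 16 tables (polynomial 0x8BB7); call once before use. */
static void crc_table_init(void)
{
	unsigned int b, k;
	int bit;

	for (b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7u) :
					       (uint16_t)(crc << 1);
		crc_table[0][b] = crc;
	}
	for (k = 1; k < 16; k++)
		for (b = 0; b < 256; b++)
			crc_table[k][b] = (uint16_t)((crc_table[k - 1][b] << 8) ^
				crc_table[0][crc_table[k - 1][b] >> 8]);
}

/* Reference: one byte per iteration, like the loop the patch removes. */
static uint16_t crc_t10dif_bytewise(uint16_t crc, const unsigned char *buffer,
				    size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		crc = (uint16_t)((crc << 8) ^
				 crc_table[0][((crc >> 8) ^ buffer[i]) & 0xff]);
	return crc;
}

/* Sixteen bytes per iteration, then a bytewise tail (tail is an assumption). */
static uint16_t crc_t10dif_slice16(uint16_t crc, const unsigned char *buffer,
				   size_t len)
{
	const uint8_t *p = buffer;
	const uint8_t *p_end = p + len;
	const uint8_t *p_last16 = p + (len / 16 * 16);

	for (; p < p_last16; p += 16) {
		crc = crc_table[15][p[0] ^ (uint8_t)(crc >> 8)] ^
		      crc_table[14][p[1] ^ (uint8_t)(crc >> 0)] ^
		      crc_table[13][p[2]] ^
		      crc_table[12][p[3]] ^
		      crc_table[11][p[4]] ^
		      crc_table[10][p[5]] ^
		      crc_table[9][p[6]] ^
		      crc_table[8][p[7]] ^
		      crc_table[7][p[8]] ^
		      crc_table[6][p[9]] ^
		      crc_table[5][p[10]] ^
		      crc_table[4][p[11]] ^
		      crc_table[3][p[12]] ^
		      crc_table[2][p[13]] ^
		      crc_table[1][p[14]] ^
		      crc_table[0][p[15]];
	}

	/* Remaining 0..15 bytes, one at a time. */
	for (; p < p_end; p++)
		crc = (uint16_t)((crc << 8) ^
				 crc_table[0][((crc >> 8) ^ *p) & 0xff]);
	return crc;
}
```

After crc_table_init(), both functions should return the same value for any buffer and starting CRC.  For the 9-byte string "123456789" with a starting CRC of 0 that value is 0xD0DB, the commonly published check value for CRC-16/T10-DIF.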
