From: "H. Peter Anvin" <hpa@zytor.com>
To: David Laight <David.Laight@ACULAB.COM>,
	"'Linus Torvalds'" <torvalds@linux-foundation.org>
Cc: Noah Goldstein <goldstein.w.n@gmail.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"oe-kbuild-all@lists.linux.dev" <oe-kbuild-all@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"edumazet@google.com" <edumazet@google.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Subject: RE: x86/csum: Remove unnecessary odd handling
Date: Sat, 06 Jan 2024 17:09:09 -0800
Message-ID: <4313F9BB-DE2E-448F-A366-A68CAEA2BFE0@zytor.com>
In-Reply-To: <124b21857fe44e499e29800cbf4f63f8@AcuMS.aculab.com>

On January 6, 2024 2:08:48 PM PST, David Laight <David.Laight@ACULAB.COM> wrote:
>From: Linus Torvalds
>> Sent: 05 January 2024 18:06
>> 
>> On Fri, 5 Jan 2024 at 02:41, David Laight <David.Laight@aculab.com> wrote:
>> >
>> > Interesting, I'm pretty sure trying to get two blocks of
>> >  'adc' scheduled in parallel like that doesn't work.
>> 
>> You should check out the benchmark at
>> 
>>        https://github.com/fenrus75/csum_partial
>> 
>> and see if you can improve on it. I'm including the patch (on top of
>> that code by Arjan) to implement the actual current kernel version as
>> "New version".
>
>Annoyingly (for me) you are partially right...
>
>I found where my IP checksum perf code was hiding and revisited it.
>Although I found comments elsewhere that the 'jecxz, adc, adc, lea, jmp'
>loop did an adc every clock, it isn't happening for me now.
>
>I'm only measuring the inner loop for multiples of 64 bytes.
>The code for buffers of less than 8 bytes and for partial final words
>is a separate problem.
>The less unrolled the main loop, the less overhead there'll
>be for 'normal' sizes.
>So I've changed your '80 byte' block to 64 bytes for consistency.
>
>I'm ignoring pre-Sandy Bridge CPUs (no split flags) and pre-Broadwell ones
>(adc takes two clocks - although adc to alternate regs is one clock
>on Sandy Bridge).
>My test system is an i7-7700; I think anything from Broadwell (gen 5)
>onwards will be at least as good.
>I don't have a modern AMD CPU.
>
>The best loop for 256+ bytes is an adcx/adox one.
>However, that requires run-time patching.
>It is followed by the new kernel version (two blocks of 4 adc).
>The surprising one is:
>		xor	sum, sum
>	1:	adc	(buff), sum
>		adc	8(buff), sum
>		lea	16(buff), buff
>		dec	count
>		jnz	1b
>		adc	$0, sum
>For 256 bytes it is only a couple of clocks slower.
>Maybe 10% slower for 512+ bytes.
>But it needs almost no extra code for 'normal' buffer sizes.
>By comparison the adcx/adox one is 20% faster.
>
>The code is doing:
>	old = rdpmc
>	mfence
>	csum = do_csum(buf, len);
>	mfence
>	clocks = rdpmc - old
>(That is directly reading the pmc register.)
>With a 'no-op' function it takes 160 clocks (I-cache resident).
>Without the mfence it takes 40 - but then pretty much everything can
>execute after the 2nd rdpmc.
>
>I've attached my (horrid) test program.
>
>	David
>

Rather than runtime patching perhaps separate paths...
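
A minimal C sketch of what "separate paths" could look like, for
illustration only. It is not the kernel's code and uses no adcx/adox:
the single-accumulator loop mirrors the simple two-adc loop quoted above,
and the two-accumulator loop stands in for the two independent carry
chains that adcx/adox provide. The names add_carry, fold64,
csum_single_chain, csum_two_chains, csum_select and the has_adx flag are
all hypothetical, and both paths assume a length that is a multiple of
64 bytes, as in the measurements quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add with end-around carry: the C analogue of "adc" plus a final "adc $0". */
static uint64_t add_carry(uint64_t sum, uint64_t val)
{
	sum += val;
	return sum + (sum < val);	/* wrapped iff sum < val; add the carry back */
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Path 1: one carry chain, 16 bytes per iteration (len: multiple of 64). */
static uint64_t csum_single_chain(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0, w[2];

	for (; len >= 16; buf += 16, len -= 16) {
		memcpy(w, buf, sizeof(w));	/* alignment-safe load */
		sum = add_carry(sum, w[0]);
		sum = add_carry(sum, w[1]);
	}
	return sum;
}

/* Path 2: two independent chains, 64 bytes per iteration (len: multiple of 64). */
static uint64_t csum_two_chains(const unsigned char *buf, size_t len)
{
	uint64_t a = 0, b = 0, w[8];

	for (; len >= 64; buf += 64, len -= 64) {
		memcpy(w, buf, sizeof(w));
		for (int i = 0; i < 4; i++) {
			a = add_carry(a, w[i]);		/* first chain */
			b = add_carry(b, w[i + 4]);	/* second chain */
		}
	}
	return add_carry(a, b);		/* merge the two chains */
}

typedef uint64_t (*csum_fn)(const unsigned char *, size_t);

/*
 * Choose a path once, instead of patching code at run time.
 * has_adx is a hypothetical flag the caller derives from CPU feature
 * detection; how that is done is outside this sketch.
 */
static csum_fn csum_select(int has_adx)
{
	return has_adx ? csum_two_chains : csum_single_chain;
}
```

A caller would pick the path once, for example
csum_fn csum = csum_select(has_adx), and then compute
fold64(csum(buf, len)) per buffer. Both paths fold to the same 16-bit
value because ones' complement addition is commutative and associative.
The cost of this approach is an indirect call (or a branch) per
invocation, which run-time patching avoids.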
