From: Eric Dumazet <edumazet@google.com>
To: David Laight <David.Laight@aculab.com>
Cc: Noah Goldstein <goldstein.w.n@gmail.com>,
	Johannes Berg <johannes@sipsolutions.net>,
	"alexanderduyck@fb.com" <alexanderduyck@fb.com>,
	"kbuild-all@lists.01.org" <kbuild-all@lists.01.org>,
	open list <linux-kernel@vger.kernel.org>,
	"linux-um@lists.infradead.org" <linux-um@lists.infradead.org>,
	"lkp@intel.com" <lkp@intel.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	X86 ML <x86@kernel.org>
Subject: Re: [tip:x86/core 1/1] arch/x86/um/../lib/csum-partial_64.c:98:12: error: implicit declaration of function 'load_unaligned_zeropad'
Date: Fri, 26 Nov 2021 10:09:45 -0800	[thread overview]
Message-ID: <CANn89iJubuJxjVp4fx78-bjKBN3e9JsdAwZxj4XO6g2_7ZPqJQ@mail.gmail.com> (raw)
In-Reply-To: <4dbf7f8d095b46a8a45e285d0ec8f8b0@AcuMS.aculab.com>

On Fri, Nov 26, 2021 at 9:18 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 25 November 2021 04:01
> ...
> > > The outputs seem to match if `buff` is aligned to 64-bit. I still see a
> > > difference with `csum_fold(csum_partial())` if `buff` is not 64-bit aligned.
> > >
> > > The comment at the top says it's "best" to have `buff` 64-bit aligned, but
> > > the code logic seems meant to support the misaligned case, so I'm not
> > > sure if it's an issue.
> > >
> >
> > It is an issue in general, though not in the standard cases, because
> > network headers are aligned.
> >
> > I think it came in when I folded csum_partial() and do_csum(); I forgot
> > to ror() the seed.
> >
> > I suspect the following would help:
> >
> > diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
> > index 1eb8f2d11f7c785be624eba315fe9ca7989fd56d..ee7b0e7a6055bcbef42d22f7e1d8f52ddbd6be6d
> > 100644
> > --- a/arch/x86/lib/csum-partial_64.c
> > +++ b/arch/x86/lib/csum-partial_64.c
> > @@ -41,6 +41,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
> >         if (unlikely(odd)) {
> >                 if (unlikely(len == 0))
> >                         return sum;
> > +               temp64 = ror32((__force u64)sum, 8);
> >                 temp64 += (*(unsigned char *)buff << 8);
> >                 len--;
> >                 buff++;
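To see why the seed needs the rotation: when `buff` is odd, the function
accumulates everything in a byte-shifted domain (the first byte goes in as
`byte << 8`) and byte-swaps the folded 16-bit result at the very end, so a
seed added without the rotation gets swapped along with everything else.
The underlying identity is that folding ror32(x, 8) down to 16 bits gives
the byte-swap of folding x. A minimal user-space sketch that checks the
identity (illustrative only, not the kernel code; the helper names here
are made up):

#include <stdint.h>
#include <stdio.h>

/* 32 -> 16 bit ones' complement fold, end-around carry included */
static uint16_t fold16(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);
	x = (x & 0xffff) + (x >> 16);
	return (uint16_t)x;
}

static uint32_t rotr32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

int main(void)
{
	uint32_t x = 0x12345678;	/* arbitrary partial sum */
	uint16_t folded = fold16(x);
	uint16_t swapped = (uint16_t)((folded >> 8) | (folded << 8));

	/* both lines print ac68 */
	printf("%04x\n", fold16(rotr32(x, 8)));
	printf("%04x\n", swapped);
	return 0;
}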
>
> You can save an instruction (as if this path matters) by:
>                         temp64 = sum + *(unsigned char *)buff;

This might overflow; sum is 32-bit.

Given that temp64 = (__force u64)sum; has already been done at this
point, we can instead do:

temp64 += *(u8 *)buff;
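The wrap is easy to demonstrate: in the earlier version the addition
happens in 32 bits, so the carry is already gone by the time the value is
widened to 64 bits. A tiny user-space sketch of the difference
(illustrative only, assuming the usual 32-bit int):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t sum = 0xffffffff;	/* seed at the 32-bit limit */
	uint8_t byte = 0x01;

	uint64_t lost = sum + byte;		/* 32-bit add wraps: 0 */
	uint64_t kept = (uint64_t)sum + byte;	/* carry kept: 0x100000000 */

	printf("%llx %llx\n",
	       (unsigned long long)lost, (unsigned long long)kept);
	return 0;
}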

>                         temp64 <<= 8;



> Although that probably falls foul of 64-bit shifts being slow.

Are they slower than the ror32(sum, 8)?

> So maybe just:
>                         sum += *(unsigned char *)buff;

we might miss a carry/overflow here

>                         temp64 = bswap32(sum);
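For what it's worth, bswap32() lands in the same shifted domain: folding
bswap32(x) down to 16 bits also equals the byte-swap of folding x, so
modulo the lost carry above, the two approaches differ only in instruction
choice. Adding one line to main() in the sketch above shows it (user-space,
GCC/Clang builtin):

	printf("%04x\n", fold16(__builtin_bswap32(x)));	/* also ac68 */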


> AFAICT (from a pdf) bswap32() and ror(x, 8) are likely to be
> the same speed but may use different execution units.
>
> Intel seem to have managed to slow down ror(x, %cl) to 3 clocks
> in Sandy Bridge - and still haven't fixed it.
> Although the compiler might be making a pig's breakfast of the
> register allocation when you tried setting 'odd = 8'.
>
> Weeks can be spent fiddling with this code :-(

Yes, and in the end, it won't be able to compete with a
specialized/inlined ipv6_csum_partial().
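
For context, here is the sort of thing that specialization buys: with a
fixed 40-byte IPv6 header there are no length or alignment branches left,
and the loop fully unrolls. A hypothetical user-space sketch (the name
ipv6_hdr_csum and its shape are illustrative; this is not an existing
kernel function):

#include <stdint.h>
#include <string.h>

/* Hypothetical: ones' complement sum of a 40-byte IPv6 header.
 * Five 8-byte loads with the carry folded by hand; the compiler can
 * fully unroll this because the trip count is a constant. */
static uint16_t ipv6_hdr_csum(const void *hdr)
{
	uint64_t sum = 0;
	uint64_t w;
	int i;

	for (i = 0; i < 5; i++) {	/* 5 * 8 == 40 bytes */
		memcpy(&w, (const uint8_t *)hdr + 8 * i, 8);
		sum += w;
		if (sum < w)		/* end-around carry */
			sum++;
	}
	/* fold 64 -> 32 -> 16 bits */
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

A real kernel version would additionally fold in the seed and return
__wsum, but the point stands: the fixed trip count removes all the
per-call length and alignment branching that a generic csum_partial()
has to carry.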

Thread overview:
2021-11-17 18:45 [tip:x86/core 1/1] arch/x86/um/../lib/csum-partial_64.c:98:12: error: implicit declaration of function 'load_unaligned_zeropad' kernel test robot
2021-11-17 18:55 ` Eric Dumazet
2021-11-17 19:40   ` Eric Dumazet
2021-11-18 16:00     ` Peter Zijlstra
2021-11-18 16:26       ` Johannes Berg
2021-11-18 16:57         ` Eric Dumazet
2021-11-18 17:02           ` Eric Dumazet
2021-11-25  1:58           ` Noah Goldstein
2021-11-25  2:56             ` Eric Dumazet
2021-11-25  3:41               ` Noah Goldstein
2021-11-25  4:00                 ` Eric Dumazet
2021-11-25  4:08                   ` Eric Dumazet
2021-11-25  4:20                     ` Eric Dumazet
2021-11-25  4:56                       ` Noah Goldstein
2021-11-25  5:09                         ` Noah Goldstein
2021-11-25  6:32                           ` Eric Dumazet
2021-11-25  6:45                             ` Eric Dumazet
2021-11-25  6:49                               ` Noah Goldstein
2021-11-25  6:47                             ` Noah Goldstein
2021-11-26 17:18                   ` David Laight
2021-11-26 18:09                     ` Eric Dumazet [this message]
2021-11-26 22:41                       ` David Laight
2021-11-26 23:04                         ` Noah Goldstein
2021-11-28 18:30                           ` David Laight
2021-12-29  6:00       ` Al Viro
2022-01-31  2:29         ` Al Viro
