From: Willy Tarreau <w@1wt.eu>
To: David Laight <David.Laight@aculab.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: how many memset(,0,) calls in kernel ?
Date: Tue, 14 Sep 2021 18:46:54 +0200
Message-ID: <20210914164654.GC10488@1wt.eu>
In-Reply-To: <15cd0a8e72b3460db939060db25dd59a@AcuMS.aculab.com>

On Tue, Sep 14, 2021 at 08:23:40AM +0000, David Laight wrote:
> > The exact point is, here it's up to the compiler to decide thanks to
> > its builtin what it considers best for the target CPU. It already
> > knows the fixed size and the code is emitted accordingly. It may
> > very well be a call to the memset() function when the size is large
> > and a power of two because it knows alternate variants are available
> > for example.
> >
> > The compiler might even decide to shrink that area if other bytes
> > are written just after the memset(), leaving only holes touched by
> > memset().
>
> You might think the compiler will make sane choices for the target CPU.
> But it often makes a complete pig's breakfast of it.
> I'm pretty sure 6 'rep stos' is slower than 6 write an absolutely
> everything - with the possible exception of an 8088.

It can be suboptimal (especially with the moderate latencies required
for small areas), but my point is that in plenty of cases the memset()
call will be totally eliminated. Example:

The file:

  #include <string.h>

  int f(int a, int b)
  {
          struct {
                  int n1;
                  int n2;
                  int n3;
                  int n4;
          } s;

          memset(&s, 0, sizeof(s));
          s.n2 = a;
          s.n3 = b;
          return s.n1 + s.n2 + s.n3 + s.n4;
  }

gives:

  0000000000000000 <f>:
     0:   8d 04 37                lea    (%rdi,%rsi,1),%eax
     3:   c3                      retq

See? The builtin let the compiler *know* that these areas were
zeroes, so it could optimize them away. More importantly, this
can save some reads from being performed at all, with the data
only being written:

  #include <string.h>

  struct {
          int n1;
          int n2;
  } s;

  void f(int a, int b)
  {
          memset(&s, 0, sizeof(s));
          s.n1 |= a;
          s.n2 |= b;
  }

Gives:

  0000000000000000 <f>:
     0:   89 3d 00 00 00 00       mov    %edi,0x0(%rip)        # 6 <f+0x6>
     6:   89 35 00 00 00 00       mov    %esi,0x0(%rip)        # c <f+0xc>
     c:   c3                      retq

See? Just plain writes, no read-modify-write of the memory area.
If you called an external memset() function, you'd instantly lose
all these possibilities:

  0000000000000000 <f>:
     0:   55                      push   %rbp
     1:   ba 08 00 00 00          mov    $0x8,%edx
     6:   89 fd                   mov    %edi,%ebp
     8:   bf 00 00 00 00          mov    $0x0,%edi
     d:   53                      push   %rbx
     e:   89 f3                   mov    %esi,%ebx
    10:   31 f6                   xor    %esi,%esi
    12:   48 83 ec 08             sub    $0x8,%rsp
    16:   e8 00 00 00 00          callq  1b <f+0x1b>
    1b:   09 2d 00 00 00 00       or     %ebp,0x0(%rip)        # 21 <f+0x21>
    21:   09 1d 00 00 00 00       or     %ebx,0x0(%rip)        # 27 <f+0x27>
    27:   48 83 c4 08             add    $0x8,%rsp
    2b:   5b                      pop    %rbx
    2c:   5d                      pop    %rbp
    2d:   c3                      retq

Thus the fact that the compiler has knowledge of the memset() is useful.
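
For anyone wanting to compare the two code shapes side by side, here is
a minimal single-file sketch. It is not what produced the listings
above: the names (struct pair, f_builtin, f_opaque, opaque_memset,
variants_agree, check_variants) are made up for illustration, and it
assumes an optimizing build such as gcc -O2 on x86-64, which this mail
does not actually state. Calling memset() through a volatile function
pointer is just one way to hide the callee from the optimizer; building
with GCC's -fno-builtin-memset should have a similar effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of them come from the original mail. */
struct pair {
	int n1;
	int n2;
};

static struct pair s_builtin;	/* cleared with a plain memset() call */
static struct pair s_opaque;	/* cleared through an opaque pointer */

/*
 * Loading the function address from a volatile pointer hides the callee
 * from the optimizer: the call has to be made for real, and the stores
 * that follow it cannot be merged with the clear.
 */
static void *(*volatile opaque_memset)(void *, int, size_t) = memset;

/* Same shape as the second example: may be folded into two plain stores. */
void f_builtin(int a, int b)
{
	memset(&s_builtin, 0, sizeof(s_builtin));
	s_builtin.n1 |= a;
	s_builtin.n2 |= b;
}

/* Same logic, but the clear is an indirect call the compiler cannot see into. */
void f_opaque(int a, int b)
{
	opaque_memset(&s_opaque, 0, sizeof(s_opaque));
	s_opaque.n1 |= a;
	s_opaque.n2 |= b;
}

/* Returns 1 when both variants leave identical contents, namely {a, b}. */
int variants_agree(int a, int b)
{
	f_builtin(a, b);
	f_opaque(a, b);
	return s_builtin.n1 == s_opaque.n1 &&
	       s_builtin.n2 == s_opaque.n2 &&
	       s_builtin.n1 == a && s_builtin.n2 == b;
}

/* Convenience self-check, e.g. to be called from a test or from main(). */
void check_variants(void)
{
	assert(variants_agree(1, 2));
	assert(variants_agree(-7, 0));
}
```

Both functions always store the same values; only the generated code
should differ. Compiling this to an object file and disassembling it
with objdump -d is one way to look at f_builtin and f_opaque next to
each other, though the exact instructions will depend on the compiler
version and flags.
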
Willy