linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	Nathan Chancellor <natechancellor@gmail.com>
Subject: Re: [RFCv2] string: Use faster alternatives when constant arguments are used
Date: Mon, 25 Mar 2019 22:24:00 +0100	[thread overview]
Message-ID: <8672b98c-bf71-7b5b-625e-2f241807d46c@rasmusvillemoes.dk> (raw)
In-Reply-To: <20190324223202.GA875@sultan-box.localdomain>

On 24/03/2019 23.32, Sultan Alsawaf wrote:
> On Sun, Mar 24, 2019 at 10:17:49PM +0100, Rasmus Villemoes wrote:
>> gcc already knows the semantics of these functions and can optimize
>> accordingly. E.g. for strcpy() of a literal to a buffer, gcc readily
>> compiles
> 
> The example you gave appears to get optimized accordingly, but there are
> numerous sane counterexamples that don't get optimized.
> 
> On arm64, with GCC 8.3.0, let's look at the simple strcmp usage in
> kernel/trace/preemptirq_delay_test.c:
> 
> static int preemptirq_delay_run(void *data)
> {
> 	unsigned long flags;
> 
> 	if (!strcmp(test_mode, "irq")) {
> 		local_irq_save(flags);
> 		busy_wait(delay);
> 		local_irq_restore(flags);
> 	} else if (!strcmp(test_mode, "preempt")) {
> 		preempt_disable();
> 		busy_wait(delay);
> 		preempt_enable();
> 	}
> 
> 	return 0;
> }
> 
> This is what happens without my patch:

Excellent, this is the kind of examples I was looking for. On x86-64,
there's no callq strcmp in sight (just a repz cmpsb loop). With your
patch, the code gets compiled similar to the arm64 case (it's a rather
convenient case which can be done as single 4- or 8-byte comparisons).

> I can produce lots of kernel examples that don't seem to follow basic str*
> optimization, which is what prompted me to write this in the first place.

Probably because the compiler doesn't want to introduce memory accesses
that are not allowed by the C language.

>> Does this even compile? It's a well-known (or perhaps
>> not-so-well-known?) pitfall that __builtin_constant_p() is not
>> guaranteed to be usable in __builtin_choose_expr() - the latter only
>> accepts bona fide integer constant expressions, while evaluation of
>> __builtin_constant_p can be delayed until various optimization phases.
> 
> It not only compiles, but also seems to work correctly from what I've observed.

Interesting. I'll have to look into how that can be.

>> This seems to be buggy - you don't know that src is at least as long as
>> dest. And arguing "if it's shorter, there's a nul byte, which will
>> differ from dest at that point, so memcmp will/should stop" means that
>> the whole word-at-a-time argument would be out.
> 
> You mean reading the last word of a string could read out of bounds of the
> string when using memcmp? There are lots of memcmp instances using a literal
> string in the kernel; I don't think it would be hard to find one that violates
> what you've pointed out, so I'm not really sure why it's a problem.

Existing code may have bugs. Or existing code may use a dubious pattern
which happens to be ok due to the context it appears in, but becomes
buggy when copy-pasted elsewhere.

> After a few minutes of grepping, it seems like the memcmp usage in
> drivers/md/dm-integrity.c can read out of bounds on its arguments:
> } else if (!memcmp(opt_string, "internal_hash:", strlen("internal_hash:"))) {
> 	r = get_alg_and_key(opt_string, &ic->internal_hash_alg, &ti->error,
> 			    "Invalid internal_hash argument");
> 	if (r)
> 		goto bad;

Yeah, this actually looks buggy. Maybe we don't have any memcmp()
implementations that do word-at-a-time (it would mostly make sense on BE
architectures). Or maybe there's something here that guarantees
sufficient slack after the nul byte, so we never read across a page
boundary (which is how bad things could happen).

Something like

  char stackbuffer[32];
  fill_stackbuffer_somehow();
  if (memcmp(stackbuffer, "literal", strlen("literal"))

should be sorta ok, even if memcmp might end up reading uninitialized bytes.

What I'm worried about is your patch changing every single strcmp(,
"literal") into a memcmp, with absolutely no way of knowing or checking
anything about the other buffer. And actually, it doesn't have to be a
BE arch with a word-at-a-time memcmp. If (as is usually the case) the
strcmp() result is compared to zero, after you change

  !strcmp(buf, "literal")

into

  !memcmp(buf, "literal", 8)

the compiler may (exactly as you want it to) change that into a single
8-byte load (or two 4-byte loads) and comparisons to literals, no
memcmp() involved. And how do you know that _that_ is ok, for every one
of the hundreds, if not thousands, of instances in the tree?

So, for strcpy(), I think gcc does optimize across all arches. For
strcat(), I dunno, it probably doesn't have many users and users of
strcat() obviously do not care about performance anyway. And for
strcmp(), some arch backends already optimize pretty well, some may be
lacking (either because the arch does not have suitable string ops, or
because nobody implemented the optimizations in gcc yet), and converting
blindly to memcmp() seems to be somewhat dangerous - a few pre-existing
"open-coded" such instances doesn't really justify a tree-wide conversion.

Rasmus

  reply	other threads:[~2019-03-25 21:24 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-24  1:44 [RFC] string: Use faster alternatives when constant arguments are used Sultan Alsawaf
2019-03-24  2:24 ` [RFCv2] " Sultan Alsawaf
2019-03-24  3:31   ` Nathan Chancellor
2019-03-24  7:18     ` [RFC v3] " Sultan Alsawaf
2019-03-24 21:17   ` [RFCv2] " Rasmus Villemoes
2019-03-24 22:32     ` Sultan Alsawaf
2019-03-25 21:24       ` Rasmus Villemoes [this message]
2019-03-30 22:59         ` Sultan Alsawaf
2019-04-01 20:43           ` Rasmus Villemoes
2019-04-01 23:55             ` Sultan Alsawaf
2019-04-02  8:17               ` Rasmus Villemoes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8672b98c-bf71-7b5b-625e-2f241807d46c@rasmusvillemoes.dk \
    --to=linux@rasmusvillemoes.dk \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=natechancellor@gmail.com \
    --cc=sultan@kerneltoast.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).