linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: George Spelvin <lkml@sdf.org>, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrey Abramov <st5pub@yandex.ru>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Daniel Wagner <daniel.wagner@siemens.com>,
	Don Mullis <don.mullis@gmail.com>,
	Dave Chinner <dchinner@redhat.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: Re: [PATCH 1/5] lib/sort: Make swap functions more generic
Date: Wed, 13 Mar 2019 22:23:44 +0100	[thread overview]
Message-ID: <6e08375f-6960-546b-39a6-26ff24a0ce0e@rasmusvillemoes.dk> (raw)
In-Reply-To: <dc6cc09e71f8687ff2ef4cbfc2b11accc9aba8aa.1552097842.git.lkml@sdf.org>

On 21/02/2019 07.30, George Spelvin wrote:
> Rather than u32_swap and u64_swap working on 4- and 8-byte objects
> directly, let them handle any multiple of 4 or 8 bytes.  This speeds
> up most users of sort() by avoiding fallback to the byte copy loop.
> 
> Despite what commit ca96ab859ab4 ("lib/sort: Add 64 bit swap function")
> claims, very few users of sort() sort pointers (or pointer-sized
> objects); most sort structures containing at least two words.
> (E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte
> struct acpi_fan_fps.)
> 
> x86-64 code size 872 -> 885 bytes (+8)
> 
> Signed-off-by: George Spelvin <lkml@sdf.org>
> ---
>  lib/sort.c | 117 +++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 96 insertions(+), 21 deletions(-)
> 
> diff --git a/lib/sort.c b/lib/sort.c
> index d6b7a202b0b6..dff2ab2e196e 100644
> --- a/lib/sort.c
> +++ b/lib/sort.c
> @@ -11,35 +11,110 @@
>  #include <linux/export.h>
>  #include <linux/sort.h>
>  
> -static int alignment_ok(const void *base, int align)
> +/**
> + * alignment_ok - is this pointer & size okay for word-wide copying?
> + * @base: pointer to data
> + * @size: size of each element
> + * @align: required aignment (typically 4 or 8)
> + *

typo aLignment

> + * Returns true if elements can be copied using word loads and stores.
> + * The size must be a multiple of the alignment, and the base address must
> + * be if we do not have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
> + *

I wonder if we shouldn't unconditionally require the base to be aligned.
It of course depends on what exactly 'efficient' means, but if the base
is not aligned, we know that every single load and store will be
unaligned, and we're doing lots of them. IIRC even on x86 it's usually
two cycles instead of one, and even more when we cross a cache line
boundary (which we do rather often - e.g. in your 40 byte example,
almost every swap will hit at least one). One could also have some data
that is naturally 4-byte aligned with an element size that happens to be
a multiple of 8 bytes; it would be a bit sad to use 8-byte accesses when
4-byte would all have been aligned.

> + * For some reason, gcc doesn't know to optimize "if (a & mask || b & mask)"
> + * to "if ((a | b) & mask)", so we do that by hand.
> + */
> +static bool __attribute_const__
> +alignment_ok(const void *base, size_t size, unsigned int align)
>  {
> -	return IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
> -		((unsigned long)base & (align - 1)) == 0;
> +	unsigned int lsbits = (unsigned int)size;

Drop that cast.

> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +	(void)base;
> +#else
> +	lsbits |= (unsigned int)(size_t)base;

The kernel usually casts pointers to long or unsigned long. If some
oddball arch has size_t something other than unsigned long, this would
give a "warning: cast from pointer to integer of different size". So
just cast to unsigned long, and drop the cast to unsigned int.

If you do drop CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, this all becomes
much simpler, of course

lsbits = size | (unsigned long)base;

> +#endif
> +	lsbits &= align - 1;
> +	return lsbits == 0;
>  }
>  
> +/**
> + * u32_swap - swap two elements in 32-bit chunks
> + * @a, @b: pointers to the elements
> + * @size: element size (must be a multiple of 4)
> + *
> + * Exchange the two objects in memory.  This exploits base+index addressing,
> + * which basically all CPUs have, to minimize loop overhead computations.
> + *
> + * For some reason, on x86 gcc 7.3.0 adds a redundant test of n at the
> + * bottom of the loop, even though the zero flag is stil valid from the
> + * subtract (since the intervening mov instructions don't alter the flags).
> + * Gcc 8.1.0 doesn't have that problem.
> + */
>  static void u32_swap(void *a, void *b, int size)
>  {
> -	u32 t = *(u32 *)a;
> -	*(u32 *)a = *(u32 *)b;
> -	*(u32 *)b = t;
> +	size_t n = size;
> +

Since the callback has int in its prototype (we should fix that globally
at some point...), might as well use a four-byte local variable, to
avoid rex prefix on x86-64. So just make that "unsigned" or "u32".

> +	do {
> +		u32 t = *(u32 *)(a + (n -= 4));
> +		*(u32 *)(a + n) = *(u32 *)(b + n);
> +		*(u32 *)(b + n) = t;
> +	} while (n);
>  }

Hm, are we absolutely sure there's no weird .config where some struct
ends up being empty, yet we still sort an array of them? It's far
fetched, of course, but perhaps an "if (unlikely(!size)) return;" at the
start of sort() would be in order. Dunno, I'll leave it to you.

Rasmus

  parent reply	other threads:[~2019-03-13 21:23 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-09  2:17 [PATCH 0/5] lib/sort & lib/list_sort: faster and smaller George Spelvin
2019-02-21  6:30 ` [PATCH 1/5] lib/sort: Make swap functions more generic George Spelvin
     [not found]   ` <20190309140653.GO9224@smile.fi.intel.com>
2019-03-09 15:53     ` lkml
2019-03-09 20:19       ` Andrey Abramov
2019-03-14  9:29       ` Andy Shevchenko
2019-03-14 10:09         ` George Spelvin
2019-03-14 10:41           ` Geert Uytterhoeven
2019-03-14 11:53             ` George Spelvin
2019-03-14 12:18               ` Andy Shevchenko
2019-03-14 19:59                 ` Andrey Abramov
2019-03-15  3:35                   ` George Spelvin
2019-03-15  8:27                     ` Geert Uytterhoeven
2019-03-14 10:11         ` George Spelvin
2019-03-09 21:02     ` George Spelvin
2019-03-13 21:23   ` Rasmus Villemoes [this message]
2019-03-13 22:02     ` Geert Uytterhoeven
2019-03-13 23:15     ` George Spelvin
2019-02-21  8:21 ` [PATCH 2/5] lib/sort: Use more efficient bottom-up heapsort variant George Spelvin
2019-03-13 22:29   ` Rasmus Villemoes
2019-03-14  0:03     ` George Spelvin
2019-03-14  0:15       ` Rasmus Villemoes
2019-02-21  8:21 ` [PATCH 3/5] lib/sort: Avoid indirect calls to built-in swap George Spelvin
2019-03-05  3:06 ` [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS George Spelvin
2019-03-10 21:54   ` Rasmus Villemoes
2019-03-10 22:29     ` George Spelvin
2019-03-14  9:10   ` Andy Shevchenko
2019-03-14  9:41     ` George Spelvin
2019-03-15  4:33     ` George Spelvin
2019-03-15  8:20       ` Geert Uytterhoeven
2019-03-15 10:23         ` George Spelvin
2019-03-15 12:57           ` Geert Uytterhoeven
2019-03-15 16:59             ` George Spelvin
2019-03-15 17:47               ` Geert Uytterhoeven
2019-03-15 18:53                 ` Andrey Abramov
2019-03-15 19:06                   ` Andy Shevchenko
2019-03-15 19:23                     ` Andrey Abramov
2019-03-15 19:56                       ` Andy Shevchenko
2019-03-16  3:49                         ` George Spelvin
2019-03-05  5:58 ` [PATCH 5/5] lib/list_sort: Optimize number of calls to comparison function George Spelvin
2019-03-13 23:28   ` Rasmus Villemoes
2019-03-14  1:58     ` George Spelvin
2019-06-21 23:12       ` Rasmus Villemoes
2019-12-08  8:01         ` George Spelvin
2019-03-15 19:54 ` [PATCH 0/5] lib/sort & lib/list_sort: faster and smaller Andrey Abramov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6e08375f-6960-546b-39a6-26ff24a0ce0e@rasmusvillemoes.dk \
    --to=linux@rasmusvillemoes.dk \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=daniel.wagner@siemens.com \
    --cc=dchinner@redhat.com \
    --cc=don.mullis@gmail.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkml@sdf.org \
    --cc=st5pub@yandex.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).