linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: George Spelvin <lkml@sdf.org>, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrey Abramov <st5pub@yandex.ru>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Daniel Wagner <daniel.wagner@siemens.com>,
	Don Mullis <don.mullis@gmail.com>,
	Dave Chinner <dchinner@redhat.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: Re: [PATCH 5/5] lib/list_sort: Optimize number of calls to comparison function
Date: Thu, 14 Mar 2019 00:28:16 +0100	[thread overview]
Message-ID: <dd8924c1-07a9-4317-bfa8-23271b138a62@rasmusvillemoes.dk> (raw)
In-Reply-To: <b36187f091acc1b3a8fc1fc3e9dbb6eca56231a9.1552097842.git.lkml@sdf.org>

On 05/03/2019 06.58, George Spelvin wrote:
> CONFIG_RETPOLINE has severely degraded indirect function call
> performance, so it's worth putting some effort into reducing
> the number of times cmp() is called.
> 
> This patch avoids badly unbalanced merges on unlucky input sizes.
> It slightly increases the code size, but saves an average of 0.2*n
> calls to cmp().
> 
[snip]
> 
> (I confess to being more than a little bit proud of how clean this
> code turned out.  It took a lot of thinking, but the resultant inner
> loop is very simple and efficient.)
> 
> Refs:
>   Bottom-up Mergesort: A Detailed Analysis
>   Wolfgang Panny, Helmut Prodinger
>   Algorithmica 14(4):340--354, October 1995
>   https://doi.org/10.1007/BF01294131
>   https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260
> 
>   The cost distribution of queue-mergesort, optimal mergesorts, and
>   power-of-two rules
>   Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
>   Journal of Algorithms 30(2); Pages 423--448, February 1999
>   https://doi.org/10.1006/jagm.1998.0986
>   https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380
> 
>   Queue-Mergesort
>   Mordecai J. Golin, Robert Sedgewick
>   Information Processing Letters, 48(5):253--259, 10 December 1993
>   https://doi.org/10.1016/0020-0190(93)90088-q
>   https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

This is beautiful. So no comments on the patch itself. One thing that
might be nice would be to see the reduction in number of cmp callbacks
explicitly; it should be trivial to use the priv element for that in the
list_sort_test module. But to really see it one would of course have to
extend that test to do a lot more different sizes of lists.

And looking at the actual cmp functions used made me think we might do
something similar to what you did for the swap functions for sort(),
though it will require updating call sites. Suppose we have some

struct foo {
   ...
   struct list_head list;
   ...
   some_integer_type key;
   ...
};

and a trivial cmp function that just compares the ->key members. What if
we do something like

#define LIST_SORT_SIMPLE_CMP(type, list, key) ({ \
  /* encode the size and signedness of type.key along with
   offsetof(type, key) - offsetof(type, list) into the low, say,
   24 bits - a signed 20 bit number should be sufficient for the
   offsetof diff, and we can bail at compile time if too big */
})

then do

int do_cmp(void *priv, list_head *a, list_head *b, cmp_func cmp)
{
   if ((long)cmp & high_bits) /* it's a kernel pointer */
      return cmp(priv, a, b);
   int diff = extract_diff(cmp);
   int type = extract_type(cmp);
   void *keya = (void*)a + diff;
   void *keyb = (void*)b + diff;
   if (type == s32)
      return *(s32*)keya > *(s32*)keyb;
   if (type == u32)
       return *(u32*)keya > *(u32*)keyb;
   ...

in practice, we'd probably only have to support 4- and 8-byte signed and
unsigned versions (and we can check at compile time whether key has a
supported type).

Similarly one could do a SORT_SIMPLE_CMP() for when sorting an array of
structs according to a single numeric member. That sort is not stable,
so the comparison functions would have to do a full -1,0,1 return, of
course, but we'd still avoid indirect calls.

Rasmus

  reply	other threads:[~2019-03-13 23:28 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-09  2:17 [PATCH 0/5] lib/sort & lib/list_sort: faster and smaller George Spelvin
2019-02-21  6:30 ` [PATCH 1/5] lib/sort: Make swap functions more generic George Spelvin
     [not found]   ` <20190309140653.GO9224@smile.fi.intel.com>
2019-03-09 15:53     ` lkml
2019-03-09 20:19       ` Andrey Abramov
2019-03-14  9:29       ` Andy Shevchenko
2019-03-14 10:09         ` George Spelvin
2019-03-14 10:41           ` Geert Uytterhoeven
2019-03-14 11:53             ` George Spelvin
2019-03-14 12:18               ` Andy Shevchenko
2019-03-14 19:59                 ` Andrey Abramov
2019-03-15  3:35                   ` George Spelvin
2019-03-15  8:27                     ` Geert Uytterhoeven
2019-03-14 10:11         ` George Spelvin
2019-03-09 21:02     ` George Spelvin
2019-03-13 21:23   ` Rasmus Villemoes
2019-03-13 22:02     ` Geert Uytterhoeven
2019-03-13 23:15     ` George Spelvin
2019-02-21  8:21 ` [PATCH 2/5] lib/sort: Use more efficient bottom-up heapsort variant George Spelvin
2019-03-13 22:29   ` Rasmus Villemoes
2019-03-14  0:03     ` George Spelvin
2019-03-14  0:15       ` Rasmus Villemoes
2019-02-21  8:21 ` [PATCH 3/5] lib/sort: Avoid indirect calls to built-in swap George Spelvin
2019-03-05  3:06 ` [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS George Spelvin
2019-03-10 21:54   ` Rasmus Villemoes
2019-03-10 22:29     ` George Spelvin
2019-03-14  9:10   ` Andy Shevchenko
2019-03-14  9:41     ` George Spelvin
2019-03-15  4:33     ` George Spelvin
2019-03-15  8:20       ` Geert Uytterhoeven
2019-03-15 10:23         ` George Spelvin
2019-03-15 12:57           ` Geert Uytterhoeven
2019-03-15 16:59             ` George Spelvin
2019-03-15 17:47               ` Geert Uytterhoeven
2019-03-15 18:53                 ` Andrey Abramov
2019-03-15 19:06                   ` Andy Shevchenko
2019-03-15 19:23                     ` Andrey Abramov
2019-03-15 19:56                       ` Andy Shevchenko
2019-03-16  3:49                         ` George Spelvin
2019-03-05  5:58 ` [PATCH 5/5] lib/list_sort: Optimize number of calls to comparison function George Spelvin
2019-03-13 23:28   ` Rasmus Villemoes [this message]
2019-03-14  1:58     ` George Spelvin
2019-06-21 23:12       ` Rasmus Villemoes
2019-12-08  8:01         ` George Spelvin
2019-03-15 19:54 ` [PATCH 0/5] lib/sort & lib/list_sort: faster and smaller Andrey Abramov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dd8924c1-07a9-4317-bfa8-23271b138a62@rasmusvillemoes.dk \
    --to=linux@rasmusvillemoes.dk \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=daniel.wagner@siemens.com \
    --cc=dchinner@redhat.com \
    --cc=don.mullis@gmail.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkml@sdf.org \
    --cc=st5pub@yandex.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).