From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
To: George Spelvin <lkml@sdf.org>
Cc: linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Andrey Abramov <st5pub@yandex.ru>,
Geert Uytterhoeven <geert@linux-m68k.org>,
Daniel Wagner <daniel.wagner@siemens.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Don Mullis <don.mullis@gmail.com>,
Dave Chinner <dchinner@redhat.com>
Subject: Re: [RESEND PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller
Date: Tue, 19 Mar 2019 16:01:45 +0200 [thread overview]
Message-ID: <20190319140145.GH9224@smile.fi.intel.com> (raw)
In-Reply-To: <cover.1552704200.git.lkml@sdf.org>
On Tue, Mar 19, 2019 at 08:15:56AM +0000, George Spelvin wrote:
> (Resend because earlier send had GIT_AUTHOR_DATE in the e-mail
> headers and got filed with last month's archives. And probably
> tripped a few spam filters.)
I got both.
>
> v1->v2: Various spelling, naming and code style cleanups.
> Generally positive and no negative responses to the
> goals and algorithms used.
>
> I'm running these patches, with CONFIG_TEST_SORT and
> CONFIG_TEST_LIST_SORT, on the machine I'm sending this from.
> I have tweaked the comments further, but I have verified
> the compiled object code is identical to a snapshot I took
> when I rebooted.
>
> As far as I'm concerned, this is ready to be merged.
> As there is no owner in MAINTAINERS, I was thinking of
> sending it via AKPM, like the recent lib/lzo changes.
> Andrew, is that okay with you?
>
Thanks for this improvement. I like when academic work is used in real life!
FWIW,
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Because CONFIG_RETPOLINE has made indirect calls much more expensive,
> I thought I'd try to reduce the number made by the library sort
> functions.
>
> The first three patches apply to lib/sort.c.
>
> Patch #1 is a simple optimization. The built-in swap has special cases
> for aligned 4- and 8-byte objects. But those are almost never used;
> most calls to sort() work on larger structures, which fall back to the
> byte-at-a-time loop. This generalizes them to aligned *multiples* of 4
> and 8 bytes. (If nothing else, it saves an awful lot of energy by not
> thrashing the store buffers as much.)
>
> Patch #2 grabs a juicy piece of low-hanging fruit. I agree that
> nice simple solid heapsort is preferable to more complex algorithms
> (sorry, Andrey), but it's possible to implement heapsort with far fewer
> comparisons (50% asymptotically, 25-40% reduction for realistic sizes)
> than the way it's been done up to now. And with some care, the code
> ends up smaller, as well. This is the "big win" patch.
>
> Patch #3 adds the same sort of indirect call bypass that has been added
> to the net code of late. The great majority of the callers use the
> builtin swap functions, so replace the indirect call to sort_func with a
> (highly preditable) series of if() statements. Rather surprisingly,
> this decreased code size, as the swap functions were inlined and their
> prologue & epilogue code eliminated.
>
> lib/list_sort.c is a bit trickier, as merge sort is already close to
> optimal, and we don't want to introduce triumphs of theory over
> practicality like the Ford-Johnson merge-insertion sort.
>
> Patch #4, without changing the algorithm, chops 32% off the code size and
> removes the part[MAX_LIST_LENGTH+1] pointer array (and the corresponding
> upper limit on efficiently sortable input size).
>
> Patch #5 improves the algorithm. The previous code is already optimal
> for power-of-two (or slightly smaller) size inputs, but when the input
> size is just over a power of 2, there's a very unbalanced final merge.
>
> There are, in the literature, several algorithms which solve this, but
> they all depend on the "breadth-first" merge order which was replaced
> by commit 835cc0c8477f with a more cache-friendly "depth-first" order.
> Some hard thinking came up with a depth-first algorithm which defers
> merges as little as possible while avoiding bad merges. This saves
> 0.2*n compares, averaged over all sizes.
>
> The code size increase is minimal (64 bytes on x86-64, reducing the net
> savings to 26%), but the comments expanded significantly to document
> the clever algorithm.
>
>
> TESTING NOTES: I have some ugly user-space benchmarking code
> which I used for testing before moving this code into the kernel.
> Shout if you want a copy.
>
> I'm running this code right now, with CONFIG_TEST_SORT and
> CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since
> the last round of minor edits to quell checkpatch. I figure there
> will be at least one round of comments and final testing.
>
> George Spelvin (5):
> lib/sort: Make swap functions more generic
> lib/sort: Use more efficient bottom-up heapsort variant
> lib/sort: Avoid indirect calls to built-in swap
> lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS
> lib/list_sort: Optimize number of calls to comparison function
>
> include/linux/list_sort.h | 1 +
> lib/list_sort.c | 244 +++++++++++++++++++++++++---------
> lib/sort.c | 266 +++++++++++++++++++++++++++++---------
> 3 files changed, 387 insertions(+), 124 deletions(-)
>
> --
> 2.20.1
>
--
With Best Regards,
Andy Shevchenko
prev parent reply other threads:[~2019-03-19 14:01 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-16 2:43 [PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller George Spelvin
2019-02-21 6:30 ` [PATCH v2 1/5] lib/sort: Make swap functions more generic George Spelvin
2019-02-21 8:21 ` [PATCH v2 3/5] lib/sort: Avoid indirect calls to built-in swap George Spelvin
2019-02-21 8:21 ` [PATCH v2 2/5] lib/sort: Use more efficient bottom-up heapsort variant George Spelvin
2019-03-05 3:06 ` [PATCH v2 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS George Spelvin
2019-03-05 5:58 ` [PATCH v2 5/5] lib/list_sort: Optimize number of calls to comparison function George Spelvin
2019-03-19 8:15 ` [RESEND PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller George Spelvin
2019-03-19 8:15 ` [RESEND PATCH v2 1/5] lib/sort: Make swap functions more generic George Spelvin
2019-03-19 8:15 ` [RESEND PATCH v2 2/5] lib/sort: Use more efficient bottom-up heapsort variant George Spelvin
2019-03-19 8:15 ` [RESEND PATCH v2 3/5] lib/sort: Avoid indirect calls to built-in swap George Spelvin
2019-03-19 8:16 ` [RESEND PATCH v2 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS George Spelvin
2019-03-28 22:08 ` Andrew Morton
2019-03-29 4:10 ` George Spelvin
2019-03-29 4:31 ` [PATCH 6/5] lib/list_sort: Fix GCC warning George Spelvin
2019-03-19 8:16 ` [RESEND PATCH v2 5/5] lib/list_sort: Optimize number of calls to comparison function George Spelvin
2019-03-19 10:56 ` [RESEND PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller Rasmus Villemoes
2019-03-19 14:01 ` Andy Shevchenko [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190319140145.GH9224@smile.fi.intel.com \
--to=andriy.shevchenko@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=daniel.wagner@siemens.com \
--cc=dchinner@redhat.com \
--cc=don.mullis@gmail.com \
--cc=geert@linux-m68k.org \
--cc=kernel-janitors@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@rasmusvillemoes.dk \
--cc=lkml@sdf.org \
--cc=st5pub@yandex.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).