linux-bcachefs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: colyli@suse.de, kent.overstreet@linux.dev
Cc: bfoster@redhat.com, jserv@ccns.ncku.edu.tw,
	linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-bcachefs@vger.kernel.org,
	Kuan-Wei Chiu <visitorckw@gmail.com>
Subject: [PATCH 5/5] bcache: Optimize number of comparisons in heap_sift
Date: Sun, 21 Jan 2024 23:36:49 +0800	[thread overview]
Message-ID: <20240121153649.2733274-6-visitorckw@gmail.com> (raw)
In-Reply-To: <20240121153649.2733274-1-visitorckw@gmail.com>

Optimize the heap_sift() macro, resulting in a significant reduction of
approximately 50% in the number of comparisons for large random inputs,
while maintaining identical results.

The current implementation performs two comparisons per level to
identify the maximum among three elements. In contrast, the proposed
bottom-up variation uses only one comparison per level to assess two
children until reaching the leaves. Then, it sifts up until the correct
position is determined.

Typically, the process of sifting down proceeds to the leaf level,
resulting in O(1) secondary comparisons instead of log2(n). This
optimization significantly reduces the number of costly indirect
function calls and improves overall performance.

The experimental data below is derived from first adding N elements
generated by get_random_u32() to the heap, and then performing heap_pop
until the heap is empty.

|   N   | comparisons(old) | comparisons(new) | time(old) | time(new) |
|-------|------------------|------------------|-----------|-----------|
| 10000 |     249215       |     164872       |   1253 us |    958 us |
| 20000 |     539766       |     350113       |   2693 us |   2059 us |
| 30000 |     843297       |     543037       |   4120 us |   3119 us |
| 40000 |    1159127       |     739595       |   5624 us |   4221 us |
| 50000 |    1482608       |     941655       |   7147 us |   5349 us |
| 60000 |    1808772       |    1144716       |   8754 us |   6786 us |
| 70000 |    2139443       |    1348878       |  10323 us |   8030 us |
| 80000 |    2478304       |    1560892       |  11934 us |   9061 us |
| 90000 |    2820532       |    1768678       |  13611 us |  10679 us |
| 100000|    3163503       |    1983806       |  15244 us |  11745 us |

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
This patch has undergone unit testing and micro benchmarking using the
following code [1].

[1]:
static long long int cmp_count = 0;

typedef DECLARE_HEAP(u32, heap);

int mycmp(u32 a, u32 b)
{
    cmp_count++;
    return a > b;
}

int check_heap(heap *h, int (*cmp)(u32, u32))
{
    size_t i;

    for (i = 0; i < h->used / 2; i++) {
        if (i * 2 + 1 < h->used)
            if (cmp(h->data[i], h->data[i * 2 + 1]))
                return -1;
        if (i * 2 + 2 < h->used)
            if (cmp(h->data[i], h->data[i * 2 + 2]))
                return -1;
    }
    return 0;
}

static int test(void)
{
    size_t N, i;
    u32 x;
    heap myheap;
    ktime_t start, end;
	s64 delta;

	/* Test for correctness. */
    for (N = 1000; N <= 10000; N += 1000) {
        init_heap(&myheap, N, GFP_KERNEL);
        for (i = 0; i < N; i++) {
            heap_add(&myheap, get_random_u32(), mycmp);
            if (check_heap(&myheap, mycmp))
                return -1;
        }
        for (i = 0; i < N; i++) {
            heap_pop(&myheap, x, mycmp);
            if (check_heap(&myheap, mycmp))
                return -1;
        }
        free_heap(&myheap);
    }

	/* Micro-benchmark. */
	for(N = 10000; N <= 100000; N += 10000) {
		cmp_count = 0;
        init_heap(&myheap, N, GFP_KERNEL);

        start = ktime_get();
        for (i = 0; i < N; i++)
            heap_add(&myheap, get_random_u32(), mycmp);
        for (i = 0; i < N; i++)
            heap_pop(&myheap, x, mycmp);
        end = ktime_get();
		delta = ktime_us_delta(end, start);
        printk(KERN_INFO "time: %lld\n", delta);
        printk(KERN_INFO "comparisons: %lld\n", cmp_count);
        free_heap(&myheap);
    }

    return 0;
}

 drivers/md/bcache/util.h | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index f61ab1bada6c..3aa74b0d7f0a 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -56,17 +56,20 @@ do {									\
 
 #define heap_sift(h, i, cmp)						\
 do {									\
-	size_t _r, _j = i;						\
+	size_t _j, _k;							\
 									\
-	for (; _j * 2 + 1 < (h)->used; _j = _r) {			\
-		_r = _j * 2 + 1;					\
-		if (_r + 1 < (h)->used &&				\
-		    cmp((h)->data[_r], (h)->data[_r + 1]))		\
-			_r++;						\
-									\
-		if (cmp((h)->data[_r], (h)->data[_j]))			\
-			break;						\
-		heap_swap(h, _r, _j);					\
+	for (_j = i; _k = 2 * _j + 1, _k + 1 < (h)->used;) {		\
+		if (cmp((h)->data[_k], (h)->data[_k + 1]))		\
+			_k++;						\
+		_j = _k;						\
+	}								\
+	if (_j * 2 + 2 == (h)->used)					\
+		_j = _j * 2 + 1;					\
+	while (_j != i && cmp((h)->data[_j], (h)->data[i]))		\
+		_j = (_j - 1) / 2;					\
+	for (_k = _j; _j != i;) {					\
+		_j = (_j - 1) / 2;					\
+		heap_swap(h, _j, _k);					\
 	}								\
 } while (0)
 
-- 
2.25.1


  parent reply	other threads:[~2024-01-21 15:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-21 15:36 [PATCH 0/5] Optimize number of comparisons for heap/heapsort implementaion Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 1/5] bcachefs: Optimize eytzinger0_sort() using bottom-up heapsort Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 2/5] bcachefs: Introduce parent function for sort_cmp_size() Kuan-Wei Chiu
2024-01-21 16:17   ` Kent Overstreet
2024-01-21 17:05     ` Kuan-Wei Chiu
2024-01-21 17:37       ` Kent Overstreet
2024-01-21 15:36 ` [PATCH 3/5] bcachefs: Optimize sort_cmp_size() using bottom-up heapsort Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 4/5] bcachefs: Optimize number of comparisons in heap_sift_down Kuan-Wei Chiu
2024-01-21 15:36 ` Kuan-Wei Chiu [this message]
2024-01-21 16:21 ` [PATCH 0/5] Optimize number of comparisons for heap/heapsort implementaion Kent Overstreet
2024-01-21 16:55   ` Kuan-Wei Chiu
2024-01-21 17:41     ` Kent Overstreet
2024-01-22 15:06       ` Kuan-Wei Chiu
2024-01-22 16:06         ` Kent Overstreet
2024-01-22 16:23           ` Kuan-Wei Chiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240121153649.2733274-6-visitorckw@gmail.com \
    --to=visitorckw@gmail.com \
    --cc=bfoster@redhat.com \
    --cc=colyli@suse.de \
    --cc=jserv@ccns.ncku.edu.tw \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).