From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>, linux-mm@kvack.org
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	chuck.lever@oracle.com, netdev@vger.kernel.org,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH RFC net-next 3/3] mm: make zone->free_area[order] access faster
Date: Wed, 24 Feb 2021 19:56:51 +0100
Message-ID: <161419301128.2718959.4838557038019199822.stgit@firesoul>
In-Reply-To: <161419296941.2718959.12575257358107256094.stgit@firesoul>

Avoid multiplication (imul) operations when accessing:
 zone->free_area[order].nr_free

This was really tricky to find. I was puzzled why perf reported that
rmqueue_bulk was using 44% of the time in an imul operation:

       │     del_page_from_free_list():
 44,54 │ e2:   imul   $0x58,%rax,%rax

This operation was generated by the compiler because struct free_area
has a size of 88 bytes, or 0x58 hex. Since 88 is not a power of two, the
compiler cannot use a single shift operation and instead chooses a more
expensive imul to compute the offset into the array free_area[].
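
The size arithmetic can be checked in user space. The following is a
minimal sketch, assuming a 64-bit (LP64) build and 5 migrate types (a
value inferred from the 88-byte size; in the kernel it depends on the
config). The names ending in _sketch or _old, and both helpers, are
hypothetical stand-ins, not kernel definitions.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct list_head: two pointers. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

/* Assumption: 5 migrate types, which matches the 88-byte size above. */
#define MIGRATE_TYPES_SKETCH 5

/* Layout before the patch: free lists first, counter last. */
struct free_area_old {
	struct list_head_sketch	free_list[MIGRATE_TYPES_SKETCH];
	unsigned long		nr_free;
};

/* Returns non-zero only when x is a power of two. */
static inline int is_pow2(size_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Byte offset of free_area[order]: index times element size. */
static inline size_t old_free_area_offset(unsigned int order)
{
	return (size_t)order * sizeof(struct free_area_old);
}
```

On such a build sizeof(struct free_area_old) is 5 * 16 + 8 = 88, which
is not a power of two, so the index in old_free_area_offset() cannot be
scaled with one shift.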

The patch aligns struct free_area to a cache line, which rounds its size
up to a multiple of the cache-line size (128 bytes with 64-byte cache
lines), so the compiler can avoid the imul operation. Although imul is
very fast on modern Intel CPUs, a shift is cheaper still. To help the
fast path that decrements 'nr_free', move the member 'nr_free' to be the
first element, which saves one 'add' operation.

Looking up the instruction latencies, this exchanges a 3-cycle imul for
a 1-cycle shl, saving 2 cycles. It does trade some space to do this.
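
The effect of the new layout can be sketched in user space as well. This
is only an illustration, assuming a 64-bit (LP64) build, 64-byte cache
lines, and 5 migrate types; C11 _Alignas stands in for the kernel's
____cacheline_aligned_in_smp, and every name ending in _sketch or _new
is hypothetical.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct list_head: two pointers. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

/* Assumptions: 5 migrate types and a 64-byte cache line. */
#define MIGRATE_TYPES_SKETCH	5
#define CACHELINE_SKETCH	64

/* Layout after the patch: counter first, then padding, then lists. */
struct free_area_new {
	_Alignas(CACHELINE_SKETCH) unsigned long nr_free;
	unsigned long		pad_to_align_free_list;
	struct list_head_sketch	free_list[MIGRATE_TYPES_SKETCH];
};

/* 8 + 8 + 80 = 96 bytes, rounded up to 2 cache lines = 128 = 1 << 7. */
#define FREE_AREA_SHIFT_SKETCH	7

_Static_assert(sizeof(struct free_area_new) ==
	       ((size_t)1 << FREE_AREA_SHIFT_SKETCH),
	       "sketch assumes LP64 and 64-byte cache lines");

/* Byte offset of free_area[order], computed with a shift only. */
static inline size_t new_free_area_offset(unsigned int order)
{
	return (size_t)order << FREE_AREA_SHIFT_SKETCH;
}
```

With nr_free at offset 0, the address of free_area[order].nr_free is
just the array base plus the shifted index, with no extra add for the
member offset.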

Used: gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/mmzone.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b593316bff3d..4d83201717e1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -93,10 +93,12 @@ extern int page_group_by_mobility_disabled;
 #define get_pageblock_migratetype(page)					\
 	get_pfnblock_flags_mask(page, page_to_pfn(page), MIGRATETYPE_MASK)
 
+/* Aligned struct to make zone->free_area[order] access faster */
 struct free_area {
-	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
-};
+	unsigned long		__pad_to_align_free_list;
+	struct list_head	free_list[MIGRATE_TYPES];
+}  ____cacheline_aligned_in_smp;
 
 static inline struct page *get_page_from_free_area(struct free_area *area,
 					    int migratetype)


