All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	netdev@vger.kernel.org,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	linux-kernel@vger.kernel.org,
	Michal Kubiak <michal.kubiak@intel.com>,
	intel-wired-lan@lists.osuosl.org, Christoph Hellwig <hch@lst.de>,
	Magnus Karlsson <magnus.karlsson@intel.com>
Subject: [Intel-wired-lan] [PATCH net-next 06/11] net: page_pool: avoid calling no-op externals when possible
Date: Tue, 16 May 2023 18:18:36 +0200	[thread overview]
Message-ID: <20230516161841.37138-7-aleksander.lobakin@intel.com> (raw)
In-Reply-To: <20230516161841.37138-1-aleksander.lobakin@intel.com>

Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
even when on DMA-coherent platforms (like x86) with no active IOMMU or
swiotlb, just for the call ladder.
Indeed, it's

page_pool_put_page()
  page_pool_put_defragged_page()                  <- external
    __page_pool_put_page()
      page_pool_dma_sync_for_device()             <- non-inline
        dma_sync_single_range_for_device()
          dma_sync_single_for_device()            <- external
            dma_direct_sync_single_for_device()
              dev_is_dma_coherent()               <- exit

For the inline functions, no guarantees the compiler won't uninline them
(they're clearly not one-liners and sometimes compilers uninline even
2 + 2). The first external call is necessary, but the rest 2+ are done
for nothing each time, plus a bunch of checks here and there.
Since Page Pool mappings are long-term and for one "device + addr" pair
dma_need_sync() will always return the same value (basically, whether it
belongs to an swiotlb pool), addresses can be tested once right after
they're obtained and the result can be reused until the page is unmapped.
Define new PP flag, which will mean "do DMA syncs for device, but only
when needed" and turn it on by default when the driver asks to sync
pages. When a page is mapped, check whether it needs syncs and if so,
replace that "sync when needed" back to "always do syncs" globally for
the whole pool (better safe than sorry). As long as a pool has no pages
requiring DMA syncs, this cuts off a good piece of calls and checks.
On my x86_64, this gives from 2% to 5% performance benefit with no
negative impact for cases when IOMMU is on and the shortcut can't be
used.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h |  3 +++
 net/core/page_pool.c    | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index c8ec2f34722b..8435013de06e 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -46,6 +46,9 @@
 					* device driver responsibility
 					*/
 #define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
+#define PP_FLAG_DMA_MAYBE_SYNC	BIT(3) /* Internal, should not be used in the
+					* drivers
+					*/
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
 				 PP_FLAG_DMA_SYNC_DEV |\
 				 PP_FLAG_PAGE_FRAG)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e212e9d7edcb..57f323dee6c4 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -175,6 +175,10 @@ static int page_pool_init(struct page_pool *pool,
 		/* pool->p.offset has to be set according to the address
 		 * offset used by the DMA engine to start copying rx data
 		 */
+
+		/* Try to avoid calling no-op syncs */
+		pool->p.flags |= PP_FLAG_DMA_MAYBE_SYNC;
+		pool->p.flags &= ~PP_FLAG_DMA_SYNC_DEV;
 	}
 
 	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
@@ -323,6 +327,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 
 	page_pool_set_dma_addr(page, dma);
 
+	if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
+	    dma_need_sync(pool->p.dev, dma)) {
+		pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
+		pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
+	}
+
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
 		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
 
-- 
2.40.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

WARNING: multiple messages have this Message-ID (diff)
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Magnus Karlsson <magnus.karlsson@intel.com>,
	Michal Kubiak <michal.kubiak@intel.com>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Christoph Hellwig <hch@lst.de>,
	netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH net-next 06/11] net: page_pool: avoid calling no-op externals when possible
Date: Tue, 16 May 2023 18:18:36 +0200	[thread overview]
Message-ID: <20230516161841.37138-7-aleksander.lobakin@intel.com> (raw)
In-Reply-To: <20230516161841.37138-1-aleksander.lobakin@intel.com>

Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
even when on DMA-coherent platforms (like x86) with no active IOMMU or
swiotlb, just for the call ladder.
Indeed, it's

page_pool_put_page()
  page_pool_put_defragged_page()                  <- external
    __page_pool_put_page()
      page_pool_dma_sync_for_device()             <- non-inline
        dma_sync_single_range_for_device()
          dma_sync_single_for_device()            <- external
            dma_direct_sync_single_for_device()
              dev_is_dma_coherent()               <- exit

For the inline functions, no guarantees the compiler won't uninline them
(they're clearly not one-liners and sometimes compilers uninline even
2 + 2). The first external call is necessary, but the rest 2+ are done
for nothing each time, plus a bunch of checks here and there.
Since Page Pool mappings are long-term and for one "device + addr" pair
dma_need_sync() will always return the same value (basically, whether it
belongs to an swiotlb pool), addresses can be tested once right after
they're obtained and the result can be reused until the page is unmapped.
Define new PP flag, which will mean "do DMA syncs for device, but only
when needed" and turn it on by default when the driver asks to sync
pages. When a page is mapped, check whether it needs syncs and if so,
replace that "sync when needed" back to "always do syncs" globally for
the whole pool (better safe than sorry). As long as a pool has no pages
requiring DMA syncs, this cuts off a good piece of calls and checks.
On my x86_64, this gives from 2% to 5% performance benefit with no
negative impact for cases when IOMMU is on and the shortcut can't be
used.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h |  3 +++
 net/core/page_pool.c    | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index c8ec2f34722b..8435013de06e 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -46,6 +46,9 @@
 					* device driver responsibility
 					*/
 #define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
+#define PP_FLAG_DMA_MAYBE_SYNC	BIT(3) /* Internal, should not be used in the
+					* drivers
+					*/
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
 				 PP_FLAG_DMA_SYNC_DEV |\
 				 PP_FLAG_PAGE_FRAG)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e212e9d7edcb..57f323dee6c4 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -175,6 +175,10 @@ static int page_pool_init(struct page_pool *pool,
 		/* pool->p.offset has to be set according to the address
 		 * offset used by the DMA engine to start copying rx data
 		 */
+
+		/* Try to avoid calling no-op syncs */
+		pool->p.flags |= PP_FLAG_DMA_MAYBE_SYNC;
+		pool->p.flags &= ~PP_FLAG_DMA_SYNC_DEV;
 	}
 
 	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
@@ -323,6 +327,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 
 	page_pool_set_dma_addr(page, dma);
 
+	if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
+	    dma_need_sync(pool->p.dev, dma)) {
+		pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
+		pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
+	}
+
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
 		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
 
-- 
2.40.1


  parent reply	other threads:[~2023-05-16 16:20 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-16 16:18 [PATCH net-next 00/11] net: intel: start The Great Code Dedup + Page Pool for iavf Alexander Lobakin
2023-05-16 16:18 ` [Intel-wired-lan] " Alexander Lobakin
2023-05-16 16:18 ` [PATCH net-next 01/11] net: intel: introduce Intel Ethernet common library Alexander Lobakin
2023-05-16 16:18   ` [Intel-wired-lan] " Alexander Lobakin
2023-05-16 16:18 ` [PATCH net-next 02/11] iavf: kill "legacy-rx" for good Alexander Lobakin
2023-05-16 16:18   ` [Intel-wired-lan] " Alexander Lobakin
2023-05-16 16:18 ` [PATCH net-next 03/11] iavf: optimize Rx buffer allocation a bunch Alexander Lobakin
2023-05-16 16:18   ` [Intel-wired-lan] " Alexander Lobakin
2023-05-16 16:18 ` [Intel-wired-lan] [PATCH net-next 04/11] iavf: remove page splitting/recycling Alexander Lobakin
2023-05-16 16:18   ` Alexander Lobakin
2023-05-16 16:18 ` [Intel-wired-lan] [PATCH net-next 05/11] iavf: always use a full order-0 page Alexander Lobakin
2023-05-16 16:18   ` Alexander Lobakin
2023-05-16 16:18 ` Alexander Lobakin [this message]
2023-05-16 16:18   ` [PATCH net-next 06/11] net: page_pool: avoid calling no-op externals when possible Alexander Lobakin
2023-05-17  8:14   ` Christoph Hellwig
2023-05-17  8:14     ` [Intel-wired-lan] " Christoph Hellwig
2023-05-18 13:26     ` Alexander Lobakin
2023-05-18 13:26       ` [Intel-wired-lan] " Alexander Lobakin
2023-05-18  4:08   ` Jakub Kicinski
2023-05-18  4:08     ` [Intel-wired-lan] " Jakub Kicinski
2023-05-18  4:54     ` Yunsheng Lin
2023-05-18  4:54       ` [Intel-wired-lan] " Yunsheng Lin
2023-05-18 13:29       ` Alexander Lobakin
2023-05-18 13:29         ` [Intel-wired-lan] " Alexander Lobakin
2023-05-18 13:34     ` Alexander Lobakin
2023-05-18 13:34       ` [Intel-wired-lan] " Alexander Lobakin
2023-05-18 15:02       ` Jakub Kicinski
2023-05-18 15:02         ` Jakub Kicinski
2023-05-16 16:18 ` [PATCH net-next 07/11] net: page_pool: add DMA-sync-for-CPU inline helpers Alexander Lobakin
2023-05-16 16:18   ` [Intel-wired-lan] " Alexander Lobakin
2023-05-18  4:12   ` Jakub Kicinski
2023-05-18  4:12     ` [Intel-wired-lan] " Jakub Kicinski
2023-05-18  7:03     ` Ilias Apalodimas
2023-05-18  7:03       ` [Intel-wired-lan] " Ilias Apalodimas
2023-05-18 13:53       ` Alexander Lobakin
2023-05-18 13:53         ` [Intel-wired-lan] " Alexander Lobakin
2023-05-18 13:45     ` Alexander Lobakin
2023-05-18 13:45       ` Alexander Lobakin
2023-05-18 14:56       ` Jakub Kicinski
2023-05-18 14:56         ` Jakub Kicinski
2023-05-18 15:41         ` Alexander Lobakin
2023-05-18 15:41           ` Alexander Lobakin
2023-05-18 20:36           ` Jakub Kicinski
2023-05-18 20:36             ` Jakub Kicinski
2023-05-19 13:56             ` Alexander Lobakin
2023-05-19 13:56               ` Alexander Lobakin
2023-05-19 20:45               ` Jakub Kicinski
2023-05-19 20:45                 ` Jakub Kicinski
2023-05-22 13:48                 ` Alexander Lobakin
2023-05-22 13:48                   ` Alexander Lobakin
2023-05-22 15:27                   ` Magnus Karlsson
2023-05-22 15:27                     ` Magnus Karlsson
2023-05-18  4:19   ` Jakub Kicinski
2023-05-18  4:19     ` [Intel-wired-lan] " Jakub Kicinski
2023-05-16 16:18 ` [PATCH net-next 08/11] iavf: switch to Page Pool Alexander Lobakin
2023-05-16 16:18   ` [Intel-wired-lan] " Alexander Lobakin
2023-05-23 22:42   ` David Christensen
2023-05-23 22:42     ` [Intel-wired-lan] " David Christensen
2023-05-25 11:08     ` Alexander Lobakin
2023-05-25 11:08       ` [Intel-wired-lan] " Alexander Lobakin
2023-05-31 20:18       ` David Christensen
2023-06-02 13:25         ` Alexander Lobakin
2023-06-02 13:25           ` [Intel-wired-lan] " Alexander Lobakin
2023-05-16 16:18 ` [PATCH net-next 09/11] libie: add common queue stats Alexander Lobakin
2023-05-16 16:18   ` [Intel-wired-lan] " Alexander Lobakin
2023-05-16 16:18 ` [Intel-wired-lan] [PATCH net-next 10/11] libie: add per-queue Page Pool stats Alexander Lobakin
2023-05-16 16:18   ` Alexander Lobakin
2023-05-18  4:19   ` Jakub Kicinski
2023-05-18  4:19     ` [Intel-wired-lan] " Jakub Kicinski
2023-05-18 13:47     ` Alexander Lobakin
2023-05-18 13:47       ` [Intel-wired-lan] " Alexander Lobakin
2023-05-22 15:05   ` Paul Menzel
2023-05-22 15:05     ` Paul Menzel
2023-05-22 15:32     ` Alexander Lobakin
2023-05-22 15:32       ` Alexander Lobakin
2023-05-16 16:18 ` [Intel-wired-lan] [PATCH net-next 11/11] iavf: switch queue stats to libie Alexander Lobakin
2023-05-16 16:18   ` Alexander Lobakin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230516161841.37138-7-aleksander.lobakin@intel.com \
    --to=aleksander.lobakin@intel.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=hch@lst.de \
    --cc=ilias.apalodimas@linaro.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=michal.kubiak@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.