From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Magnus Karlsson <magnus.karlsson@intel.com>,
	Michal Kubiak <michal.kubiak@intel.com>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next 07/11] net: page_pool: add DMA-sync-for-CPU inline helpers
Date: Thu, 18 May 2023 10:03:41 +0300	[thread overview]
Message-ID: <ZGXNzX77/5cXqAhe@hera> (raw)
In-Reply-To: <20230517211211.1d1bbd0b@kernel.org>

Hi all,

On Wed, May 17, 2023 at 09:12:11PM -0700, Jakub Kicinski wrote:
> On Tue, 16 May 2023 18:18:37 +0200 Alexander Lobakin wrote:
> > Each driver is responsible for syncing buffers written by HW for the CPU
> > before accessing them. Almost every PP-enabled driver uses the same
> > pattern, which could be shorthanded into a static inline to make driver
> > code a little bit more compact.
> > Introduce a couple of such functions. The first one takes the actual size
> > of the data written by HW and is the main one to be used on Rx. The
> > second does the same, but only if the PP performs DMA synchronization
> > at all. The last one picks max_len from the PP params and is designed
> > for more extreme cases when the size is unknown, but the buffer still
> > needs to be synced.
> > Also constify pointer arguments of page_pool_get_dma_dir() and
> > page_pool_get_dma_addr() to give a bit more room for optimization,
> > as both of them are read-only.
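
For context, the "max_len" variant mentioned above is not quoted further
down in this reply; I'd expect it to look roughly like the sketch below
(the helper's name here is an assumption, not copied from the patch):

/* rough sketch only: name and body assumed from the description above */
static inline void
page_pool_dma_sync_full_for_cpu(const struct page_pool *pool,
				const struct page *page)
{
	page_pool_dma_sync_for_cpu(pool, page, pool->p.max_len);
}
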
>
> Very neat.
>
> > diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> > index 8435013de06e..f740c50b661f 100644
> > --- a/include/net/page_pool.h
> > +++ b/include/net/page_pool.h
> > @@ -32,7 +32,7 @@
> >
> >  #include <linux/mm.h> /* Needed by ptr_ring */
> >  #include <linux/ptr_ring.h>
> > -#include <linux/dma-direction.h>
> > +#include <linux/dma-mapping.h>
>
> highly nit-picky - but isn't dma-mapping.h pretty heavy?
> And we include page_pool.h in skbuff.h. Not that it matters
> today, but maybe one day we'll succeed in putting skbuff.h
> on a diet -- so perhaps it's better to put "inline helpers
> with non-trivial dependencies" into a new header?
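
Purely to illustrate that idea, such a split could look roughly like the
sketch below (the header name and contents here are made up, not taken
from the patch):

/* hypothetical include/net/page_pool_helpers.h: shows how the heavier
 * inline helpers could be split out of page_pool.h so that skbuff.h
 * does not end up pulling in dma-mapping.h
 */
#ifndef _NET_PAGE_POOL_HELPERS_H
#define _NET_PAGE_POOL_HELPERS_H

#include <linux/dma-mapping.h>
#include <net/page_pool.h>

/* the DMA-sync-for-CPU helpers from this patch would move here */

#endif /* _NET_PAGE_POOL_HELPERS_H */
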
>
> >  #define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
> >  					* map/unmap
>
> > +/**
> > + * page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
> > + * @pool: page_pool which this page belongs to
> > + * @page: page to sync
> > + * @dma_sync_size: size of the data written to the page
> > + *
> > + * Can be used as a shorthand to sync Rx pages before accessing them in the
> > + * driver. Caller must ensure the pool was created with %PP_FLAG_DMA_MAP.
> > + */
> > +static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> > +					      const struct page *page,
> > +					      u32 dma_sync_size)
> > +{
> > +	dma_sync_single_range_for_cpu(pool->p.dev,
> > +				      page_pool_get_dma_addr(page),
> > +				      pool->p.offset, dma_sync_size,
> > +				      page_pool_get_dma_dir(pool));
>
> Likely a dumb question but why does this exist?
> Is there a case where the "maybe" version is not safe?
>

I have similar concerns here.  Syncing for the CPU is currently the
driver's responsibility.  The reason for having an automated DMA sync is
that we know when we allocate buffers for the NIC to consume, so we can
safely sync them accordingly.  I am fine with having a page pool version of
the CPU sync, but do we really have to check the pp flags for that?  IOW,
if you are at the point where you need to sync a buffer for the CPU,
*someone* has already mapped it for you.  Regardless of who mapped it, the
sync is identical.
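
For reference, what drivers open-code on Rx today is essentially the body
of the helper above, roughly as in the sketch below (function and variable
names are made up for illustration; needs <linux/dma-mapping.h> and
<net/page_pool.h>):

/* illustrative only: the per-buffer CPU sync most PP drivers open-code */
static void example_rx_sync_for_cpu(struct page_pool *pool,
				    struct page *page, u32 len)
{
	dma_addr_t dma = page_pool_get_dma_addr(page);

	/* sync only the part of the buffer the HW actually wrote */
	dma_sync_single_range_for_cpu(pool->p.dev, dma, pool->p.offset, len,
				      page_pool_get_dma_dir(pool));
}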

> > +}
> > +
> > +/**
> > + * page_pool_dma_maybe_sync_for_cpu - sync Rx page for CPU if needed
> > + * @pool: page_pool which this page belongs to
> > + * @page: page to sync
> > + * @dma_sync_size: size of the data written to the page
> > + *
> > + * Performs DMA sync for CPU, but only when required (swiotlb, IOMMU etc.).
> > + */
> > +static inline void
> > +page_pool_dma_maybe_sync_for_cpu(const struct page_pool *pool,
> > +				 const struct page *page, u32 dma_sync_size)
> > +{
> > +	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> > +		page_pool_dma_sync_for_cpu(pool, page, dma_sync_size);
> > +}
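
And for completeness, driver Rx usage of the new helper would look roughly
like this (the function name and surrounding details are made up for the
example; needs <linux/prefetch.h> for prefetch()):

/* illustrative Rx-path usage of the proposed helper */
static void example_rx_process_buf(const struct page_pool *pool,
				   const struct page *page, u32 len)
{
	/* make the HW-written data visible to the CPU before touching it */
	page_pool_dma_sync_for_cpu(pool, page, len);

	/* the payload at page_address(page) + pool->p.offset is now safe
	 * for the CPU to read
	 */
	prefetch(page_address(page) + pool->p.offset);
}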

Thanks
/Ilias


