From: Nicolin Chen <nicoleotsuka@gmail.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: joro@8bytes.org, iommu@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist
Date: Fri, 2 Nov 2018 16:36:13 -0700	[thread overview]
Message-ID: <20181102233613.GA26856@Asurada-Nvidia.nvidia.com> (raw)
In-Reply-To: <63184c22-4587-be2b-8089-d2e4e0b5482d@arm.com>

On Fri, Nov 02, 2018 at 04:54:07PM +0000, Robin Murphy wrote:
> On 01/11/2018 21:35, Nicolin Chen wrote:
> > The __GFP_ZERO flag is passed down to the generic page allocation
> > routine, which zeros everything page by page. This is safe as a
> > generic approach, but not efficient for iommu allocations that
> > organize contiguous pages using a scatterlist.
> > 
> > So this change drops __GFP_ZERO from the flags and adds a manual
> > memset after the page/sg allocations, using the length of each
> > scatterlist entry.
> > 
> > My test of a 2.5MB allocation shows iommu_dma_alloc() takes 46%
> > less time, reduced from an average of 925 usec to 500 usec.
> 
> Assuming this is for arm64, I'm somewhat surprised that memset() could be
> that much faster than clear_page(), since they should effectively amount to
> the same thing (a DC ZVA loop). What hardware is this on? Profiling to try

I am running with tegra186-p2771-0000.dtb so it's arm64 yes.

> and see exactly where the extra time goes would be interesting too.

I re-ran the test to get a per-step breakdown within the function:
1) pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
   // reduced from 422 usec to 56 usec == 366 usec less
2) if (!(prot & IOMMU_CACHE)) {...}	//flush routine
   // reduced from 439 usec to 236 usec == 203 usec less
Note: the new memset takes about 164 usec, giving roughly a 400 usec
      reduction for the entire iommu_dma_alloc() call.

It looks like this might be more than just the difference between
clear_page() and memset(), and might be related to mapping and cache
behavior. Any idea?

> > @@ -568,6 +571,15 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> >   	if (attrs & DMA_ATTR_ALLOC_SINGLE_PAGES)
> >   		alloc_sizes = min_size;
> > +	/*
> > +	 * The generic zeroing in a length of one page size is slow,
> > +	 * so do it manually in a length of scatterlist size instead
> > +	 */
> > +	if (gfp & __GFP_ZERO) {
> > +		gfp &= ~__GFP_ZERO;
> > +		gfp_zero = true;
> > +	}
> 
> Or just mask it out in __iommu_dma_alloc_pages()?

Yeah, masking it out there would be neater.

> > @@ -581,6 +593,12 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> >   	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
> >   		goto out_free_iova;
> > +	if (gfp_zero) {
> > +		/* Now zero all the pages in the scatterlist */
> > +		for_each_sg(sgt.sgl, s, sgt.orig_nents, i)
> > +			memset(sg_virt(s), 0, s->length);
> 
> What if the pages came from highmem? I know that doesn't happen on arm64
> today, but the point of this code *is* to be generic, and other users will
> arrive eventually.

Hmm, so it probably should use sg_miter_start/stop() too? Looking
at the flush routine, which works in PAGE_SIZE chunks per iteration,
would it be possible to map and memset contiguous pages together?
Actually, the flush routine might also be optimized if we can map
contiguous pages.

Thank you
Nicolin


Thread overview: 10+ messages
2018-11-01 21:35 [PATCH] iommu/dma: Zero pages manually in a length of scatterlist Nicolin Chen
2018-11-02 16:54 ` Robin Murphy
2018-11-02 23:36   ` Nicolin Chen [this message]
2018-11-05 14:58     ` Christoph Hellwig
2018-11-06 14:39       ` Robin Murphy
2018-11-09  7:45         ` Christoph Hellwig
2018-11-06 18:27     ` Robin Murphy
2018-11-07  0:11       ` Nicolin Chen
2018-11-04 15:50 ` Christoph Hellwig
2018-11-06 23:46   ` Nicolin Chen
