All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Osipenko <digetx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Jonathan Hunter
	<jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Thierry Reding
	<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v1 4/4] iommu/tegra: gart: Optimize map/unmap
Date: Mon, 7 May 2018 18:51:38 +0300	[thread overview]
Message-ID: <a6e26d17-97ac-d02e-7bf4-f009af1a25dc@gmail.com> (raw)
In-Reply-To: <20180507080420.GB18595-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>

On 07.05.2018 11:04, Joerg Roedel wrote:
> On Mon, May 07, 2018 at 12:19:01AM +0300, Dmitry Osipenko wrote:
>> Probably the best variant would be to give an explicit control over syncing to a
>> user of the IOMMU API, like for example device driver may perform multiple
>> mappings / unmappings and then sync/flush in the end. I'm not sure that it's
>> really worth the hassle to shuffle the API right now, maybe we can implement it
>> later if needed. Joerg, do you have objections to a 'compound page' approach?
> 
> Have you measured the performance difference on both variants? The
> compound-page approach only works for cases when the physical memory you
> map contiguous and correctly aligned.

Yes, previously I actually only tested mapping of the contiguous allocations
(used for memory isolation purposes). But now I've re-tested all variants and
got somewhat interesting results.

Firstly it is not that easy to test a really sparse mapping simply because
memory allocator produces sparse allocation only when memory is _really_
fragmented. Pretty much all of the time the sparse allocations are contiguous or
they consist of a very few chunks that do not impose any noticeable performance
impact.

Secondly, the interesting part is that mapping / unmapping of a contiguous
allocation (CMA using DMA API) is slower by ~50% then doing it for a sparse
allocation (get_pages using bare IOMMU API). /I think/ it's a shortcoming of the
arch/arm/mm/dma-mapping.c, which also suffers from other inflexibilities that
Thierry faced recently. Though I haven't really tried to figure out what is the
bottleneck yet and Thierry was going to re-write ARM's dma-mapping
implementation anyway, I'll take a closer look at this issue a bit later.

I've implemented the iotlb_sync_map() and tested things with it. The end result
is the same as for the compound page approach, simply because actual allocations
are pretty much always contiguous.

> If it is really needed I would prefer a separate iotlb_sync_map()
> call-back that is just NULL when not needed. This way all users that
> don't need it only get a minimal penalty in the mapping path and you
> don't have any requirements on the physical memory you map to get good
> performance.
Summarizing, the iotlb_sync_map() is indeed better way. As you rightly noticed,
that approach is also optimal for the non-contiguous cases as we won't have to
flush on mapping of each contiguous chunk of the sparse allocation, but after
the whole mapping is done.

Thierry, Robin and Joerg - thanks for your input, I'll prepare patches
implementing the iotlb_sync_map.

WARNING: multiple messages have this Message-ID (diff)
From: Dmitry Osipenko <digetx@gmail.com>
To: Joerg Roedel <joro@8bytes.org>
Cc: Robin Murphy <robin.murphy@arm.com>,
	Thierry Reding <thierry.reding@gmail.com>,
	linux-tegra@vger.kernel.org, iommu@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org,
	Jonathan Hunter <jonathanh@nvidia.com>
Subject: Re: [PATCH v1 4/4] iommu/tegra: gart: Optimize map/unmap
Date: Mon, 7 May 2018 18:51:38 +0300	[thread overview]
Message-ID: <a6e26d17-97ac-d02e-7bf4-f009af1a25dc@gmail.com> (raw)
In-Reply-To: <20180507080420.GB18595@8bytes.org>

On 07.05.2018 11:04, Joerg Roedel wrote:
> On Mon, May 07, 2018 at 12:19:01AM +0300, Dmitry Osipenko wrote:
>> Probably the best variant would be to give an explicit control over syncing to a
>> user of the IOMMU API, like for example device driver may perform multiple
>> mappings / unmappings and then sync/flush in the end. I'm not sure that it's
>> really worth the hassle to shuffle the API right now, maybe we can implement it
>> later if needed. Joerg, do you have objections to a 'compound page' approach?
> 
> Have you measured the performance difference on both variants? The
> compound-page approach only works for cases when the physical memory you
> map contiguous and correctly aligned.

Yes, previously I actually only tested mapping of the contiguous allocations
(used for memory isolation purposes). But now I've re-tested all variants and
got somewhat interesting results.

Firstly it is not that easy to test a really sparse mapping simply because
memory allocator produces sparse allocation only when memory is _really_
fragmented. Pretty much all of the time the sparse allocations are contiguous or
they consist of a very few chunks that do not impose any noticeable performance
impact.

Secondly, the interesting part is that mapping / unmapping of a contiguous
allocation (CMA using DMA API) is slower by ~50% then doing it for a sparse
allocation (get_pages using bare IOMMU API). /I think/ it's a shortcoming of the
arch/arm/mm/dma-mapping.c, which also suffers from other inflexibilities that
Thierry faced recently. Though I haven't really tried to figure out what is the
bottleneck yet and Thierry was going to re-write ARM's dma-mapping
implementation anyway, I'll take a closer look at this issue a bit later.

I've implemented the iotlb_sync_map() and tested things with it. The end result
is the same as for the compound page approach, simply because actual allocations
are pretty much always contiguous.

> If it is really needed I would prefer a separate iotlb_sync_map()
> call-back that is just NULL when not needed. This way all users that
> don't need it only get a minimal penalty in the mapping path and you
> don't have any requirements on the physical memory you map to get good
> performance.
Summarizing, the iotlb_sync_map() is indeed better way. As you rightly noticed,
that approach is also optimal for the non-contiguous cases as we won't have to
flush on mapping of each contiguous chunk of the sparse allocation, but after
the whole mapping is done.

Thierry, Robin and Joerg - thanks for your input, I'll prepare patches
implementing the iotlb_sync_map.

  parent reply	other threads:[~2018-05-07 15:51 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-09 20:07 [PATCH v1 0/4] Tegra GART fixes and improvements Dmitry Osipenko
2018-04-09 20:07 ` Dmitry Osipenko
     [not found] ` <cover.1523304324.git.digetx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-04-09 20:07   ` [PATCH v1 1/4] iommu/tegra: gart: Add debugging facility Dmitry Osipenko
2018-04-09 20:07     ` Dmitry Osipenko
2018-04-27  9:46     ` Thierry Reding
2018-04-09 20:07   ` [PATCH v1 2/4] iommu/tegra: gart: Fix gart_iommu_unmap() Dmitry Osipenko
2018-04-09 20:07     ` Dmitry Osipenko
     [not found]     ` <dd25a9ff7bad7c92b345c0c0ce2bf235c4c3b6e8.1523304324.git.digetx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-04-27  9:43       ` Thierry Reding
2018-04-27  9:43         ` Thierry Reding
2018-04-09 20:07   ` [PATCH v1 3/4] iommu/tegra: gart: Constify number of GART pages Dmitry Osipenko
2018-04-09 20:07     ` Dmitry Osipenko
     [not found]     ` <954659be6760130f6ffd5e733db2ad58cbb8e6e4.1523304324.git.digetx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-04-27  9:49       ` Thierry Reding
2018-04-27  9:49         ` Thierry Reding
2018-04-09 20:07   ` [PATCH v1 4/4] iommu/tegra: gart: Optimize map/unmap Dmitry Osipenko
2018-04-09 20:07     ` Dmitry Osipenko
     [not found]     ` <f21a7b6a8f141b87f75687904a76f3728ea639a8.1523304324.git.digetx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-04-27 10:02       ` Thierry Reding
2018-04-27 10:02         ` Thierry Reding
2018-04-27 12:01         ` Dmitry Osipenko
2018-04-27 12:36         ` Robin Murphy
2018-04-27 12:36           ` Robin Murphy
     [not found]           ` <716edf58-38a7-21e5-1668-b866bf392e34-5wv7dgnIgG8@public.gmane.org>
2018-05-06 21:19             ` Dmitry Osipenko
2018-05-06 21:19               ` Dmitry Osipenko
     [not found]               ` <6827bda3-1aa2-da60-a749-8e2dd2e595f3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-05-07  8:04                 ` Joerg Roedel
2018-05-07  8:04                   ` Joerg Roedel
     [not found]                   ` <20180507080420.GB18595-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2018-05-07 15:51                     ` Dmitry Osipenko [this message]
2018-05-07 15:51                       ` Dmitry Osipenko
     [not found]                       ` <a6e26d17-97ac-d02e-7bf4-f009af1a25dc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-05-07 17:38                         ` Dmitry Osipenko
2018-05-07 17:38                           ` Dmitry Osipenko
2018-05-07  7:59         ` Joerg Roedel
2018-05-07  7:59           ` Joerg Roedel
     [not found]           ` <20180507075920.GA18595-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2018-05-07 15:46             ` Dmitry Osipenko
2018-05-07 15:46               ` Dmitry Osipenko
2018-05-03 12:52   ` [PATCH v1 0/4] Tegra GART fixes and improvements Joerg Roedel
2018-05-03 12:52     ` Joerg Roedel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6e26d17-97ac-d02e-7bf4-f009af1a25dc@gmail.com \
    --to=digetx-re5jqeeqqe8avxtiumwx3w@public.gmane.org \
    --cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org \
    --cc=joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.