From: Dan Williams <dan.j.williams@intel.com>
To: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Joerg Roedel <joro@8bytes.org>, Jens Axboe <axboe@kernel.dk>,
	Michal Simek <monstr@monstr.eu>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Julia Lawall <Julia.Lawall@lip6.fr>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Christoph Hellwig <hch@lst.de>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 1/2] scatterlist: use sg_phys()
Date: Wed, 10 Jun 2015 09:57:06 -0700	[thread overview]
Message-ID: <CAPcyv4i0MB9GTL4ujtybwBLB9sV+n0PX686Oi-Rt5CGEeey_hQ@mail.gmail.com> (raw)
In-Reply-To: <20150610163121.GP7557@n2100.arm.linux.org.uk>

On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
>> On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <joro@8bytes.org> wrote:
>> > On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
>> >> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> >> index 7e7583ddd607..9f6ff6671f01 100644
>> >> --- a/arch/arm/mm/dma-mapping.c
>> >> +++ b/arch/arm/mm/dma-mapping.c
>> >> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
>> >>               return -ENOMEM;
>> >>
>> >>       for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> >> -             phys_addr_t phys = page_to_phys(sg_page(s));
>> >> +             phys_addr_t phys = sg_phys(s) - s->offset;
>> >
>> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
>> > which makes the above statement to:
>> >
>> >         page_to_phys(sg_page(s)) + s->offset - s->offset;
>> >
>> > The compiler will probably optimize that away, but it still doesn't look
>> > like an improvement.
>>
>> The goal is to eventually stop leaking struct page deep into the i/o
>> stack.  Anything that relies on being able to retrieve a struct page
>> out of an sg entry needs to be converted.  I think we need a new
>> helper for this case "sg_phys_aligned()?".
>
> Why?  The aim of the code is not to detect whether the address is aligned
> to a page (if it were, it'd be testing for a zero s->offset, or it would
> be testing for an s->offset being a multiple of the page size).
>
> Let's first understand the code that's being modified (which seems to be
> something which isn't done very much today - people seem to just like
> changing code for the hell of it.)
>
>         for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>                 phys_addr_t phys = page_to_phys(sg_page(s));
>                 unsigned int len = PAGE_ALIGN(s->offset + s->length);
>
>                 if (!is_coherent &&
>                         !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
>                 __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
>
>                 prot = __dma_direction_to_prot(dir);
>
>                 ret = iommu_map(mapping->domain, iova, phys, len, prot);
>                 if (ret < 0)
>                         goto fail;
>                 count += len >> PAGE_SHIFT;
>                 iova += len;
>         }
>
> What it's doing is trying to map each scatterlist entry via an IOMMU.
> Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
> page.
>
> However, what says that the IOMMU page size is the same as the host CPU
> page size - it may not be... so the above code is wrong for a completely
> different reason.
>
> What we _should_ be doing is finding out what the IOMMU minimum page
> size is, and using that in conjunction with the sg_phys() of the
> scatterlist entry to determine the start and length of the mapping
> such that the IOMMU mapping covers the range described by the scatterlist
> entry.  (iommu_map() takes arguments which must be aligned to the IOMMU
> minimum page size.)
>
> However, what we can also see from the above is that we have other code
> here using sg_page() - which is a necessity to be able to perform the
> required DMA cache maintenance to ensure that the data is visible to the
> DMA device.  We need to kmap_atomic() these in order to flush them, and
> there's no other way other than struct page to represent a highmem page.
>
> So, I think your intent to want to remove struct page from the I/O
> subsystem is a false hope, unless you want to end up rewriting lots of
> architecture code, and you can come up with an alternative method to
> represent highmem pages.

I think there will always be cases that need to do pfn_to_page() for
buffer management.  Those configurations will be blocked from seeing
cases where a pfn is not struct page backed.  So, you can have highmem
or dma to pmem, but not both.  Christoph, this is why I have Kconfig
symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
i/o.

Thread overview: 28+ messages
2015-06-09 16:27 [PATCH 0/2] scatterlist cleanups Dan Williams
2015-06-09 16:27 ` [PATCH 1/2] scatterlist: use sg_phys() Dan Williams
2015-06-10  0:34   ` Elliott, Robert (Server Storage)
2015-06-10  9:32   ` Joerg Roedel
2015-06-10 16:00     ` Dan Williams
2015-06-10 16:31       ` Russell King - ARM Linux
2015-06-10 16:57         ` Dan Williams [this message]
2015-06-10 17:13           ` Russell King - ARM Linux
2015-06-10 17:25             ` Dan Williams
2015-06-11  6:50     ` Joerg Roedel
2015-06-09 16:27 ` [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end() Dan Williams
2015-06-10  5:38   ` Herbert Xu
2015-06-11  7:31     ` Christoph Hellwig
2015-06-11  7:34       ` Herbert Xu
