From: Robin Murphy <robin.murphy@arm.com>
To: Song Bao Hua <song.bao.hua@hisilicon.com>, "hch@lst.de" <hch@lst.de>
Cc: "davidm@hpl.hp.com" <davidm@hpl.hp.com>,
	"ralf@oss.sgi.com" <ralf@oss.sgi.com>,
	Linuxarm <linuxarm@huawei.com>,
	"linux@armlinux.org.uk" <linux@armlinux.org.uk>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"sailer@ife.ee.ethz.ch" <sailer@ife.ee.ethz.ch>,
	"Jay.Estabrook@compaq.com" <Jay.Estabrook@compaq.com>,
	"dagum@barrel.engr.sgi.com" <dagum@barrel.engr.sgi.com>,
	"andrea@suse.de" <andrea@suse.de>,
	"grundler@cup.hp.com" <grundler@cup.hp.com>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem
Date: Fri, 15 May 2020 23:12:19 +0100	[thread overview]
Message-ID: <f403a17b-bc3d-436a-4711-93cd8d4537ba@arm.com> (raw)
In-Reply-To: <B926444035E5E2439431908E3842AFD249F9F4@DGGEMI525-MBS.china.huawei.com>

On 2020-05-15 22:33, Song Bao Hua wrote:
>> Subject: Re: Constantly map and unmap of streaming DMA buffers with
>> IOMMU backend might cause serious performance problem
>>
>> On Fri, May 15, 2020 at 01:10:21PM +0100, Robin Murphy wrote:
>>>> Meanwhile, for the safety of buffers, lower-layer drivers need to make
>>>> certain the buffers have already been unmapped in iommu before those
>>>> buffers go back to buddy for other users.
>>>
>>> That sounds like it would only have benefit in a very small set of specific
>>> circumstances, and would be very difficult to generalise to buffers that
>>> are mapped via dma_map_page() or dma_map_single(). Furthermore, a
>>> high-level API that affects a low-level driver's interpretation of
>>> mid-layer API calls without the mid-layer's knowledge sounds like a hideous
>>> abomination of anti-design. If a mid-layer API lends itself to inefficiency
>>> at the lower level, it would seem a lot cleaner and more robust to extend
>>> *that* API for stateful buffer reuse. Failing that, it might possibly be
>>> appropriate to approach this at the driver level - many of the cleverer
>>> network drivers already implement buffer pools to recycle mapped SKBs
>>> internally, couldn't the "zip driver" simply try doing something like that
>>> for itself?
>>
>> Exactly.  If your upper consumer of the DMA API keeps reusing the same
>> pages, just map them once and use dma_sync_* to transfer ownership as
>> needed.
> 
> The problem is that the lower-layer drivers don't know whether the upper consumer keeps reusing the same pages; they run in different software layers.
> For example, the consumer is in mm/zswap.c:
> static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> 				struct page *page)
> {
> 	...
> 	/* compress */
> 	dst = get_cpu_var(zswap_dstmem);
> 	...
> 	ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
> 	...
> }
> 
> But the lower-layer driver is in drivers/crypto/...
> 
> Meanwhile, the lower-layer driver can't cache the buffer pointers coming from consumers to detect whether the upper layer is reusing the same page, because the same page might come from different users, or from different stages of the same user with different permissions.

Indeed the driver can't cache arbitrary pointers, but if typical buffers 
are small enough it can copy the data into its own already-mapped page, 
dma_sync it, and perform the DMA operation from there. That might even 
be more or less what your first suggestion was, but I'm still not quite 
sure.

> For example, consumer A uses the buffer as a destination and returns it to buddy, but consumer B then gets the same buffer and uses it as a source.
> 
> Another possibility: consumer A uses the buffer and returns it to buddy; some time later it allocates again and happens to get the same buffer back from buddy.
> 
> For the safety of the buffer, the lower-layer driver must guarantee the buffer is unmapped by the time it returns to buddy.
> 
> I think only the upper-layer consumer knows whether it is reusing the buffer.

Right, and if reusing buffers is common in crypto callers, then there's 
an argument for "set up reusable buffer", "process updated buffer" and 
"clean up buffer" operations to be added to the crypto API itself, such 
that the underlying drivers can then optimise for DMA usage in a robust 
and obvious way if they want to (or just implement the setup and 
teardown as no-ops and still do a full map/unmap in each "process" call 
if they don't).

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Thread overview: 12+ messages
2020-05-15  8:19 Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem Song Bao Hua
2020-05-15 12:10 ` Robin Murphy
2020-05-15 14:45   ` hch
2020-05-15 21:33     ` Song Bao Hua
2020-05-15 22:12       ` Robin Murphy [this message]
2020-05-15 22:45   ` Song Bao Hua
