From: Robin Murphy <robin.murphy@arm.com>
To: Yongji Xie <xieyongji@bytedance.com>
Cc: kvm <kvm@vger.kernel.org>, "Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	virtualization <virtualization@lists.linux-foundation.org>,
	"Christian Brauner" <christian.brauner@canonical.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Dan Carpenter" <dan.carpenter@oracle.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	"Liu Xiaodong" <xiaodong.liu@intel.com>,
	"Joe Perches" <joe@perches.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	songmuchun@bytedance.com, "Jens Axboe" <axboe@kernel.dk>,
	"He Zhe" <zhe.he@windriver.com>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	iommu@lists.linux-foundation.org, bcrl@kvack.org,
	netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Mika Penttilä" <mika.penttila@nextfour.com>
Subject: Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()
Date: Wed, 4 Aug 2021 16:43:43 +0100	[thread overview]
Message-ID: <417ce5af-4deb-5319-78ce-b74fb4dd0582@arm.com> (raw)
In-Reply-To: <CACycT3sm2r8NMMUPy1k1PuSZZ3nM9aic-O4AhdmRRCwgmwGj4Q@mail.gmail.com>

On 2021-08-04 06:02, Yongji Xie wrote:
> On Tue, Aug 3, 2021 at 6:54 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2021-08-03 09:54, Yongji Xie wrote:
>>> On Tue, Aug 3, 2021 at 3:41 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>>
>>>> On 2021/7/29 at 3:34 PM, Xie Yongji wrote:
>>>>> Export alloc_iova_fast() and free_iova_fast() so that
>>>>> some modules can use it to improve iova allocation efficiency.
>>>>
>>>>
>>>> It's better to explain why alloc_iova() is not sufficient here.
>>>>
>>>
>>> Fine.
>>
>> What I fail to understand from the later patches is what the IOVA domain
>> actually represents. If the "device" is a userspace process then
>> logically the "IOVA" would be the userspace address, so presumably
>> somewhere you're having to translate between this arbitrary address
>> space and actual usable addresses - if you're worried about efficiency
>> surely it would be even better to not do that?
>>
> 
> Yes, the userspace daemon needs to translate the "IOVA" in a DMA
> descriptor to a VA (from mmap(2)). But this doesn't actually affect
> performance since it's an identity mapping in most cases.

I'm not familiar with the vhost_iotlb stuff, but it looks suspiciously 
like you're walking yet another tree to make those translations. Perhaps 
the buffer can be mapped all at once at a fixed offset, so that each 
DMA mapping call doesn't need a lookup for each individual "IOVA" - that 
may well be what's happening already, but it's a bit hard to follow just 
reading the patches in my mail client - yet vhost_iotlb_add_range() doesn't 
look like it's super-cheap to call, and you're serialising on a lock for 
that.

My main point, though, is that if you've already got something else 
keeping track of the actual addresses, then the way you're using an 
iova_domain appears to be something you could do with a trivial bitmap 
allocator. That's why I don't buy the efficiency argument. The main 
design points of the IOVA allocator are to manage large address spaces 
while trying to maximise spatial locality to minimise the underlying 
pagetable usage, and to allocate with a flexible limit so that multiple 
devices with different addressing capabilities can share the same 
address space. If none of those aspects is relevant to the use-case - 
which AFAICS appears to be true here - then as a general-purpose 
resource allocator it's rubbish: it has an unreasonably large memory 
overhead, and there are many, many better choices.
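By "trivial bitmap allocator" I mean something on this order - a userspace model of the idea, not kernel code (in-kernel you'd just use bitmap_find_next_zero_area()/bitmap_set()/bitmap_clear() under the lock you already hold):

```c
#include <stdint.h>
#include <string.h>

#define NR_SLOTS	1024	/* e.g. 1024 x 4K slots = 4M bounce region */

static uint8_t slot_map[NR_SLOTS];	/* 0 = free, 1 = busy */

/* First-fit allocation of nr contiguous slots; returns slot index or -1. */
static long slots_alloc(unsigned long nr)
{
	unsigned long i, run = 0;

	for (i = 0; i < NR_SLOTS; i++) {
		run = slot_map[i] ? 0 : run + 1;
		if (run == nr) {
			memset(&slot_map[i - nr + 1], 1, nr);
			return i - nr + 1;
		}
	}
	return -1;	/* no contiguous run of nr free slots */
}

static void slots_free(unsigned long base, unsigned long nr)
{
	memset(&slot_map[base], 0, nr);
}
```

One byte (or in the kernel, one bit) of state per slot, versus an rbtree node plus magazine caches per allocation - that's the memory-overhead comparison I'm making.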

FWIW I've recently started thinking about moving all the caching stuff 
out of iova_domain and into the iommu-dma layer since it's now a giant 
waste of space for all the other current IOVA users.

>> Presumably userspace doesn't have any concern about alignment and the
>> things we have to worry about for the DMA API in general, so it's pretty
>> much just allocating slots in a buffer, and there are far more effective
>> ways to do that than a full-blown address space manager.
> 
> Considering iova allocation efficiency, I think the iova allocator is
> better here. In most cases, we don't even need to hold a spin lock
> during iova allocation.
> 
>> If you're going
>> to reuse any infrastructure I'd have expected it to be SWIOTLB rather
>> than the IOVA allocator. Because, y'know, you're *literally implementing
>> a software I/O TLB* ;)
>>
> 
> But actually what we can reuse in SWIOTLB is the IOVA allocator.

Huh? Those are completely unrelated and orthogonal things - SWIOTLB does 
not use an external allocator (see find_slots()). By SWIOTLB I mean 
specifically the library itself, not dma-direct or any of the other 
users built around it. The functionality for managing slots in a buffer 
and bouncing data in and out can absolutely be reused - that's why users 
like the Xen and iommu-dma code *are* reusing it instead of open-coding 
their own versions.

> And
> the IOVA management in SWIOTLB is not what we want. For example,
> SWIOTLB allocates and uses contiguous memory for bouncing, which is
> not necessary in VDUSE case.

alloc_iova() allocates a contiguous (in IOVA address) region of space. 
In vduse_domain_map_page() you use it to allocate a contiguous region of 
space from your bounce buffer. Can you clarify how that is fundamentally 
different from allocating a contiguous region of space from a bounce 
buffer? Nobody's saying the underlying implementation details of where 
the buffer itself comes from can't be tweaked.

> And VDUSE needs coherent mapping which is
> not supported by the SWIOTLB. Besides, the SWIOTLB works in singleton
> mode (designed for a platform IOMMU), but VDUSE is based on an on-chip
> IOMMU (which supports multiple instances).

That's not entirely true - the IOMMU bounce buffering scheme introduced 
in intel-iommu and now moved into the iommu-dma layer was already a step 
towards something conceptually similar. It does still rely on stealing 
the underlying pages from the global SWIOTLB pool at the moment, but the 
bouncing is effectively done in a per-IOMMU-domain context.

The next step is currently queued in linux-next, wherein we can now have 
individual per-device SWIOTLB pools. In fact at that point I think you 
might actually be able to do your thing without implementing any special 
DMA ops at all - you'd need to set up a pool for your "device" with 
force_bounce set, then when you mmap() that to userspace, set up 
dev->dma_range_map to describe an offset from the physical address of 
the buffer to the userspace address, and I think dma-direct would be 
tricked into doing the right thing. It's a bit wacky, but it could stand 
to save a hell of a lot of bother.
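In rough pseudo-kernel terms, the trick I have in mind is something like this (an untested sketch - whether dma-direct really copes with an offset set up this way is exactly the "bit wacky" part, and the wrapper name is made up):

```c
/*
 * Untested sketch: describe the daemon's view of the bounce buffer as a
 * fixed offset from its physical address, so that dma-direct's
 * phys <-> dma translation lands on the right "IOVA" without any
 * special dma_map_ops. dma_direct_set_offset() builds the single-entry
 * dev->dma_range_map that dma-direct consults.
 */
static int vduse_setup_range_map(struct device *dev,
				 phys_addr_t buf_phys,	/* bounce pages */
				 dma_addr_t user_iova,	/* as seen by the daemon */
				 u64 size)
{
	return dma_direct_set_offset(dev, buf_phys, user_iova, size);
}
```

Combined with a per-device pool with force_bounce set, that might be the whole "device-side" story.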

Finally, enhancing SWIOTLB to cope with virtually-mapped buffers that 
don't have to be physically contiguous is a future improvement which I 
think could benefit various use-cases - indeed it's possibly already on 
the table for IOMMU bounce pages - so would probably be welcome in general.

> So I still prefer to reuse the
> IOVA allocator to implement an MMU-based software IOTLB.

If you're dead set on open-coding all the bounce-buffering machinery, 
then I'd honestly recommend open-coding a more suitable buffer allocator 
as well ;)

Thanks,
Robin.
