QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Yan Zhao" <yan.y.zhao@intel.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, "Eugenio Pérez" <eperezma@redhat.com>,
	"Eric Auger" <eric.auger@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
Date: Sun, 28 Jun 2020 10:47:46 -0400
Message-ID: <20200628144746.GA239443@xz-x1> (raw)
In-Reply-To: <8cf25190-53e6-8cbb-372b-e3d4ec714dc5@redhat.com>

On Sun, Jun 28, 2020 at 03:03:41PM +0800, Jason Wang wrote:
> 
> On 2020/6/27 上午5:29, Peter Xu wrote:
> > Hi, Eugenio,
> > 
> > (CCing Eric, Yan and Michael too)
> > 
> > On Fri, Jun 26, 2020 at 08:41:22AM +0200, Eugenio Pérez wrote:
> > > diff --git a/memory.c b/memory.c
> > > index 2f15a4b250..7f789710d2 100644
> > > --- a/memory.c
> > > +++ b/memory.c
> > > @@ -1915,8 +1915,6 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
> > >           return;
> > >       }
> > > -    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> > I can understand removing the assertion should solve the issue, however imho
> > the major issue is not about this single assertion but the whole addr_mask
> > issue behind with virtio...
> 
> 
> I don't get here, it looks to the the range was from guest IOMMU drivers.

Yes.  Note that I didn't mean that it's a problem in virtio, it's just the fact
that virtio is the only one I know that would like to support arbitrary address
range for the translated region.  I don't know about tcg, but vfio should still
need some kind of page alignment in both the address and the addr_mask.  We
have that assumption too across the memory core when we do translations.

A further cause of the issue is the MSI region when vIOMMU enabled - currently
we implemented the interrupt region using another memory region so it split the
whole DMA region into two parts.  That's really a clean approach to IR
implementation, however that's also a burden to the invalidation part because
then we'll need to handle things like this when the listened range is not page
alighed at all (neither 0-0xfedffff, nor 0xfef0000-MAX).  If without the IR
region (so the whole iommu address range will be a single FlatRange), I think
we probably don't need most of the logic in vtd_address_space_unmap() at all,
then we can directly deliver all the IOTLB invalidations without splitting into
small page aligned ranges to all the iommu notifiers.  Sadly, so far I still
don't have ideal solution for it, because we definitely need IR.

> 
> 
> > 
> > For normal IOTLB invalidations, we were trying our best to always make
> > IOMMUTLBEntry contain a valid addr_mask to be 2**N-1.  E.g., that's what we're
> > doing with the loop in vtd_address_space_unmap().
> 
> 
> I'm sure such such assumption can work for any type of IOMMU.
> 
> 
> > 
> > But this is not the first time that we may want to break this assumption for
> > virtio so that we make the IOTLB a tuple of (start, len), then that len can be
> > not a address mask any more.  That seems to be more efficient for things like
> > vhost because iotlbs there are not page based, so it'll be inefficient if we
> > always guarantee the addr_mask because it'll be quite a lot more roundtrips of
> > the same range of invalidation.  Here we've encountered another issue of
> > triggering the assertion with virtio-net, but only with the old RHEL7 guest.
> > 
> > I'm thinking whether we can make the IOTLB invalidation configurable by
> > specifying whether the backend of the notifier can handle arbitary address
> > range in some way.  So we still have the guaranteed addr_masks by default
> > (since I still don't think totally break the addr_mask restriction is wise...),
> > however we can allow the special backends to take adavantage of using arbitary
> > (start, len) ranges for reasons like performance.
> > 
> > To do that, a quick idea is to introduce a flag IOMMU_NOTIFIER_ARBITRARY_MASK
> > to IOMMUNotifierFlag, to declare that the iommu notifier (and its backend) can
> > take arbitrary address mask, then it can be any value and finally becomes a
> > length rather than an addr_mask.  Then for every iommu notify() we can directly
> > deliver whatever we've got from the upper layer to this notifier.  With the new
> > flag, vhost can do iommu_notifier_init() with UNMAP|ARBITRARY_MASK so it
> > declares this capability.  Then no matter for device iotlb or normal iotlb, we
> > skip the complicated procedure to split a big range into small ranges that are
> > with strict addr_mask, but directly deliver the message to the iommu notifier.
> > E.g., we can skip the loop in vtd_address_space_unmap() if the notifier is with
> > ARBITRARY flag set.
> 
> 
> I'm not sure coupling IOMMU capability to notifier is the best choice.

IMHO it's not an IOMMU capability.  The flag I wanted to introduce is a
capability of the one who listens to the IOMMU TLB updates.  For our case, it's
virtio/vhost's capability to allow arbitrary length. The IOMMU itself
definitely has some limitation on the address range to be bound to an IOTLB
invalidation, e.g., the device-iotlb we're talking here only accept both the
iova address and addr_mask to be aligned to 2**N-1.

> 
> How about just convert to use a range [start, end] for any notifier and move
> the checks (e.g the assert) into the actual notifier implemented (vhost or
> vfio)?

IOMMUTLBEntry itself is the abstraction layer of TLB entry.  Hardware TLB entry
is definitely not arbitrary range either (because AFAICT the hardware should
only cache PFN rather than address, so at least PAGE_SIZE aligned).
Introducing this flag will already make this trickier just to avoid introducing
another similar struct to IOMMUTLBEntry, but I really don't want to make it a
default option...  Not to mention I probably have no reason to urge the rest
iommu notifier users (tcg, vfio) to change their existing good code to suite
any of the backend who can cooperate with arbitrary address ranges...

Thanks,

-- 
Peter Xu



  reply index

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-26  6:41 [RFC v2 0/1] " Eugenio Pérez
2020-06-26  6:41 ` [RFC v2 1/1] " Eugenio Pérez
2020-06-26 21:29   ` Peter Xu
2020-06-27  7:26     ` Yan Zhao
2020-06-27 12:57       ` Peter Xu
2020-06-28  1:36         ` Yan Zhao
2020-06-28  7:03     ` Jason Wang
2020-06-28 14:47       ` Peter Xu [this message]
2020-06-29  5:51         ` Jason Wang
2020-06-29 13:34           ` Peter Xu
2020-06-30  2:41             ` Jason Wang
2020-06-30  8:29               ` Jason Wang
2020-06-30  9:21                 ` Michael S. Tsirkin
2020-06-30  9:23                   ` Jason Wang
2020-06-30 15:20                     ` Peter Xu
2020-07-01  8:11                       ` Jason Wang
2020-07-01 12:16                         ` Peter Xu
2020-07-01 12:30                           ` Jason Wang
2020-07-01 12:41                             ` Peter Xu
2020-07-02  3:00                               ` Jason Wang
2020-06-30 15:39               ` Peter Xu
2020-07-01  8:09                 ` Jason Wang
2020-07-02  3:01                   ` Jason Wang
2020-07-02 15:45                     ` Peter Xu
2020-07-03  7:24                       ` Jason Wang
2020-07-03 13:03                         ` Peter Xu
2020-07-07  8:03                           ` Jason Wang
2020-07-07 19:54                             ` Peter Xu
2020-07-08  5:42                               ` Jason Wang
2020-07-08 14:16                                 ` Peter Xu
2020-07-09  5:58                                   ` Jason Wang
2020-07-09 14:10                                     ` Peter Xu
2020-07-10  6:34                                       ` Jason Wang
2020-07-10 13:30                                         ` Peter Xu
2020-07-13  4:04                                           ` Jason Wang
2020-07-16  1:00                                             ` Peter Xu
2020-07-16  2:54                                               ` Jason Wang
2020-07-17 14:18                                                 ` Peter Xu
2020-07-20  4:02                                                   ` Jason Wang
2020-07-20 13:03                                                     ` Peter Xu
2020-07-21  6:20                                                       ` Jason Wang
2020-07-21 15:10                                                         ` Peter Xu
2020-08-03 16:00                         ` Eugenio Pérez
2020-08-04 20:30                           ` Peter Xu
2020-08-05  5:45                             ` Jason Wang
2020-06-29 15:05 ` [RFC v2 0/1] " Paolo Bonzini
2020-07-03  7:39   ` Eugenio Perez Martin
2020-07-03 10:10     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200628144746.GA239443@xz-x1 \
    --to=peterx@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git