All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: pbonzini <pbonzini@redhat.com>, mst <mst@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>, Chao Gao <chao.gao@intel.com>
Subject: Re: [PATCH] vhost: use large iotlb entry if no IOMMU translation is needed
Date: Wed, 4 Aug 2021 11:19:51 +0800	[thread overview]
Message-ID: <CACGkMEtH-y+q+ROPQB44J0fRJR8+dWcYdUrrYh=zoVa06ZWDcw@mail.gmail.com> (raw)
In-Reply-To: <YQmxFkSo7jZ7pX8Q@t490s>

On Wed, Aug 4, 2021 at 5:11 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Aug 03, 2021 at 04:14:57PM +0800, Jason Wang wrote:
> >
> > 在 2021/8/3 下午1:51, Chao Gao 写道:
> > > On Tue, Aug 03, 2021 at 12:43:58PM +0800, Jason Wang wrote:
> > > > 在 2021/8/3 下午12:29, Chao Gao 写道:
> > > > > Ping. Could someone help to review this patch?
> > > > >
> > > > > Thanks
> > > > > Chao
> > > > >
> > > > > On Wed, Jul 21, 2021 at 03:54:02PM +0800, Chao Gao wrote:
> > > > > > If guest enables IOMMU_PLATFORM for virtio-net, severe network
> > > > > > performance drop is observed even if there is no IOMMU.
> > > >
> > > > We see such reports internally and we're testing a patch series to disable
> > > > vhost IOTLB in this case.
> > > >
> > > > Will post a patch soon.
>
> [1]
>
> > > OK. put me in the CC list. I would like to test with TDX to ensure your patch
> > > fix the performance issue I am facing.
> >
> >
> > Sure.
> >
> >
> > >
> > > >
> > > >
> > > > > >    And disabling
> > > > > > vhost can mitigate the perf issue. Finally, we found the culprit is
> > > > > > frequent iotlb misses: kernel vhost-net has 2048 entries and each
> > > > > > entry is 4K (qemu uses 4K for i386 if no IOMMU); vhost-net can cache
> > > > > > translations for up to 8M (i.e. 4K*2048) IOVAs. If guest uses >8M
> > > > > > memory for DMA, there are some iotlb misses.
> > > > > >
> > > > > > If there is no IOMMU or IOMMU is disabled or IOMMU works in pass-thru
> > > > > > mode, we can optimistically use large, unaligned iotlb entries instead
> > > > > > of 4K-aligned entries to reduce iotlb pressure.
> > > >
> > > > Instead of introducing new general facilities like unaligned IOTLB entry. I
> > > > wonder if we optimize the vtd_iommu_translate() to use e.g 1G instead?
> > > using 1G iotlb entry looks feasible.
> >
> >
> > Want to send a patch?
> >
> >
> > >
> > > >      } else {
> > > >          /* DMAR disabled, passthrough, use 4k-page*/
> > > >          iotlb.iova = addr & VTD_PAGE_MASK_4K;
> > > >          iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
> > > >          iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
> > > >          iotlb.perm = IOMMU_RW;
> > > >          success = true;
> > > >      }
> > > >
> > > >
> > > > > >    Actually, vhost-net
> > > > > > in kernel supports unaligned iotlb entry. The alignment requirement is
> > > > > > imposed by address_space_get_iotlb_entry() and flatview_do_translate().
> > > >
> > > > For the passthrough case, is there anyway to detect them and then disable
> > > > device IOTLB in those case?
> > > yes. I guess so; qemu knows the presence and status of iommu. Currently,
> > > in flatview_do_translate(), memory_region_get_iommu() tells whether a memory
> > > region is behind an iommu.
> >
> >
> > The issues are:
> >
> > 1) how to know the passthrough mode is enabled (note that passthrough mode
> > doesn't mean it doesn't sit behind IOMMU)
>
> memory_region_get_iommu() should return NULL if it's passthrough-ed?

Do you mean something like memory_region_get_iommu(as->root)?

Could it be possible that the iommu was attached to a specified mr but not root.

In [1], I originally try to use pci_device_iommu_address_space() in
virtio_pci_get_dma_as().

But I suffer from the issue that virtio-pci might be initialized
before the e.g intel-iommu, which you try to solve at [2].

Then I switch to introduce a iommu_enabled that compares the as
returned by pci_device_iommu_address_space() against
address_space_memory.

And the iommu_enalbed will be called during vhost start where
intel-iommu is guaranteed to be initialized.

This seems to work. Let me post the patch and let's start there.

>
> > 2) can passthrough mode be disabled on the fly? If yes, we need to deal with
> > them
>
> I don't think it happens in reality; e.g. when iommu=pt is set it's set until
> the next guest reboot.  However I don't know whether there's limitation from
> spec-wise.

Yes, that's what I worry about. Even if it's not limited by the Intel
spec, we might suffer from this in another iommu.

>  Also I don't know whether there's special cases, for example when
> we kexec.
>
> I've two questions..
>
> Jason, when you mentioned the "fix" above [1], shouldn't that also fix the same
> issue, and in a better way? Because ideally I think if we know vhost does not
> need a translation for either iommu_platform=off, or passthrough, dev-iotlb
> layer seems an overhead with no real use.

Yes, see above. Let me post the patch.

>
> The other question is I'm also wondering why we care about iommu_platform=on
> when there's no vIOMMU at all - it's about why we bother setting the flag at
> all?  Or is it a valid use case?

Encrypted VM like SEV or TDX.

In those cases, swiotlb needs to be used in the guest since the
swiotlb pages were not encrypted.

And the iommu_platform=on is the only way to let the guest driver use
DMA API (swiotlb).

(The name iommu_platform is confusing and tricky)

Thanks

>
> Thanks,
>
> --
> Peter Xu
>



      reply	other threads:[~2021-08-04  3:21 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-21  7:54 [PATCH] vhost: use large iotlb entry if no IOMMU translation is needed Chao Gao
2021-08-03  4:29 ` Chao Gao
2021-08-03  4:43   ` Jason Wang
2021-08-03  5:51     ` Chao Gao
2021-08-03  8:14       ` Jason Wang
2021-08-03 21:11         ` Peter Xu
2021-08-04  3:19           ` Jason Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACGkMEtH-y+q+ROPQB44J0fRJR8+dWcYdUrrYh=zoVa06ZWDcw@mail.gmail.com' \
    --to=jasowang@redhat.com \
    --cc=chao.gao@intel.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.