From: Peter Xu <peterx@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Yan Zhao <yan.y.zhao@intel.com>,
	Juan Quintela <quintela@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, Eric Auger <eric.auger@redhat.com>,
	Avi Kivity <avi@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [RFC v3 0/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
Date: Wed, 12 Aug 2020 17:12:21 -0400	[thread overview]
Message-ID: <20200812211221.GG6353@xz-x1> (raw)
In-Reply-To: <CAJaqyWePxesYQqiG_ATPjwzw1c=xv3Uyw-x8tj5tVJJu__ChLQ@mail.gmail.com>

On Wed, Aug 12, 2020 at 04:33:24PM +0200, Eugenio Perez Martin wrote:
> On Tue, Aug 11, 2020 at 9:28 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Eugenio,
> >
> > On Tue, Aug 11, 2020 at 08:10:44PM +0200, Eugenio Perez Martin wrote:
> > > Using this patch as a reference, I'm having trouble understanding:
> > >
> > > - I'm not sure the flag name clearly expresses the notifier's capability.
> >
> > The old code conflates the two kinds of invalidation: it always sends
> > UNMAP notifications for both iotlb and dev-iotlb invalidations.
> >
> > Now if we introduce the new DEV_IOTLB type, we can separate the two:
> >
> >   - We notify IOMMU_NOTIFIER_UNMAP for iotlb invalidations
> >
> >   - We notify IOMMU_NOTIFIER_DEV_IOTLB for dev-iotlb invalidations
> >
> > Vhost should always run with ats=on when the vIOMMU is enabled, so it will
> > enable the device iotlb.  It then no longer needs iotlb (UNMAP) invalidations,
> > because the two would normally be duplicates (one invalidates the vIOMMU
> > cache, the other the device cache).
> >
> > Also, we can drop the UNMAP type for vhost if we introduce DEV_IOTLB.  It
> > works just like real hardware: the device won't receive iotlb invalidation
> > messages, only device iotlb invalidation messages.  Here vhost (or,
> > virtio-pci) is the device.
> >
> > > - What would be the advantages of using another field (NotifierType?)
> > > in the notifier to express that it accepts arbitrary ranges for
> > > unmapping? (If I understood Jason's proposal correctly.)
> >
> > (Please refer to above too..)
> >
> > > - Is it possible (or advisable) to skip all the page splitting in
> > > vtd_page_walk if the memory range notifier supports arbitrary
> > > ranges? What would be the disadvantages? (Maybe in a future patch.)
> > > It seems advisable to me, but I would like to double-check.
> >
> > vtd_page_walk is not used for dev-iotlb, so we don't need to change it.  We
> > also want to explicitly keep the page granularity of vtd_page_walk for the
> > other IOMMU notifiers, e.g. vfio.
> >
> 
> I'm not sure I'm following.
> 
> Here is a backtrace from a regular call (not [0, ~0ULL]):
> #0  0x000055555597ca63 in memory_region_notify_one_iommu
> (notifier=0x7fffe4976f08, entry=0x7fffddff9d20)
>     at /home/qemu/softmmu/memory.c:1895
> #1  0x000055555597cc87 in memory_region_notify_iommu
> (iommu_mr=0x55555728f2e0, iommu_idx=0, entry=...) at
> /home/qemu/softmmu/memory.c:1938
> #2  0x000055555594b95a in vtd_sync_shadow_page_hook
> (entry=0x7fffddff9e70, private=0x55555728f2e0) at
> /home/qemu/hw/i386/intel_iommu.c:1436
> #3  0x000055555594af7b in vtd_page_walk_one (entry=0x7fffddff9e70,
> info=0x7fffddffa140) at /home/qemu/hw/i386/intel_iommu.c:1173
> #4  0x000055555594b2b3 in vtd_page_walk_level
>     (addr=10531758080, start=4292870144, end=4294967296, level=1,
> read=true, write=true, info=0x7fffddffa140)
>     at /home/qemu/hw/i386/intel_iommu.c:1254
> #5  0x000055555594b225 in vtd_page_walk_level
>     (addr=10530922496, start=3221225472, end=4294967296, level=2,
> read=true, write=true, info=0x7fffddffa140)
>     at /home/qemu/hw/i386/intel_iommu.c:1236
> #6  0x000055555594b225 in vtd_page_walk_level
>     (addr=10529021952, start=0, end=549755813888, level=3, read=true,
> write=true, info=0x7fffddffa140)
>     at /home/qemu/hw/i386/intel_iommu.c:1236
> #7  0x000055555594b3f8 in vtd_page_walk (s=0x555557565210,
> ce=0x7fffddffa1a0, start=0, end=549755813888, info=0x7fffddffa140)
>     at /home/qemu/hw/i386/intel_iommu.c:1293
> #8  0x000055555594ba77 in vtd_sync_shadow_page_table_range
> (vtd_as=0x55555728f270, ce=0x7fffddffa1a0, addr=0,
> size=18446744073709551615)
>     at /home/qemu/hw/i386/intel_iommu.c:1467
> #9  0x000055555594bb50 in vtd_sync_shadow_page_table
> (vtd_as=0x55555728f270) at /home/qemu/hw/i386/intel_iommu.c:1498
> #10 0x000055555594cc5f in vtd_iotlb_domain_invalidate
> (s=0x555557565210, domain_id=3) at
> /home/qemu/hw/i386/intel_iommu.c:1965
> #11 0x000055555594dbae in vtd_process_iotlb_desc (s=0x555557565210,
> inv_desc=0x7fffddffa2b0) at /home/qemu/hw/i386/intel_iommu.c:2371
> #12 0x000055555594dfd3 in vtd_process_inv_desc (s=0x555557565210) at
> /home/qemu/hw/i386/intel_iommu.c:2499
> #13 0x000055555594e1e9 in vtd_fetch_inv_desc (s=0x555557565210) at
> /home/qemu/hw/i386/intel_iommu.c:2568
> #14 0x000055555594e330 in vtd_handle_iqt_write (s=0x555557565210) at
> /home/qemu/hw/i386/intel_iommu.c:2595
> #15 0x000055555594ed90 in vtd_mem_write (opaque=0x555557565210,
> addr=136, val=1888, size=4) at /home/qemu/hw/i386/intel_iommu.c:2842
> #16 0x00005555559787b9 in memory_region_write_accessor
>     (mr=0x555557565540, addr=136, value=0x7fffddffa478, size=4,
> shift=0, mask=4294967295, attrs=...) at
> /home/qemu/softmmu/memory.c:483
> #17 0x00005555559789d7 in access_with_adjusted_size
>     (addr=136, value=0x7fffddffa478, size=4, access_size_min=4,
> access_size_max=8, access_fn=
>     0x5555559786da <memory_region_write_accessor>, mr=0x555557565540,
> attrs=...) at /home/qemu/softmmu/memory.c:544
> #18 0x000055555597b8a5 in memory_region_dispatch_write
> (mr=0x555557565540, addr=136, data=1888, op=MO_32, attrs=...)
>     at /home/qemu/softmmu/memory.c:1465
> #19 0x000055555582b1bf in flatview_write_continue
>     (fv=0x7fffc447c470, addr=4275634312, attrs=...,
> ptr=0x7ffff7dfd028, len=4, addr1=136, l=4, mr=0x555557565540) at
> /home/qemu/exec.c:3176
> #20 0x000055555582b304 in flatview_write (fv=0x7fffc447c470,
> addr=4275634312, attrs=..., buf=0x7ffff7dfd028, len=4)
>     at /home/qemu/exec.c:3216
> #21 0x000055555582b659 in address_space_write
>     (as=0x5555567a9380 <address_space_memory>, addr=4275634312,
> attrs=..., buf=0x7ffff7dfd028, len=4) at /home/qemu/exec.c:3307
> #22 0x000055555582b6c6 in address_space_rw
>     (as=0x5555567a9380 <address_space_memory>, addr=4275634312,
> attrs=..., buf=0x7ffff7dfd028, len=4, is_write=true)
>     at /home/qemu/exec.c:3317
> #23 0x000055555588e3b8 in kvm_cpu_exec (cpu=0x555556bfe9f0) at
> /home/qemu/accel/kvm/kvm-all.c:2518
> #24 0x0000555555972bcf in qemu_kvm_cpu_thread_fn (arg=0x555556bfe9f0)
> at /home/qemu/softmmu/cpus.c:1188
> #25 0x0000555555e08fbd in qemu_thread_start (args=0x555556c24c60) at
> util/qemu-thread-posix.c:521
> #26 0x00007ffff55a714a in start_thread () at /lib64/libpthread.so.0
> #27 0x00007ffff52d8f23 in clone () at /lib64/libc.so.6
> 
> with entry = {target_as = 0x5555567a9380, iova = 0xfff0b000,
> translated_addr = 0x0, addr_mask = 0xfff, perm = 0x0}
> 
> Here we got 3 levels of vtd_page_walk (frames #4-#6). The #6 parameters are:
> 
> (addr=10529021952, start=0, end=549755813888, level=3, read=true, write=true,
>     info=0x7fffddffa140)
> 
> If I understand correctly, the while (iova < end) {} loop in
> vtd_page_walk will break the big range into small pages (4K because of
> level=1, and (end - iova) / subpage_size = 245 pages or iterations).
> That could mean a lot of write(2) calls in vhost_kernel_send_device_iotlb_msg
> in the worst case, or a lot of useless early returns in
> memory_region_notify_one_iommu because of (notifier->start > entry_end
> || notifier->end < entry->iova) in the best case.
> 
> Am I right about this? I understand that other notifiers (you mention
> vfio) need the granularity, but would a check in some vtd_* function
> help with performance? (Not suggesting to introduce it in this patch
> series.)

Yeah, I think you're right that we need to touch vtd_page_walk(), since
vtd_page_walk() should only notify MAP/UNMAP events, not DEV_IOTLB.
However, we don't need to touch the internal logic of vtd_page_walk()
beyond that, so the page granularity will stay the same as before.

Thanks,

-- 
Peter Xu


