* [PATCH 00/13] mmu_notifier kill invalidate_page callback
@ 2017-08-29 23:54 Jérôme Glisse
  0 siblings, 0 replies; 36+ messages in thread
From: Jérôme Glisse @ 2017-08-29 23:54 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Andrea Arcangeli, Joerg Roedel, kvm, Radim Krčmář,
	linux-rdma, linuxppc-dev, Jack Steiner, Sudeep Dutt, dri-devel,
	Ashutosh Dixit, iommu, Jérôme Glisse, Dimitri Sivanich,
	amd-gfx, xen-devel, Paolo Bonzini, Andrew Morton, Linus Torvalds,
	Dan Williams, Kirill A . Shutemov

(Sorry for cross-posting to so many lists and for the big Cc.)

Please help with testing!

The invalidate_page callback suffered from 2 pitfalls. First, it used to
happen after the page table lock was released, and thus a new page might
have been set up for the virtual address before the call to
invalidate_page().

This was fixed, in a weird way, by c7ab0d2fdc840266b39db94538f74207ec2afbf6,
which moved the callback under the page table lock. That in turn broke
several existing users of the mmu_notifier API that assumed they could
sleep inside this callback.

The second pitfall was that invalidate_page was the only callback that did
not take a range of addresses to invalidate; instead it was given an
address and a page. A lot of the callback implementers assumed this could
never be a THP and thus failed to invalidate the appropriate range for THP
pages.
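
As an illustration only (not code from this series), the difference between
invalidating a single base page and invalidating the full range of a huge
page can be sketched in plain userspace C. The names sketch_invalidate_range,
struct va_range and SKETCH_PAGE_SIZE are hypothetical, and the 4096-byte base
page size is an assumption.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed base page size */

/* Half-open virtual address range [start, end). */
struct va_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Range that must be invalidated for a page of the given order mapped
 * at addr, clamped so it never runs past the end of the mapping.
 * order 0 is a base page; order 9 is a 2MB huge page with 4KB base pages.
 */
static struct va_range sketch_invalidate_range(unsigned long addr,
					       unsigned int order,
					       unsigned long vma_end)
{
	struct va_range r;

	assert(addr < vma_end);
	r.start = addr;
	r.end = addr + (SKETCH_PAGE_SIZE << order);
	if (r.end > vma_end)
		r.end = vma_end;
	return r;
}
```

A callback that only ever invalidates SKETCH_PAGE_SIZE bytes at addr covers
the order 0 case but leaves the rest of an order 9 page stale, which is the
kind of mistake a range-based callback makes harder to write.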

By killing this callback we unify the mmu_notifier callback API to always
take a virtual address range as input.

There are now 2 clear APIs (I am not mentioning the youngness API, which is
seldom used):
  - invalidate_range_start()/end() callbacks (which allow you to sleep)
  - invalidate_range(), where you can not sleep but which happens right
    after the page table update, under the page table lock


Note that a lot of existing users feel broken with respect to range_start/
range_end. Many users only have a range_start() callback, but there is
nothing preventing them from undoing what was invalidated in their
range_start() callback after it returns but before any CPU page table
update takes place.

The code pattern used in kvm or umem odp is an example of how to properly
avoid such a race. In a nutshell, use some kind of sequence number and an
active range invalidation counter to block anything that might undo what
the range_start() callback did.
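
A minimal single-threaded sketch of that idea, in userspace C, might look
like the following. It is a model of the pattern, not the kvm or umem odp
code: struct mirror and every function name here are hypothetical, and a real
driver would additionally need a lock shared by the callbacks and the fault
path, plus the appropriate memory barriers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device mirror state; names are illustrative only. */
struct mirror {
	unsigned long invalidate_seq;	/* bumped when an invalidation completes */
	int active_invalidations;	/* range_start() not yet matched by range_end() */
	bool mapped;			/* device mapping currently populated? */
};

/* What a range_start() callback would do: tear down, mark in flight. */
static void mirror_range_start(struct mirror *m)
{
	m->active_invalidations++;
	m->mapped = false;
}

/* What a range_end() callback would do: publish completion. */
static void mirror_range_end(struct mirror *m)
{
	assert(m->active_invalidations > 0);
	m->invalidate_seq++;
	m->active_invalidations--;
}

/* Fault path, step 1: snapshot the sequence before the slow page lookup. */
static unsigned long mirror_fault_begin(const struct mirror *m)
{
	return m->invalidate_seq;
}

/*
 * Fault path, step 2: install the mapping only if no invalidation is in
 * flight and none completed since the snapshot. Otherwise the caller
 * must throw its lookup result away and retry.
 */
static bool mirror_fault_commit(struct mirror *m, unsigned long seq)
{
	if (m->active_invalidations > 0 || m->invalidate_seq != seq)
		return false;
	m->mapped = true;
	return true;
}
```

The counter rejects a commit that lands between range_start() and
range_end(); the sequence number rejects one whose lookup began before an
invalidation that has since finished.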

If you do not care about keeping fully in sync with the CPU page table (ie
you can live with the CPU page table pointing to a new, different page for
a given virtual address) then you can take a reference on the pages inside
the range_start callback and drop it in range_end or when your driver
is done with those pages.

The last alternative is to use invalidate_range() if you can do the
invalidation without sleeping, as the invalidate_range() callback happens
under the CPU page table spinlock right after the page table is updated.


Note this is barely tested. I intend to do more testing over the next few
days but I do not have access to all the hardware that makes use of the
mmu_notifier API.


The first 2 patches convert existing calls of mmu_notifier_invalidate_page()
to mmu_notifier_invalidate_range() and bracket those calls with calls to
mmu_notifier_invalidate_range_start()/end().
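
The resulting call order can be sketched with a small userspace mock. Every
function below is a hypothetical stub that only records that it was called;
none of them is the kernel function it stands in for, and the exact placement
in each converted call site may differ from this simplified picture.

```c
#include <assert.h>
#include <string.h>

/* Event log so the call order can be inspected; purely illustrative. */
static char trace[16];
static int trace_len;

static void record(char ev)
{
	if (trace_len < (int)sizeof(trace) - 1)
		trace[trace_len++] = ev;
}

/* Stubs standing in for the notifier calls and the page table lock. */
static void stub_range_start(unsigned long s, unsigned long e) { (void)s; (void)e; record('S'); }
static void stub_invalidate_range(unsigned long s, unsigned long e) { (void)s; (void)e; record('I'); }
static void stub_range_end(unsigned long s, unsigned long e) { (void)s; (void)e; record('E'); }
static void stub_ptl_lock(void) { record('L'); }
static void stub_ptl_unlock(void) { record('U'); }
static void stub_clear_pte(unsigned long addr) { (void)addr; record('P'); }

/* Converted call site: one page table update, bracketed by start/end. */
static void sketch_zap_one_page(unsigned long addr, unsigned long size)
{
	unsigned long start = addr, end = addr + size;

	stub_range_start(start, end);		/* outside the lock: may sleep */
	stub_ptl_lock();
	stub_clear_pte(addr);
	stub_invalidate_range(start, end);	/* under the lock: must not sleep */
	stub_ptl_unlock();
	stub_range_end(start, end);		/* outside the lock: may sleep */
}
```

Calling sketch_zap_one_page() once leaves "SLPIUE" in trace: start, lock,
page table update, invalidate_range, unlock, end.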

The next 10 patches remove the existing invalidate_page() callbacks, as
they can no longer be called.

Finally, the last patch removes it completely so it can RIP.

Jérôme Glisse (13):
  dax: update to new mmu_notifier semantic
  mm/rmap: update to new mmu_notifier semantic
  powerpc/powernv: update to new mmu_notifier semantic
  drm/amdgpu: update to new mmu_notifier semantic
  IB/umem: update to new mmu_notifier semantic
  IB/hfi1: update to new mmu_notifier semantic
  iommu/amd: update to new mmu_notifier semantic
  iommu/intel: update to new mmu_notifier semantic
  misc/mic/scif: update to new mmu_notifier semantic
  sgi-gru: update to new mmu_notifier semantic
  xen/gntdev: update to new mmu_notifier semantic
  KVM: update to new mmu_notifier semantic
  mm/mmu_notifier: kill invalidate_page

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>

Cc: linuxppc-dev@lists.ozlabs.org
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: kvm@vger.kernel.org


 arch/powerpc/platforms/powernv/npu-dma.c | 10 --------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c   | 31 ----------------------
 drivers/infiniband/core/umem_odp.c       | 19 --------------
 drivers/infiniband/hw/hfi1/mmu_rb.c      |  9 -------
 drivers/iommu/amd_iommu_v2.c             |  8 ------
 drivers/iommu/intel-svm.c                |  9 -------
 drivers/misc/mic/scif/scif_dma.c         | 11 --------
 drivers/misc/sgi-gru/grutlbpurge.c       | 12 ---------
 drivers/xen/gntdev.c                     |  8 ------
 fs/dax.c                                 | 19 ++++++++------
 include/linux/mm.h                       |  1 +
 include/linux/mmu_notifier.h             | 25 ------------------
 mm/memory.c                              | 26 +++++++++++++++----
 mm/mmu_notifier.c                        | 14 ----------
 mm/rmap.c                                | 44 +++++++++++++++++++++++++++++---
 virt/kvm/kvm_main.c                      | 42 ------------------------------
 16 files changed, 74 insertions(+), 214 deletions(-)

-- 
2.13.5




* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-09-01 14:47           ` Jeff Cook
@ 2017-09-01 14:50             ` taskboxtester
  -1 siblings, 0 replies; 36+ messages in thread
From: taskboxtester @ 2017-09-01 14:50 UTC (permalink / raw)
  To: Jeff Cook
  Cc: open list:AMD IOMMU (AMD-VI),
	Paolo Bonzini, Dan Williams, linux-rdma, Jack Steiner,
	Dimitri Sivanich, Bernhard Held, Andrew Morton,
	Radim Krčmář,
	amd-gfx, DRI, xen-devel, Joerg Roedel, Jerome Glisse, ppc-dev,
	Linus Torvalds, Kirill A . Shutemov, Andrea Arcangeli,
	Sudeep Dutt, KVM list, Adam Borowski

[-- Attachment #1: Type: text/plain, Size: 7403 bytes --]

taskboxtester@gmail.com liked your message with Boxer for Android.


On Sep 1, 2017 10:48 AM, Jeff Cook <jeff@jeffcook.io> wrote:

On Wed, Aug 30, 2017, at 10:57 AM, Adam Borowski wrote:
> On Tue, Aug 29, 2017 at 08:56:15PM -0400, Jerome Glisse wrote:
> > I will wait for people to test and for result of my own test before
> > reposting if need be, otherwise i will post as separate patch.
> >
> > > But from a _very_ quick read-through this looks fine. But it obviously
> > > needs testing.
> > > 
> > > People - *especially* the people who saw issues under KVM - can you
> > > > try out Jérôme's patch-series? I added some people to the cc, the full
> > > series is on lkml. Jérôme - do you have a git branch for people to
> > > test that they could easily pull and try out?
> > 
> > https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> > git://people.freedesktop.org/~glisse/linux
> 
> Tested your branch as of 10f07641, on a long list of guest VMs.
> No earth-shattering kaboom.

I've been using the mmu_notifier branch @ a3d944233bcf8c for the last 36
hours or so, also without incident.

Unlike most other reporters, I experienced a similar splat on 4.12:

Aug 03 15:02:47 kvm_master kernel: ------------[ cut here ]------------
Aug 03 15:02:47 kvm_master kernel: WARNING: CPU: 13 PID: 1653 at
arch/x86/kvm/mmu.c:682 mmu_spte_clear_track_bits+0xfb/0x100 [kvm]
Aug 03 15:02:47 kvm_master kernel: Modules linked in: vhost_net vhost
tap xt_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4
xt_tcpudp tun ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter msr nls_iso8859_1 nls_cp437 intel_rapl ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sb_edac
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel input_leds pcbc aesni_intel led_class
aes_x86_64 mxm_wmi crypto_simd glue_helper uvcvideo cryptd videobuf2_vmalloc
videobuf2_memops igb videobuf2_v4l2 videobuf2_core snd_usb_audio
videodev media joydev ptp evdev mousedev intel_cstate pps_core mac_hid
intel_rapl_perf snd_hda_intel snd_virtuoso snd_usbmidi_lib snd_hda_codec
snd_oxygen_lib snd_hda_core                        
Aug 03 15:02:47 kvm_master kernel:  snd_mpu401_uart snd_rawmidi
snd_hwdep snd_seq_device snd_pcm snd_timer snd soundcore i2c_algo_bit
pcspkr i2c_i801 lpc_ich ioatdma shpchp dca wmi acpi_power_meter tpm_tis
tpm_tis_core tpm button bridge stp llc sch_fq_codel virtio_pci
virtio_blk virtio_balloon virtio_net virtio_ring virtio kvm_intel kvm sg
ip_tables x_tables hid_logitech_hidpp hid_logitech_dj hid_generic
hid_microsoft usbhid hid sr_mod cdrom sd_mod xhci_pci ahci libahci
xhci_hcd libata usbcore scsi_mod usb_common zfs(PO) zunicode(PO)
zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops drm vfio_pci irqbypass
vfio_virqfd vfio_iommu_type1 vfio vfat fat ext4 crc16 jbd2 fscrypto
mbcache dm_thin_pool dm_cache dm_persistent_data dm_bio_prison dm_bufio
dm_raid raid456 libcrc32c                 
Aug 03 15:02:47 kvm_master kernel:  crc32c_generic crc32c_intel
async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
dm_mod dax raid1 md_mod                                                  
Aug 03 15:02:47 kvm_master kernel: CPU: 13 PID: 1653 Comm: kworker/13:2
Tainted: P    B D W  O    4.12.3-1-ARCH #1                 
Aug 03 15:02:47 kvm_master kernel: Hardware name: Supermicro
SYS-7038A-I/X10DAI, BIOS 2.0a 11/09/2016                              
Aug 03 15:02:47 kvm_master kernel: Workqueue: events mmput_async_fn      
Aug 03 15:02:47 kvm_master kernel: task: ffff9fa89751b900 task.stack:
ffffc179880d8000                                             
Aug 03 15:02:47 kvm_master kernel: RIP:
0010:mmu_spte_clear_track_bits+0xfb/0x100 [kvm]                          
Aug 03 15:02:47 kvm_master kernel: RSP: 0018:ffffc179880dbc20 EFLAGS:
00010246                                                     
Aug 03 15:02:47 kvm_master kernel: RAX: 0000000000000000 RBX:
00000009c07cce77 RCX: dead0000000000ff                               
Aug 03 15:02:47 kvm_master kernel: RDX: 0000000000000000 RSI:
ffff9fa82d6d6f08 RDI: fffff6e76701f300                               
Aug 03 15:02:47 kvm_master kernel: RBP: ffffc179880dbc38 R08:
0000000000100000 R09: 000000000000000d                               
Aug 03 15:02:47 kvm_master kernel: R10: ffff9fa0a56b0008 R11:
ffff9fa0a56b0000 R12: 00000000009c07cc                               
Aug 03 15:02:47 kvm_master kernel: R13: ffff9fa88b990000 R14:
ffff9f9e19dbb1b8 R15: 0000000000000000                               
Aug 03 15:02:47 kvm_master kernel: FS:  0000000000000000(0000)
GS:ffff9fac5f340000(0000) knlGS:0000000000000000                    
Aug 03 15:02:47 kvm_master kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033                                               
Aug 03 15:02:47 kvm_master kernel: CR2: ffffd1b542d71000 CR3:
0000000570a09000 CR4: 00000000003426e0                               
Aug 03 15:02:47 kvm_master kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000                               
Aug 03 15:02:47 kvm_master kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400                               
Aug 03 15:02:47 kvm_master kernel: Call Trace:                   
Aug 03 15:02:47 kvm_master kernel:  drop_spte+0x1a/0xb0 [kvm]    
Aug 03 15:02:47 kvm_master kernel:  mmu_page_zap_pte+0x9c/0xe0 [kvm]     
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_prepare_zap_page+0x65/0x310
[kvm]
Aug 03 15:02:47 kvm_master kernel: 
kvm_mmu_invalidate_zap_all_pages+0x10d/0x160 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_arch_flush_shadow_all+0xe/0x10
[kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_notifier_release+0x2c/0x40
[kvm]
Aug 03 15:02:47 kvm_master kernel:  __mmu_notifier_release+0x44/0xc0
Aug 03 15:02:47 kvm_master kernel:  exit_mmap+0x142/0x150
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? exit_aio+0xc6/0x100
Aug 03 15:02:47 kvm_master kernel:  mmput_async_fn+0x4c/0x130
Aug 03 15:02:47 kvm_master kernel:  process_one_work+0x1de/0x430
Aug 03 15:02:47 kvm_master kernel:  worker_thread+0x47/0x3f0
Aug 03 15:02:47 kvm_master kernel:  kthread+0x125/0x140
Aug 03 15:02:47 kvm_master kernel:  ? process_one_work+0x430/0x430
Aug 03 15:02:47 kvm_master kernel:  ? kthread_create_on_node+0x70/0x70
Aug 03 15:02:47 kvm_master kernel:  ret_from_fork+0x25/0x30
Aug 03 15:02:47 kvm_master kernel: Code: ec 75 04 00 48 b8 00 00 00 00
00 00 00 40 48 21 da 48 39 c2 0f 95 c0 eb b2 48 d1 eb 83 e3 01 eb c0 4c
89 e7 e8 f7 3d fe ff eb a4 <0f> ff eb 8a 90 0f 1f 44 00 00 55 48 89 e5
53 89 d3 e8 ff 4a fe 
Aug 03 15:02:47 kvm_master kernel: ---[ end trace 8710f4d700a7d36e ]---

This would typically take 36-48 hours to surface, so we're good so far,
but not completely out of the woods yet. I'm optimistic that since this
patchset changes the mmu_notifier behavior to something safer in
general, this issue will also be resolved by it.

Jeff

> 
> 
> Meow!
> -- 
> ⢀⣴⠾⠻⢶⣦⠀ 
> ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
> ⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
> ⠈⠳⣄⠀⠀⠀⠀ 


[-- Attachment #2: Type: text/html, Size: 12378 bytes --]


* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30 14:57         ` Adam Borowski
  (?)
  (?)
@ 2017-09-01 14:47           ` Jeff Cook
  -1 siblings, 0 replies; 36+ messages in thread
From: Jeff Cook @ 2017-09-01 14:47 UTC (permalink / raw)
  To: Adam Borowski, Jerome Glisse
  Cc: Linus Torvalds, Bernhard Held, Linux Kernel Mailing List,
	linux-mm, Kirill A . Shutemov, Andrew Morton, Andrea Arcangeli,
	Joerg Roedel, Dan Williams, Sudeep Dutt, Ashutosh Dixit,
	Dimitri Sivanich, Jack Steiner, Paolo Bonzini,
	Radim Krčmář,
	ppc-dev, DRI, amd-gfx, linux-rdma, open list:AMD IOMMU (AMD-VI),
	xen-devel

On Wed, Aug 30, 2017, at 10:57 AM, Adam Borowski wrote:
> On Tue, Aug 29, 2017 at 08:56:15PM -0400, Jerome Glisse wrote:
> > I will wait for people to test and for result of my own test before
> > reposting if need be, otherwise i will post as separate patch.
> >
> > > But from a _very_ quick read-through this looks fine. But it obviously
> > > needs testing.
> > > 
> > > People - *especially* the people who saw issues under KVM - can you
> > > > try out Jérôme's patch-series? I added some people to the cc, the full
> > > series is on lkml. Jérôme - do you have a git branch for people to
> > > test that they could easily pull and try out?
> > 
> > https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> > git://people.freedesktop.org/~glisse/linux
> 
> Tested your branch as of 10f07641, on a long list of guest VMs.
> No earth-shattering kaboom.

I've been using the mmu_notifier branch @ a3d944233bcf8c for the last 36
hours or so, also without incident.

Unlike most other reporters, I experienced a similar splat on 4.12:

Aug 03 15:02:47 kvm_master kernel: ------------[ cut here ]------------
Aug 03 15:02:47 kvm_master kernel: WARNING: CPU: 13 PID: 1653 at
arch/x86/kvm/mmu.c:682 mmu_spte_clear_track_bits+0xfb/0x100 [kvm]
Aug 03 15:02:47 kvm_master kernel: Modules linked in: vhost_net vhost
tap xt_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4
xt_tcpudp tun ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter msr nls_iso8859_1 nls_cp437 intel_rapl ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sb_edac
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel input_leds pcbc aesni_intel led_class
aes_x86_64 mxm_wmi crypto_simd glue_helper uvcvideo cryptd videobuf2_vmalloc
videobuf2_memops igb videobuf2_v4l2 videobuf2_core snd_usb_audio
videodev media joydev ptp evdev mousedev intel_cstate pps_core mac_hid
intel_rapl_perf snd_hda_intel snd_virtuoso snd_usbmidi_lib snd_hda_codec
snd_oxygen_lib snd_hda_core                        
Aug 03 15:02:47 kvm_master kernel:  snd_mpu401_uart snd_rawmidi
snd_hwdep snd_seq_device snd_pcm snd_timer snd soundcore i2c_algo_bit
pcspkr i2c_i801 lpc_ich ioatdma shpchp dca wmi acpi_power_meter tpm_tis
tpm_tis_core tpm button bridge stp llc sch_fq_codel virtio_pci
virtio_blk virtio_balloon virtio_net virtio_ring virtio kvm_intel kvm sg
ip_tables x_tables hid_logitech_hidpp hid_logitech_dj hid_generic
hid_microsoft usbhid hid sr_mod cdrom sd_mod xhci_pci ahci libahci
xhci_hcd libata usbcore scsi_mod usb_common zfs(PO) zunicode(PO)
zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops drm vfio_pci irqbypass
vfio_virqfd vfio_iommu_type1 vfio vfat fat ext4 crc16 jbd2 fscrypto
mbcache dm_thin_pool dm_cache dm_persistent_data dm_bio_prison dm_bufio
dm_raid raid456 libcrc32c                 
Aug 03 15:02:47 kvm_master kernel:  crc32c_generic crc32c_intel
async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
dm_mod dax raid1 md_mod                                                  
Aug 03 15:02:47 kvm_master kernel: CPU: 13 PID: 1653 Comm: kworker/13:2
Tainted: P    B D W  O    4.12.3-1-ARCH #1                 
Aug 03 15:02:47 kvm_master kernel: Hardware name: Supermicro
SYS-7038A-I/X10DAI, BIOS 2.0a 11/09/2016                              
Aug 03 15:02:47 kvm_master kernel: Workqueue: events mmput_async_fn      
Aug 03 15:02:47 kvm_master kernel: task: ffff9fa89751b900 task.stack:
ffffc179880d8000                                             
Aug 03 15:02:47 kvm_master kernel: RIP:
0010:mmu_spte_clear_track_bits+0xfb/0x100 [kvm]                          
Aug 03 15:02:47 kvm_master kernel: RSP: 0018:ffffc179880dbc20 EFLAGS:
00010246                                                     
Aug 03 15:02:47 kvm_master kernel: RAX: 0000000000000000 RBX:
00000009c07cce77 RCX: dead0000000000ff                               
Aug 03 15:02:47 kvm_master kernel: RDX: 0000000000000000 RSI:
ffff9fa82d6d6f08 RDI: fffff6e76701f300                               
Aug 03 15:02:47 kvm_master kernel: RBP: ffffc179880dbc38 R08:
0000000000100000 R09: 000000000000000d                               
Aug 03 15:02:47 kvm_master kernel: R10: ffff9fa0a56b0008 R11:
ffff9fa0a56b0000 R12: 00000000009c07cc                               
Aug 03 15:02:47 kvm_master kernel: R13: ffff9fa88b990000 R14:
ffff9f9e19dbb1b8 R15: 0000000000000000                               
Aug 03 15:02:47 kvm_master kernel: FS:  0000000000000000(0000)
GS:ffff9fac5f340000(0000) knlGS:0000000000000000                    
Aug 03 15:02:47 kvm_master kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033                                               
Aug 03 15:02:47 kvm_master kernel: CR2: ffffd1b542d71000 CR3:
0000000570a09000 CR4: 00000000003426e0                               
Aug 03 15:02:47 kvm_master kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000                               
Aug 03 15:02:47 kvm_master kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400                               
Aug 03 15:02:47 kvm_master kernel: Call Trace:                   
Aug 03 15:02:47 kvm_master kernel:  drop_spte+0x1a/0xb0 [kvm]    
Aug 03 15:02:47 kvm_master kernel:  mmu_page_zap_pte+0x9c/0xe0 [kvm]     
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_prepare_zap_page+0x65/0x310
[kvm]
Aug 03 15:02:47 kvm_master kernel: 
kvm_mmu_invalidate_zap_all_pages+0x10d/0x160 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_arch_flush_shadow_all+0xe/0x10
[kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_notifier_release+0x2c/0x40
[kvm]
Aug 03 15:02:47 kvm_master kernel:  __mmu_notifier_release+0x44/0xc0
Aug 03 15:02:47 kvm_master kernel:  exit_mmap+0x142/0x150
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? exit_aio+0xc6/0x100
Aug 03 15:02:47 kvm_master kernel:  mmput_async_fn+0x4c/0x130
Aug 03 15:02:47 kvm_master kernel:  process_one_work+0x1de/0x430
Aug 03 15:02:47 kvm_master kernel:  worker_thread+0x47/0x3f0
Aug 03 15:02:47 kvm_master kernel:  kthread+0x125/0x140
Aug 03 15:02:47 kvm_master kernel:  ? process_one_work+0x430/0x430
Aug 03 15:02:47 kvm_master kernel:  ? kthread_create_on_node+0x70/0x70
Aug 03 15:02:47 kvm_master kernel:  ret_from_fork+0x25/0x30
Aug 03 15:02:47 kvm_master kernel: Code: ec 75 04 00 48 b8 00 00 00 00
00 00 00 40 48 21 da 48 39 c2 0f 95 c0 eb b2 48 d1 eb 83 e3 01 eb c0 4c
89 e7 e8 f7 3d fe ff eb a4 <0f> ff eb 8a 90 0f 1f 44 00 00 55 48 89 e5
53 89 d3 e8 ff 4a fe 
Aug 03 15:02:47 kvm_master kernel: ---[ end trace 8710f4d700a7d36e ]---

This would typically take 36-48 hours to surface, so we're good so far,
but not completely out of the woods yet. I'm optimistic that since this
patchset changes the mmu_notifier behavior to something safer in
general, this issue will also be resolved by it.

Jeff

> 
> 
> Meow!
> -- 
> ⢀⣴⠾⠻⢶⣦⠀ 
> ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
> ⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
> ⠈⠳⣄⠀⠀⠀⠀ 



* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
@ 2017-09-01 14:47           ` Jeff Cook
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff Cook @ 2017-09-01 14:47 UTC (permalink / raw)
  To: Adam Borowski, Jerome Glisse
  Cc: Linus Torvalds, Bernhard Held, Linux Kernel Mailing List,
	linux-mm, Kirill A . Shutemov, Andrew Morton, Andrea Arcangeli,
	Joerg Roedel, Dan Williams, Sudeep Dutt, Ashutosh Dixit,
	Dimitri Sivanich, Jack Steiner, Paolo Bonzini,
	Radim Krčmář,
	ppc-dev, DRI, amd-gfx, linux-rdma, open list:AMD IOMMU (AMD-VI),
	xen-devel, KVM list

On Wed, Aug 30, 2017, at 10:57 AM, Adam Borowski wrote:
> On Tue, Aug 29, 2017 at 08:56:15PM -0400, Jerome Glisse wrote:
> > I will wait for people to test and for result of my own test before
> > reposting if need be, otherwise i will post as separate patch.
> >
> > > But from a _very_ quick read-through this looks fine. But it obviously
> > > needs testing.
> > > 
> > > People - *especially* the people who saw issues under KVM - can you
> > > > try out Jérôme's patch-series? I added some people to the cc, the full
> > > series is on lkml. Jérôme - do you have a git branch for people to
> > > test that they could easily pull and try out?
> > 
> > https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> > git://people.freedesktop.org/~glisse/linux
> 
> Tested your branch as of 10f07641, on a long list of guest VMs.
> No earth-shattering kaboom.

I've been using the mmu_notifier branch @ a3d944233bcf8c for the last 36
hours or so, also without incident.

Unlike most other reporters, I experienced a similar splat on 4.12:

Aug 03 15:02:47 kvm_master kernel: ------------[ cut here ]------------
Aug 03 15:02:47 kvm_master kernel: WARNING: CPU: 13 PID: 1653 at
arch/x86/kvm/mmu.c:682 mmu_spte_clear_track_bits+0xfb/0x100 [kvm]
Aug 03 15:02:47 kvm_master kernel: Modules linked in: vhost_net vhost
tap xt_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4
xt_tcpudp tun ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter msr nls_iso8859_1 nls_cp437 intel_rapl ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sb_edac
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel input_leds pcbc aesni_intel led_class
aes_x86_64 mxm_wmi crypto_simd glue_helper uvcvideo cryptd videobuf2_vmalloc
videobuf2_memops igb videobuf2_v4l2 videobuf2_core snd_usb_audio
videodev media joydev ptp evdev mousedev intel_cstate pps_core mac_hid
intel_rapl_perf snd_hda_intel snd_virtuoso snd_usbmidi_lib snd_hda_codec
snd_oxygen_lib snd_hda_core                        
Aug 03 15:02:47 kvm_master kernel:  snd_mpu401_uart snd_rawmidi
snd_hwdep snd_seq_device snd_pcm snd_timer snd soundcore i2c_algo_bit
pcspkr i2c_i801 lpc_ich ioatdma shpchp dca wmi acpi_power_meter tpm_tis
tpm_tis_core tpm button bridge stp llc sch_fq_codel virtio_pci
virtio_blk virtio_balloon virtio_net virtio_ring virtio kvm_intel kvm sg
ip_tables x_tables hid_logitech_hidpp hid_logitech_dj hid_generic
hid_microsoft usbhid hid sr_mod cdrom sd_mod xhci_pci ahci libahci
xhci_hcd libata usbcore scsi_mod usb_common zfs(PO) zunicode(PO)
zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops drm vfio_pci irqbypass
vfio_virqfd vfio_iommu_type1 vfio vfat fat ext4 crc16 jbd2 fscrypto
mbcache dm_thin_pool dm_cache dm_persistent_data dm_bio_prison dm_bufio
dm_raid raid456 libcrc32c                 
Aug 03 15:02:47 kvm_master kernel:  crc32c_generic crc32c_intel
async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
dm_mod dax raid1 md_mod                                                  
Aug 03 15:02:47 kvm_master kernel: CPU: 13 PID: 1653 Comm: kworker/13:2
Tainted: P    B D W  O    4.12.3-1-ARCH #1                 
Aug 03 15:02:47 kvm_master kernel: Hardware name: Supermicro
SYS-7038A-I/X10DAI, BIOS 2.0a 11/09/2016                              
Aug 03 15:02:47 kvm_master kernel: Workqueue: events mmput_async_fn      
Aug 03 15:02:47 kvm_master kernel: task: ffff9fa89751b900 task.stack:
ffffc179880d8000                                             
Aug 03 15:02:47 kvm_master kernel: RIP:
0010:mmu_spte_clear_track_bits+0xfb/0x100 [kvm]                          
Aug 03 15:02:47 kvm_master kernel: RSP: 0018:ffffc179880dbc20 EFLAGS:
00010246                                                     
Aug 03 15:02:47 kvm_master kernel: RAX: 0000000000000000 RBX:
00000009c07cce77 RCX: dead0000000000ff                               
Aug 03 15:02:47 kvm_master kernel: RDX: 0000000000000000 RSI:
ffff9fa82d6d6f08 RDI: fffff6e76701f300                               
Aug 03 15:02:47 kvm_master kernel: RBP: ffffc179880dbc38 R08:
0000000000100000 R09: 000000000000000d                               
Aug 03 15:02:47 kvm_master kernel: R10: ffff9fa0a56b0008 R11:
ffff9fa0a56b0000 R12: 00000000009c07cc                               
Aug 03 15:02:47 kvm_master kernel: R13: ffff9fa88b990000 R14:
ffff9f9e19dbb1b8 R15: 0000000000000000                               
Aug 03 15:02:47 kvm_master kernel: FS:  0000000000000000(0000)
GS:ffff9fac5f340000(0000) knlGS:0000000000000000                    
Aug 03 15:02:47 kvm_master kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033                                               
Aug 03 15:02:47 kvm_master kernel: CR2: ffffd1b542d71000 CR3:
0000000570a09000 CR4: 00000000003426e0                               
Aug 03 15:02:47 kvm_master kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000                               
Aug 03 15:02:47 kvm_master kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400                               
Aug 03 15:02:47 kvm_master kernel: Call Trace:                   
Aug 03 15:02:47 kvm_master kernel:  drop_spte+0x1a/0xb0 [kvm]    
Aug 03 15:02:47 kvm_master kernel:  mmu_page_zap_pte+0x9c/0xe0 [kvm]     
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_prepare_zap_page+0x65/0x310
[kvm]
Aug 03 15:02:47 kvm_master kernel: 
kvm_mmu_invalidate_zap_all_pages+0x10d/0x160 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_arch_flush_shadow_all+0xe/0x10
[kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_notifier_release+0x2c/0x40
[kvm]
Aug 03 15:02:47 kvm_master kernel:  __mmu_notifier_release+0x44/0xc0
Aug 03 15:02:47 kvm_master kernel:  exit_mmap+0x142/0x150
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? exit_aio+0xc6/0x100
Aug 03 15:02:47 kvm_master kernel:  mmput_async_fn+0x4c/0x130
Aug 03 15:02:47 kvm_master kernel:  process_one_work+0x1de/0x430
Aug 03 15:02:47 kvm_master kernel:  worker_thread+0x47/0x3f0
Aug 03 15:02:47 kvm_master kernel:  kthread+0x125/0x140
Aug 03 15:02:47 kvm_master kernel:  ? process_one_work+0x430/0x430
Aug 03 15:02:47 kvm_master kernel:  ? kthread_create_on_node+0x70/0x70
Aug 03 15:02:47 kvm_master kernel:  ret_from_fork+0x25/0x30
Aug 03 15:02:47 kvm_master kernel: Code: ec 75 04 00 48 b8 00 00 00 00
00 00 00 40 48 21 da 48 39 c2 0f 95 c0 eb b2 48 d1 eb 83 e3 01 eb c0 4c
89 e7 e8 f7 3d fe ff eb a4 <0f> ff eb 8a 90 0f 1f 44 00 00 55 48 89 e5
53 89 d3 e8 ff 4a fe 
Aug 03 15:02:47 kvm_master kernel: ---[ end trace 8710f4d700a7d36e ]---

This would typically take 36-48 hours to surface, so we're good so far,
but not completely out of the woods yet. I'm optimistic that since this
patchset changes the mmu_notifier behavior to something safer in
general, this issue will also be resolved by it.

Jeff

> 
> 
> Meow!
> -- 
> ⢀⣴⠾⠻⢶⣦⠀ 
> ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
> ⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
> ⠈⠳⣄⠀⠀⠀⠀ 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-31 23:19               ` Felix Kuehling
@ 2017-08-31 23:29                 ` Jerome Glisse
  0 siblings, 0 replies; 36+ messages in thread
From: Jerome Glisse @ 2017-08-31 23:29 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: intel-gfx, Christian König, amd-gfx

On Thu, Aug 31, 2017 at 07:19:19PM -0400, Felix Kuehling wrote:
> On 2017-08-31 03:00 PM, Jerome Glisse wrote:
> > I was not saying you should not use mmu_notifier. For achieving B you need
> > mmu_notifier. Note that if you do like ODP/KVM then you do not need to
> > pin page.
> I would like that. I've thought about it before. The one problem I
> couldn't figure out is, where to set the accessed and dirty bits for the
> pages. Now we do it when we unpin. If we don't pin the pages in the
> first place, we don't have a good place for this.
> 
> Our hardware doesn't give us notifications or accessed/dirty bits, so we
> have to assume the worst case that the pages are continuously
> accessed/dirty.
> 
> I'd appreciate any advice how to handle that. (Sorry, I realize this is
> going a bit off topic.) A pointer to a document or source code would be
> great. :)

In invalidate_range_start(), i.e. the same place where you unpin today, but
instead of unpinning you would just call set_page_dirty().
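
As a rough illustration of that suggestion, here is a small user-space C
model (not kernel code). It assumes the driver keeps a per-range table of
the pages currently mapped for the device. struct fake_page,
struct dirty_range, fake_set_page_dirty() and dirty_invalidate() are all
hypothetical names, with fake_set_page_dirty() standing in for the kernel's
set_page_dirty(). Since the hardware reports no dirty bits, every page of a
writable mapping is pessimistically marked dirty when the range is
invalidated:

```c
/* User-space model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_page {
	bool dirty;
};

struct dirty_range {
	struct fake_page **pages; /* pages mapped for the device, NULL if none */
	size_t npages;
	bool writable;            /* device may write through this mapping */
};

/* Stand-in for the kernel's set_page_dirty(). */
void fake_set_page_dirty(struct fake_page *page)
{
	page->dirty = true;
}

/*
 * Models the work done from invalidate_range_start() for page indexes
 * [first, last). The device gives no dirty information, so every page of
 * a writable mapping is assumed dirty before its device mapping is
 * dropped. Returns the number of pages marked dirty.
 */
size_t dirty_invalidate(struct dirty_range *r, size_t first, size_t last)
{
	size_t i, marked = 0;

	assert(first <= last);
	for (i = first; i < last && i < r->npages; i++) {
		if (!r->pages[i])
			continue;
		if (r->writable) {
			fake_set_page_dirty(r->pages[i]);
			marked++;
		}
		r->pages[i] = NULL; /* device mapping gone, nothing to unpin */
	}
	return marked;
}
```

In a real driver the equivalent of dirty_invalidate() would run from the
invalidate_range_start() callback, under whatever lock protects the device
page table.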

Jérôme
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
       [not found]             ` <20170831190021.GG9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-31 23:19               ` Felix Kuehling
  2017-08-31 23:29                 ` Jerome Glisse
  0 siblings, 1 reply; 36+ messages in thread
From: Felix Kuehling @ 2017-08-31 23:19 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: intel-gfx, Christian König,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2017-08-31 03:00 PM, Jerome Glisse wrote:
> I was not saying you should not use mmu_notifier. For achieving B you need
> mmu_notifier. Note that if you do like ODP/KVM then you do not need to
> pin page.
I would like that. I've thought about it before. The one problem I
couldn't figure out is, where to set the accessed and dirty bits for the
pages. Now we do it when we unpin. If we don't pin the pages in the
first place, we don't have a good place for this.

Our hardware doesn't give us notifications or accessed/dirty bits, so we
have to assume the worst case that the pages are continuously
accessed/dirty.

I'd appreciate any advice how to handle that. (Sorry, I realize this is
going a bit off topic.) A pointer to a document or source code would be
great. :)

Thanks,
  Felix

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-31 18:39         ` Felix Kuehling
@ 2017-08-31 19:00           ` Jerome Glisse
       [not found]             ` <20170831190021.GG9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Jerome Glisse @ 2017-08-31 19:00 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: intel-gfx, Christian König, amd-gfx

On Thu, Aug 31, 2017 at 02:39:17PM -0400, Felix Kuehling wrote:
> On 2017-08-31 09:59 AM, Jerome Glisse wrote:
> > [Adding Intel folks as they might be interested in this discussion]
> >
> > On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
> >> Hi Jérôme,
> >>
> >> I have some questions about the potential range-start-end race you
> >> mentioned.
> >>
> >> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
> >>> Note that a lot of existing user feels broken in respect to range_start/
> >>> range_end. Many user only have range_start() callback but there is nothing
> >>> preventing them to undo what was invalidated in their range_start() callback
> >>> after it returns but before any CPU page table update take place.
> >>>
> >>> The code pattern use in kvm or umem odp is an example on how to properly
> >>> avoid such race. In a nutshell use some kind of sequence number and active
> >>> range invalidation counter to block anything that might undo what the
> >>> range_start() callback did.
> >> What happens when we start monitoring an address range after
> >> invalidate_range_start was called? Sounds like we have to keep track of
> >> all active invalidations for just such a case, even in address ranges
> >> that we don't currently care about.
> >>
> >> What are the things we cannot do between invalidate_range_start and
> >> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
> >> userptr mappings after the invalidate_range_start notifier invalidated
> >> it. Do we have to wait for invalidate_range_end before we can call
> >> get_user_pages safely?
> > Well the whole userptr bo object is somewhat broken from the start.
> > You never defined the semantic of it ie what is expected. I can
> > think of 2 differents semantics:
> >   A) a uptr buffer object is a snapshot of a memory at the time of
> >      uptr buffer object creation
> >   B) a uptr buffer object allow GPU to access a range of virtual
> >      address of a process an share coherent view of that range
> >      between CPU and GPU
> >
> > As it was implemented it is more inline with B but it is not defined
> > anywhere AFAICT.
> 
> Yes, we're trying to achieve B, that's why we have an MMU notifier in
> the first place. But it's been a struggle getting it to work properly,
> and we're still dealing with some locking issues and now this one.
> 
> >
> > Anyway getting back to your questions, it kind of doesn't matter as
> > you are using GUP ie you are pinning pages except for one scenario
> > (at least i can only think of one).
> >
> > Problematic case is race between CPU write to zero page or COW and
> > GPU driver doing read only GUP:
> [...]
> 
> Thanks, I was aware of COW but not of the zero-page case. I believe in
> most practical cases our userptr mappings are read-write, so this is
> probably not causing us any real trouble at the moment.
> 
> > So i would first define the semantic of uptr bo and then i would fix
> > accordingly the code. Semantic A is easier to implement and you could
> > just drop the whole mmu_notifier. Maybe it is better to create uptr
> > buffer object everytime you want to snapshot a range of address. I
> > don't think the overhead of buffer creation would matter.
> 
> That doesn't work for KFD and our compute memory model where CPU and GPU
> expect to share the same address space.
> 
> >
> > If you want to close the race for COW and zero page in case of read
> > only GUP there is no other way than what KVM or ODP is doing. I had
> > patchset to simplify all this but i need to bring it back to life.
> 
> OK. I'll look at these to get an idea.
> 
> > Note that other thing might race but as you pin the pages they do
> > not matter. It just mean that if you GUP after range_start() but
> > before range_end() and before CPU page table update then you pinned
> > the same old page again and nothing will happen (migrate will fail,
> > MADV_FREE will nop, ...). So you just did the range_start() callback
> > for nothing in those cases.
> 
> We pin the memory because the GPU wants to access it. So this is
> probably OK if the migration fails. However, your statement that we
> "just did the range_start() callback for nothing" implies that we could
> as well have ignored the range_start callback. But I don't think that's
> true. That way we'd keep a page pinned that is no longer mapped in the
> CPU address space. So the GPU mapping would be out of sync with the CPU
> mapping.

Here is an example of "for nothing":
    CPU0                                CPU1
    > migrate page at addr A
      > invalidate_start addr A
        > unbind_ttm(for addr A)
                                        > use ttm object for addr A
                                        > GUP addr A
      > page table update
      > invalidate_end addr A
      > refcount check fails because
        of GUP
      > restore page table to A

This is what I meant by range_start() "for nothing": on CPU0 you invalidated
the ttm object for nothing, but only because of a race. Fixing the COW race
would also fix this race and keep migration from failing because of the GPU
GUP.
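
The timeline above can be replayed step by step in a small user-space C
model. This is only a sketch under simplifying assumptions: the page's
reference count is reduced to "1 for the CPU mapping plus 1 per GUP pin",
and migration is assumed to proceed only when nothing but the CPU mapping
holds a reference. All names (mig_page, mig_bo, mig_range_start() and so
on) are hypothetical:

```c
/* User-space model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct mig_page {
	int refcount;   /* 1 for the CPU mapping, +1 per GUP pin (simplified) */
};

struct mig_bo {
	bool bound;     /* ttm object currently bound for the GPU */
	int unbinds;    /* how often range_start tore the binding down */
};

/* Driver side of invalidate_range_start(): unbind and drop the pin. */
void mig_range_start(struct mig_bo *bo, struct mig_page *pg)
{
	if (bo->bound) {
		bo->bound = false;
		bo->unbinds++;
		pg->refcount--;
	}
}

/* GPU job uses the object again: re-validate, which pins the page (GUP). */
void mig_use(struct mig_bo *bo, struct mig_page *pg)
{
	if (!bo->bound) {
		assert(pg->refcount >= 1);
		pg->refcount++;
		bo->bound = true;
	}
}

/* Simplified refcount check: only the CPU mapping may hold a reference. */
bool mig_can_proceed(const struct mig_page *pg)
{
	return pg->refcount == 1;
}

/* Returns true if the invalidation was done "for nothing". */
bool mig_replay(void)
{
	struct mig_page pg = { .refcount = 2 };
	struct mig_bo bo = { .bound = true, .unbinds = 0 };

	mig_range_start(&bo, &pg);  /* CPU0: invalidate_start addr A      */
	mig_use(&bo, &pg);          /* CPU1: use ttm object -> GUP addr A */
	/* CPU0: page table update, invalidate_end, then the refcount check */
	return !mig_can_proceed(&pg) && bo.unbinds == 1 && bo.bound;
}
```

Here mig_replay() reports that the object was unbound once, is bound again
to the same page, and that the migration cannot go ahead.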

I was not saying you should not use mmu_notifier. To achieve B you need
mmu_notifier. Note that if you do it like ODP/KVM then you do not need to
pin pages.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
       [not found]       ` <20170831135953.GA9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-08-31 14:14         ` Christian König
@ 2017-08-31 18:39         ` Felix Kuehling
  2017-08-31 19:00           ` Jerome Glisse
  1 sibling, 1 reply; 36+ messages in thread
From: Felix Kuehling @ 2017-08-31 18:39 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: intel-gfx, Christian König,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2017-08-31 09:59 AM, Jerome Glisse wrote:
> [Adding Intel folks as they might be interested in this discussion]
>
> On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
>> Hi Jérôme,
>>
>> I have some questions about the potential range-start-end race you
>> mentioned.
>>
>> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
>>> Note that a lot of existing user feels broken in respect to range_start/
>>> range_end. Many user only have range_start() callback but there is nothing
>>> preventing them to undo what was invalidated in their range_start() callback
>>> after it returns but before any CPU page table update take place.
>>>
>>> The code pattern use in kvm or umem odp is an example on how to properly
>>> avoid such race. In a nutshell use some kind of sequence number and active
>>> range invalidation counter to block anything that might undo what the
>>> range_start() callback did.
>> What happens when we start monitoring an address range after
>> invalidate_range_start was called? Sounds like we have to keep track of
>> all active invalidations for just such a case, even in address ranges
>> that we don't currently care about.
>>
>> What are the things we cannot do between invalidate_range_start and
>> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
>> userptr mappings after the invalidate_range_start notifier invalidated
>> it. Do we have to wait for invalidate_range_end before we can call
>> get_user_pages safely?
> Well the whole userptr bo object is somewhat broken from the start.
> You never defined the semantic of it ie what is expected. I can
> think of 2 differents semantics:
>   A) a uptr buffer object is a snapshot of a memory at the time of
>      uptr buffer object creation
>   B) a uptr buffer object allow GPU to access a range of virtual
>      address of a process an share coherent view of that range
>      between CPU and GPU
>
> As it was implemented it is more inline with B but it is not defined
> anywhere AFAICT.

Yes, we're trying to achieve B, that's why we have an MMU notifier in
the first place. But it's been a struggle getting it to work properly,
and we're still dealing with some locking issues and now this one.

>
> Anyway getting back to your questions, it kind of doesn't matter as
> you are using GUP ie you are pinning pages except for one scenario
> (at least i can only think of one).
>
> Problematic case is race between CPU write to zero page or COW and
> GPU driver doing read only GUP:
[...]

Thanks, I was aware of COW but not of the zero-page case. I believe in
most practical cases our userptr mappings are read-write, so this is
probably not causing us any real trouble at the moment.

> So i would first define the semantic of uptr bo and then i would fix
> accordingly the code. Semantic A is easier to implement and you could
> just drop the whole mmu_notifier. Maybe it is better to create uptr
> buffer object everytime you want to snapshot a range of address. I
> don't think the overhead of buffer creation would matter.

That doesn't work for KFD and our compute memory model where CPU and GPU
expect to share the same address space.

>
> If you want to close the race for COW and zero page in case of read
> only GUP there is no other way than what KVM or ODP is doing. I had
> patchset to simplify all this but i need to bring it back to life.

OK. I'll look at these to get an idea.

> Note that other thing might race but as you pin the pages they do
> not matter. It just mean that if you GUP after range_start() but
> before range_end() and before CPU page table update then you pinned
> the same old page again and nothing will happen (migrate will fail,
> MADV_FREE will nop, ...). So you just did the range_start() callback
> for nothing in those cases.

We pin the memory because the GPU wants to access it. So this is
probably OK if the migration fails. However, your statement that we
"just did the range_start() callback for nothing" implies that we could
as well have ignored the range_start callback. But I don't think that's
true. That way we'd keep a page pinned that is no longer mapped in the
CPU address space. So the GPU mapping would be out of sync with the CPU
mapping.

>
> (Sorry for taking so long to answer i forgot your mail yesterday with
>  all the other discussion going on).

Thanks. Your reply helps. I'll take a look at KVM or ODP and see how
this is applicable to our driver.

Regards,
  Felix

>
> Cheers,
> Jérôme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
       [not found]       ` <20170831135953.GA9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-31 14:14         ` Christian König
  2017-08-31 18:39         ` Felix Kuehling
  1 sibling, 0 replies; 36+ messages in thread
From: Christian König @ 2017-08-31 14:14 UTC (permalink / raw)
  To: Jerome Glisse, Felix Kuehling
  Cc: intel-gfx, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 31.08.2017 um 15:59 schrieb Jerome Glisse:
> [Adding Intel folks as they might be interested in this discussion]
>
> On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
>> Hi Jérôme,
>>
>> I have some questions about the potential range-start-end race you
>> mentioned.
>>
>> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
>>> Note that a lot of existing user feels broken in respect to range_start/
>>> range_end. Many user only have range_start() callback but there is nothing
>>> preventing them to undo what was invalidated in their range_start() callback
>>> after it returns but before any CPU page table update take place.
>>>
>>> The code pattern use in kvm or umem odp is an example on how to properly
>>> avoid such race. In a nutshell use some kind of sequence number and active
>>> range invalidation counter to block anything that might undo what the
>>> range_start() callback did.
>> What happens when we start monitoring an address range after
>> invalidate_range_start was called? Sounds like we have to keep track of
>> all active invalidations for just such a case, even in address ranges
>> that we don't currently care about.
>>
>> What are the things we cannot do between invalidate_range_start and
>> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
>> userptr mappings after the invalidate_range_start notifier invalidated
>> it. Do we have to wait for invalidate_range_end before we can call
>> get_user_pages safely?
> Well the whole userptr bo object is somewhat broken from the start.
> You never defined the semantic of it ie what is expected. I can
> think of 2 differents semantics:
>    A) a uptr buffer object is a snapshot of a memory at the time of
>       uptr buffer object creation
>    B) a uptr buffer object allow GPU to access a range of virtual
>       address of a process an share coherent view of that range
>       between CPU and GPU
>
> As it was implemented it is more inline with B but it is not defined
> anywhere AFAICT.

Well it is not documented, but the userspace APIs built on top of that
require semantics B.

Essentially you could have cases where the GPU or the CPU is waiting in 
a busy loop for the other one to change some memory address.

> Anyway getting back to your questions, it kind of doesn't matter as
> you are using GUP ie you are pinning pages except for one scenario
> (at least i can only think of one).
>
> Problematic case is race between CPU write to zero page or COW and
> GPU driver doing read only GUP:
>
>      CPU thread 1                       | CPU thread 2
>      ---------------------------------------------------------------------
>                                         |
>                                         | uptr covering addr A read only
>                                         | .... do stuff with A
>      write fault to addr A              |
>      invalidate_range_start([A, A+1])   | unbind_ttm -> unpin
>                                         | validate bo -> GUP -> zero page
>      lock page table                    |
>      replace zero pfn/COW with new page |
>      unlock page table                  |
>      invalidate_range_end([A, A+1])     |
>
> So here the GPU would be using wrong page for the address. How bad
> is it is undefined as the semantic of uptr is undefine. Given how it
> as been use so far this race is unlikely (i don't think we have many
> userspace that use that feature and do fork).
>
>
> So i would first define the semantic of uptr bo and then i would fix
> accordingly the code. Semantic A is easier to implement and you could
> just drop the whole mmu_notifier. Maybe it is better to create uptr
> buffer object everytime you want to snapshot a range of address. I
> don't think the overhead of buffer creation would matter.

We do support creating userptr without mmu_notifier for exactly that
purpose, e.g. uploading snapshots of what the user space address space
looked like at a certain moment.

Unfortunately we found that the overhead of buffer creation (and the
related GUP) is way too high to be useful as a throw-away object. A plain
memcpy into a BO simply has less latency overall.

And yeah, at least I'm perfectly aware of the problems with fork() and 
COW. BTW: It becomes really really ugly if you think about what happens 
when the parent writes to a page first and the GPU then has the child copy.

Regards,
Christian.

> If you want to close the race for COW and zero page in case of read
> only GUP there is no other way than what KVM or ODP is doing. I had
> patchset to simplify all this but i need to bring it back to life.
>
>
> Note that other thing might race but as you pin the pages they do
> not matter. It just mean that if you GUP after range_start() but
> before range_end() and before CPU page table update then you pinned
> the same old page again and nothing will happen (migrate will fail,
> MADV_FREE will nop, ...). So you just did the range_start() callback
> for nothing in those cases.
>
> (Sorry for taking so long to answer i forgot your mail yesterday with
>   all the other discussion going on).
>
> Cheers,
> Jérôme


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30 21:51   ` Felix Kuehling
@ 2017-08-31 13:59     ` Jerome Glisse
       [not found]       ` <20170831135953.GA9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Jerome Glisse @ 2017-08-31 13:59 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: intel-gfx, Christian König, amd-gfx

[Adding Intel folks as they might be interested in this discussion]

On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
> Hi Jérôme,
> 
> I have some questions about the potential range-start-end race you
> mentioned.
> 
> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
> > Note that a lot of existing user feels broken in respect to range_start/
> > range_end. Many user only have range_start() callback but there is nothing
> > preventing them to undo what was invalidated in their range_start() callback
> > after it returns but before any CPU page table update take place.
> >
> > The code pattern use in kvm or umem odp is an example on how to properly
> > avoid such race. In a nutshell use some kind of sequence number and active
> > range invalidation counter to block anything that might undo what the
> > range_start() callback did.
> What happens when we start monitoring an address range after
> invalidate_range_start was called? Sounds like we have to keep track of
> all active invalidations for just such a case, even in address ranges
> that we don't currently care about.
> 
> What are the things we cannot do between invalidate_range_start and
> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
> userptr mappings after the invalidate_range_start notifier invalidated
> it. Do we have to wait for invalidate_range_end before we can call
> get_user_pages safely?

Well the whole userptr bo object is somewhat broken from the start.
You never defined the semantics of it, i.e. what is expected. I can
think of 2 different semantics:
  A) a uptr buffer object is a snapshot of memory at the time of
     uptr buffer object creation
  B) a uptr buffer object allows the GPU to access a range of virtual
     addresses of a process and share a coherent view of that range
     between CPU and GPU

As it was implemented it is more in line with B, but it is not defined
anywhere AFAICT.

Anyway, getting back to your questions: it kind of doesn't matter, as
you are using GUP, i.e. you are pinning pages, except for one scenario
(at least I can only think of one).

The problematic case is a race between a CPU write to the zero page or
COW and the GPU driver doing a read-only GUP:

    CPU thread 1                       | CPU thread 2
    ---------------------------------------------------------------------
                                       |
                                       | uptr covering addr A read only
                                       | .... do stuff with A
    write fault to addr A              |
    invalidate_range_start([A, A+1])   | unbind_ttm -> unpin
                                       | validate bo -> GUP -> zero page
    lock page table                    |
    replace zero pfn/COW with new page |
    unlock page table                  |
    invalidate_range_end([A, A+1])     |

So here the GPU would be using the wrong page for the address. How bad
that is is undefined, as the semantics of uptr are undefined. Given how it
has been used so far this race is unlikely (I don't think we have many
userspace programs that use that feature and do fork).
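
The interleaving in the diagram can be replayed deterministically in a few
lines of user-space C. This is only a model for illustration, not driver
code: "pages" are small integers, and race_model, race_range_start(),
race_validate() and race_replay() are hypothetical names. It assumes a
driver that implements only the range_start() side and re-validates with a
read-only GUP whenever it needs the mapping again:

```c
/* User-space model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

enum { NO_PAGE = -1, ZERO_PAGE = 0, NEW_PAGE = 1 };

struct race_model {
	int cpu_pte;      /* page the CPU page table maps at addr A */
	int gpu_mapping;  /* page the uptr bo maps for the GPU at addr A */
};

/* Driver side of invalidate_range_start(): unbind_ttm -> unpin. */
void race_range_start(struct race_model *m)
{
	m->gpu_mapping = NO_PAGE;
}

/* validate bo: a read-only GUP returns whatever the CPU maps right now. */
void race_validate(struct race_model *m)
{
	m->gpu_mapping = m->cpu_pte;
}

/* Replays the diagram; returns true if GPU and CPU end up disagreeing. */
bool race_replay(void)
{
	struct race_model m = { .cpu_pte = ZERO_PAGE, .gpu_mapping = ZERO_PAGE };

	race_range_start(&m);     /* thread 1: write fault, range_start     */
	assert(m.gpu_mapping == NO_PAGE);
	race_validate(&m);        /* thread 2: validate bo -> GUP -> zero   */
	m.cpu_pte = NEW_PAGE;     /* thread 1: replace zero pfn/COW         */
	/* thread 1: invalidate_range_end, no callback in this driver model */

	return m.gpu_mapping != m.cpu_pte;
}
```

race_replay() ends with the GPU still mapping the zero page while the CPU
page table points at the new page, which is the stale state described
above.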


So I would first define the semantics of the uptr bo and then fix the
code accordingly. Semantic A is easier to implement and you could
just drop the whole mmu_notifier. Maybe it is better to create a uptr
buffer object every time you want to snapshot a range of addresses. I
don't think the overhead of buffer creation would matter.


If you want to close the COW and zero page race in the case of a read-only
GUP, there is no other way than what KVM or ODP is doing. I had a
patchset to simplify all this but I need to bring it back to life.
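
For readers who have not looked at that code, here is a minimal user-space
sketch of the general idea: a sequence number plus a count of active
invalidations, loosely modelled on the mmu_notifier_seq / mmu_notifier_count
scheme in KVM. It is a single-threaded replay, not the KVM or ODP
implementation; in real code both steps run under a lock with the
appropriate memory barriers. retry_mirror, retry_lookup(), retry_commit()
and the other names are hypothetical:

```c
/* User-space model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

enum { PAGE_NONE = -1, PAGE_ZERO = 0, PAGE_NEW = 1 };

struct retry_mirror {
	int cpu_pte;          /* page the CPU page table maps at addr A */
	int gpu_mapping;      /* page mapped for the device at addr A */
	unsigned long seq;    /* bumped each time an invalidation ends */
	unsigned int active;  /* invalidations between start and end */
};

void retry_range_start(struct retry_mirror *m)
{
	m->active++;
	m->gpu_mapping = PAGE_NONE;
}

void retry_range_end(struct retry_mirror *m)
{
	assert(m->active > 0);
	m->seq++;
	m->active--;
}

/* Step 1: sample the sequence number, then look the page up. */
int retry_lookup(const struct retry_mirror *m, unsigned long *seq)
{
	*seq = m->seq;
	return m->cpu_pte;
}

/*
 * Step 2: install the page only if no invalidation is in flight and none
 * has completed since the sequence number was sampled. On false the
 * caller has to start again from retry_lookup().
 */
bool retry_commit(struct retry_mirror *m, int page, unsigned long seq)
{
	if (m->active || m->seq != seq)
		return false;
	m->gpu_mapping = page;
	return true;
}

/* Replays the zero page race; returns true if GPU and CPU end up agreeing. */
bool retry_replay(void)
{
	struct retry_mirror m = { .cpu_pte = PAGE_ZERO, .gpu_mapping = PAGE_ZERO };
	unsigned long seq;
	int page;

	retry_range_start(&m);             /* thread 1: range_start       */
	page = retry_lookup(&m, &seq);     /* thread 2: GUP -> zero page  */
	if (retry_commit(&m, page, seq))   /* must be refused here        */
		return false;
	m.cpu_pte = PAGE_NEW;              /* thread 1: install new page  */
	retry_range_end(&m);               /* thread 1: range_end         */

	page = retry_lookup(&m, &seq);     /* thread 2 retries            */
	if (!retry_commit(&m, page, seq))
		return false;
	return m.gpu_mapping == m.cpu_pte;
}
```

In this sketch the counter is kept for the whole mirror rather than per
range, so an invalidation that started before a range was being tracked
still blocks retry_commit() for it.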


Note that other things might race, but as you pin the pages they do
not matter. It just means that if you GUP after range_start() but
before range_end() and before the CPU page table update, then you pinned
the same old page again and nothing will happen (migrate will fail,
MADV_FREE will nop, ...). So you just did the range_start() callback
for nothing in those cases.

(Sorry for taking so long to answer, I forgot your mail yesterday with
 all the other discussion going on.)

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
       [not found] ` <20170829235447.10050-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-08-30  0:11     ` Linus Torvalds
@ 2017-08-30 21:51   ` Felix Kuehling
  2017-08-31 13:59     ` Jerome Glisse
  1 sibling, 1 reply; 36+ messages in thread
From: Felix Kuehling @ 2017-08-30 21:51 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	jglisse-H+wXaHxf7aLQT0dZR+AlfA, Christian König

Hi Jérôme,

I have some questions about the potential range-start-end race you
mentioned.

On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
> Note that a lot of existing user feels broken in respect to range_start/
> range_end. Many user only have range_start() callback but there is nothing
> preventing them to undo what was invalidated in their range_start() callback
> after it returns but before any CPU page table update take place.
>
> The code pattern use in kvm or umem odp is an example on how to properly
> avoid such race. In a nutshell use some kind of sequence number and active
> range invalidation counter to block anything that might undo what the
> range_start() callback did.
What happens when we start monitoring an address range after
invalidate_range_start was called? Sounds like we have to keep track of
all active invalidations for just such a case, even in address ranges
that we don't currently care about.

What are the things we cannot do between invalidate_range_start and
invalidate_range_end? amdgpu calls get_user_pages to re-validate our
userptr mappings after the invalidate_range_start notifier invalidated
it. Do we have to wait for invalidate_range_end before we can call
get_user_pages safely?

Regards,
  Felix

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30  0:56       ` Jerome Glisse
  (?)
@ 2017-08-30 14:57         ` Adam Borowski
  -1 siblings, 0 replies; 36+ messages in thread
From: Adam Borowski @ 2017-08-30 14:57 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Linus Torvalds, Bernhard Held, Linux Kernel Mailing List,
	linux-mm, Kirill A . Shutemov, Andrew Morton, Andrea Arcangeli,
	Joerg Roedel, Dan Williams, Sudeep Dutt, Ashutosh Dixit,
	Dimitri Sivanich, Jack Steiner, Paolo Bonzini,
	Radim Krčmář,
	ppc-dev, DRI, amd-gfx, linux-rdma, open list:AMD IOMMU (AMD-VI)

On Tue, Aug 29, 2017 at 08:56:15PM -0400, Jerome Glisse wrote:
> I will wait for people to test and for result of my own test before
> reposting if need be, otherwise i will post as separate patch.
>
> > But from a _very_ quick read-through this looks fine. But it obviously
> > needs testing.
> > 
> > People - *especially* the people who saw issues under KVM - can you
> > try out Jérôme's patch-series? I added some people to the cc, the full
> > series is on lkml. Jérôme - do you have a git branch for people to
> > test that they could easily pull and try out?
> 
> https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> git://people.freedesktop.org/~glisse/linux

Tested your branch as of 10f07641, on a long list of guest VMs.
No earth-shattering kaboom.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
⠈⠳⣄⠀⠀⠀⠀ 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30  0:56       ` Jerome Glisse
  (?)
  (?)
@ 2017-08-30  8:40         ` Mike Galbraith
  -1 siblings, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2017-08-30  8:40 UTC (permalink / raw)
  To: Jerome Glisse, Linus Torvalds
  Cc: Bernhard Held, Adam Borowski, Linux Kernel Mailing List,
	linux-mm, Kirill A . Shutemov, Andrew Morton, Andrea Arcangeli,
	Joerg Roedel, Dan Williams, Sudeep Dutt, Ashutosh Dixit,
	Dimitri Sivanich, Jack Steiner, Paolo Bonzini,
	Radim Krčmář,
	ppc-dev, DRI, amd-gfx, linux-rdma, open list:AMD IOMMU (AMD-VI),
	xen-devel

On Tue, 2017-08-29 at 20:56 -0400, Jerome Glisse wrote:
> On Tue, Aug 29, 2017 at 05:11:24PM -0700, Linus Torvalds wrote:
> 
> > People - *especially* the people who saw issues under KVM - can you
> > try out Jérôme's patch-series? I added some people to the cc, the full
> > series is on lkml. Jérôme - do you have a git branch for people to
> > test that they could easily pull and try out?
> 
> https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> git://people.freedesktop.org/~glisse/linux

Looks good here.

I reproduced fairly quickly with RT host and 1 RT guest by just having
the guest do a parallel kbuild over NFS (the guest had to be restored
afterward, was corrupted).  I'm currently flogging 2 guests as well as
the host, whimper free.  I'll let the lot broil for a while longer, but
at this point, smoke/flame appearance seems comfortingly unlikely.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30  0:11     ` Linus Torvalds
  (?)
@ 2017-08-30  0:56       ` Jerome Glisse
  -1 siblings, 0 replies; 36+ messages in thread
From: Jerome Glisse @ 2017-08-30  0:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bernhard Held, KVM list, Radim Krčmář,
	Sudeep Dutt, DRI, linux-mm, Andrea Arcangeli, Dimitri Sivanich,
	linux-rdma, amd-gfx, xen-devel, Adam Borowski, Joerg Roedel,
	Jack Steiner, Dan Williams, Linux Kernel Mailing List,
	Ashutosh Dixit, open list:AMD IOMMU (AMD-VI),
	Paolo Bonzini, Andrew Morton, ppc-dev

On Tue, Aug 29, 2017 at 05:11:24PM -0700, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 4:54 PM, Jérôme Glisse <jglisse@redhat.com> wrote:
> >
> > Note this is barely tested. I intend to do more testing of next few days
> > but i do not have access to all hardware that make use of the mmu_notifier
> > API.
> 
> Thanks for doing this.
> 
> > First 2 patches convert existing call of mmu_notifier_invalidate_page()
> > to mmu_notifier_invalidate_range() and bracket those call with call to
> > mmu_notifier_invalidate_range_start()/end().
> 
> Ok, those two patches are a bit more complex than I was hoping for,
> but not *too* bad.
> 
> And the final end result certainly looks nice:
> 
> >  16 files changed, 74 insertions(+), 214 deletions(-)
> 
> Yeah, removing all those invalidate_page() notifiers certainly makes
> for a nice patch.
> 
> And I actually think you missed some more lines that can now be
> removed: kvm_arch_mmu_notifier_invalidate_page() should no longer be
> needed either, so you can remove all of those too (most of them are
> empty inline functions, but x86 has one that actually does something).
> 
> So there's an added 30 or so dead lines that should be removed in the
> kvm patch, I think.

Yes, I missed that. I will wait for people to test and for the results of
my own tests before reposting if need be; otherwise I will post it as a
separate patch.

> 
> But from a _very_ quick read-through this looks fine. But it obviously
> needs testing.
> 
> People - *especially* the people who saw issues under KVM - can you
> try out Jérôme's patch-series? I added some people to the cc, the full
> series is on lkml. Jérôme - do you have a git branch for people to
> test that they could easily pull and try out?

https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
git://people.freedesktop.org/~glisse/linux

(Sorry if that tree is a bit big; it has a lot of dead things. I need
 to push a clean and slim one.)

Jérôme
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-29 23:54 ` Jérôme Glisse
  (?)
  (?)
@ 2017-08-30  0:11     ` Linus Torvalds
  -1 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-08-30  0:11 UTC (permalink / raw)
  To: Jérôme Glisse, Bernhard Held, Adam Borowski
  Cc: Andrea Arcangeli, Joerg Roedel, KVM list,
	Radim Krčmář,
	linux-rdma, Jack Steiner,
	Linux Kernel Mailing List, DRI, Sudeep Dutt, Ashutosh Dixit,
	linux-mm, open list:AMD IOMMU (AMD-VI),
	Dimitri Sivanich, amd-gfx,
	xen-devel, Paolo Bonzini, Andrew Morton, ppc-dev, Dan Williams,
	Kirill A . Shutemov

On Tue, Aug 29, 2017 at 4:54 PM, Jérôme Glisse <jglisse@redhat.com> wrote:
>
> Note this is barely tested. I intend to do more testing of next few days
> but i do not have access to all hardware that make use of the mmu_notifier
> API.

Thanks for doing this.

> First 2 patches convert existing call of mmu_notifier_invalidate_page()
> to mmu_notifier_invalidate_range() and bracket those call with call to
> mmu_notifier_invalidate_range_start()/end().

Ok, those two patches are a bit more complex than I was hoping for,
but not *too* bad.

And the final end result certainly looks nice:

>  16 files changed, 74 insertions(+), 214 deletions(-)

Yeah, removing all those invalidate_page() notifiers certainly makes
for a nice patch.

And I actually think you missed some more lines that can now be
removed: kvm_arch_mmu_notifier_invalidate_page() should no longer be
needed either, so you can remove all of those too (most of them are
empty inline functions, but x86 has one that actually does something).

So there's an added 30 or so dead lines that should be removed in the
kvm patch, I think.

But from a _very_ quick read-through this looks fine. But it obviously
needs testing.

People - *especially* the people who saw issues under KVM - can you
try out Jérôme's patch-series? I added some people to the cc, the full
series is on lkml. Jérôme - do you have a git branch for people to
test that they could easily pull and try out?

                    Linus
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 00/13] mmu_notifier kill invalidate_page callback
@ 2017-08-29 23:54 ` Jérôme Glisse
  0 siblings, 0 replies; 36+ messages in thread
From: Jérôme Glisse @ 2017-08-29 23:54 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Jérôme Glisse, Kirill A . Shutemov, Linus Torvalds,
	Andrew Morton, Andrea Arcangeli, Joerg Roedel, Dan Williams,
	Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich, Jack Steiner,
	Paolo Bonzini, Radim Krčmář,
	linuxppc-dev, dri-devel, amd-gfx, linux-rdma, iommu,
	xen-devel, kvm

(Sorry for so many list cross-posting and big cc)

Please help testing !

The invalidate_page callback suffered from 2 pitfalls. First it used to
happen after page table lock was release and thus a new page might have
been setup for the virtual address before the call to invalidate_page().

This is in a weird way fixed by c7ab0d2fdc840266b39db94538f74207ec2afbf6
which moved the callback under the page table lock. Which also broke
several existing user of the mmu_notifier API that assumed they could
sleep inside this callback.

The second pitfall was invalidate_page being the only callback not taking
a range of address in respect to invalidation but was giving an address
and a page. Lot of the callback implementer assumed this could never be
THP and thus failed to invalidate the appropriate range for THP pages.

By killing this callback we unify the mmu_notifier callback API to always
take a virtual address range as input.

There is now 2 clear API (I am not mentioning the youngess API which is
seldomly used):
  - invalidate_range_start()/end() callback (which allow you to sleep)
  - invalidate_range() where you can not sleep but happen right after
    page table update under page table lock


Note that a lot of existing user feels broken in respect to range_start/
range_end. Many user only have range_start() callback but there is nothing
preventing them to undo what was invalidated in their range_start() callback
after it returns but before any CPU page table update take place.

The code pattern use in kvm or umem odp is an example on how to properly
avoid such race. In a nutshell use some kind of sequence number and active
range invalidation counter to block anything that might undo what the
range_start() callback did.

If you do not care about keeping fully in sync with CPU page table (ie
you can live with CPU page table pointing to new different page for a
given virtual address) then you can take a reference on the pages inside
the range_start callback and drop it in range_end or when your driver
is done with those pages.

Last alternative is to use invalidate_range() if you can do invalidation
without sleeping as invalidate_range() callback happens under the CPU
page table spinlock right after the page table is updated.


Note this is barely tested. I intend to do more testing of next few days
but i do not have access to all hardware that make use of the mmu_notifier
API.


First 2 patches convert existing call of mmu_notifier_invalidate_page()
to mmu_notifier_invalidate_range() and bracket those call with call to
mmu_notifier_invalidate_range_start()/end().

The next 10 patches remove existing invalidate_page() callback as it can
no longer happen.

Finaly the last page remove it completely so it can RIP.

Jérôme Glisse (13):
  dax: update to new mmu_notifier semantic
  mm/rmap: update to new mmu_notifier semantic
  powerpc/powernv: update to new mmu_notifier semantic
  drm/amdgpu: update to new mmu_notifier semantic
  IB/umem: update to new mmu_notifier semantic
  IB/hfi1: update to new mmu_notifier semantic
  iommu/amd: update to new mmu_notifier semantic
  iommu/intel: update to new mmu_notifier semantic
  misc/mic/scif: update to new mmu_notifier semantic
  sgi-gru: update to new mmu_notifier semantic
  xen/gntdev: update to new mmu_notifier semantic
  KVM: update to new mmu_notifier semantic
  mm/mmu_notifier: kill invalidate_page

Cc: Kirill A. Shutemov <kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Cc: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Andrea Arcangeli <aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
Cc: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Sudeep Dutt <sudeep.dutt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Ashutosh Dixit <ashutosh.dixit-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Dimitri Sivanich <sivanich-sJ/iWh9BUns@public.gmane.org>
Cc: Jack Steiner <steiner-sJ/iWh9BUns@public.gmane.org>
Cc: Paolo Bonzini <pbonzini-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Radim Krčmář <rkrcmar-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Cc: linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Cc: xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b@public.gmane.org
Cc: kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org


 arch/powerpc/platforms/powernv/npu-dma.c | 10 --------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c   | 31 ----------------------
 drivers/infiniband/core/umem_odp.c       | 19 --------------
 drivers/infiniband/hw/hfi1/mmu_rb.c      |  9 -------
 drivers/iommu/amd_iommu_v2.c             |  8 ------
 drivers/iommu/intel-svm.c                |  9 -------
 drivers/misc/mic/scif/scif_dma.c         | 11 --------
 drivers/misc/sgi-gru/grutlbpurge.c       | 12 ---------
 drivers/xen/gntdev.c                     |  8 ------
 fs/dax.c                                 | 19 ++++++++------
 include/linux/mm.h                       |  1 +
 include/linux/mmu_notifier.h             | 25 ------------------
 mm/memory.c                              | 26 +++++++++++++++----
 mm/mmu_notifier.c                        | 14 ----------
 mm/rmap.c                                | 44 +++++++++++++++++++++++++++++---
 virt/kvm/kvm_main.c                      | 42 ------------------------------
 16 files changed, 74 insertions(+), 214 deletions(-)

-- 
2.13.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 00/13] mmu_notifier kill invalidate_page callback
@ 2017-08-29 23:54 ` Jérôme Glisse
  0 siblings, 0 replies; 36+ messages in thread
From: Jérôme Glisse @ 2017-08-29 23:54 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Jérôme Glisse, Kirill A . Shutemov, Linus Torvalds,
	Andrew Morton, Andrea Arcangeli, Joerg Roedel, Dan Williams,
	Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich, Jack Steiner,
	Paolo Bonzini, Radim Krčmář,
	linuxppc-dev, dri-devel, amd-gfx, linux-rdma, iommu, xen-devel,
	kvm

(Sorry for cross-posting to so many lists and for the big cc)

Please help with testing!

The invalidate_page callback suffered from 2 pitfalls. First, it used to
happen after the page table lock was released, and thus a new page might
have been set up for the virtual address before the call to
invalidate_page().

This was fixed, in a weird way, by c7ab0d2fdc840266b39db94538f74207ec2afbf6,
which moved the callback under the page table lock. That in turn broke
several existing users of the mmu_notifier API that assumed they could
sleep inside this callback.

The second pitfall was that invalidate_page was the only callback that did
not take a range of addresses to invalidate; instead it was given an
address and a page. A lot of the callback implementers assumed this could
never be a THP and thus failed to invalidate the appropriate range for THP
pages.
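
To make the THP pitfall concrete, here is a minimal userspace sketch of the
range arithmetic. It assumes 4 KiB base pages and 2 MiB huge pages (the
usual x86-64 values); the constants, struct va_range and the helper
thp_invalidate_range() are made up for illustration and are not part of the
kernel API. A callback that only invalidates one base page at the given
address misses most of a huge page mapping:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes (x86-64): 4 KiB base page, 2 MiB transparent huge page. */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << 12)
#define SKETCH_HPAGE_SIZE (UINT64_C(1) << 21)

struct va_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Hypothetical helper: compute the virtual address range that has to be
 * invalidated for the mapping that covers 'address'.
 */
static struct va_range thp_invalidate_range(uint64_t address, int is_huge)
{
	uint64_t size = is_huge ? SKETCH_HPAGE_SIZE : SKETCH_PAGE_SIZE;
	struct va_range r;

	r.start = address & ~(size - 1);	/* round down to the mapping size */
	r.end = r.start + size;
	assert(r.start <= address && address < r.end);
	return r;
}
```

For an address in the middle of a huge page, the range returned here is 512
times larger than the single base page a page-based callback would have
assumed.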

By killing this callback we unify the mmu_notifier callback API to always
take a virtual address range as input.

There are now 2 clear APIs (I am not mentioning the youngness API, which is
seldom used):
  - invalidate_range_start()/end() callbacks (which allow you to sleep)
  - invalidate_range(), where you cannot sleep but which happens right
    after the page table update, under the page table lock


Note that a lot of existing users look broken with respect to
range_start/range_end. Many users only have a range_start() callback, but
there is nothing preventing them from undoing what was invalidated in their
range_start() callback after it returns but before any CPU page table
update takes place.

The code pattern used in kvm or umem odp is an example of how to properly
avoid such a race. In a nutshell, use some kind of sequence number and an
active range invalidation counter to block anything that might undo what
the range_start() callback did.
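
The sequence number plus active counter idea can be sketched in userspace C
roughly as follows. This is not the kvm or umem odp code; struct mirror and
every function name are hypothetical, a pthread mutex stands in for the
driver lock, and a single bool stands in for the device page table. The
fault path snapshots the sequence before looking up the page and only
installs its mapping if no invalidation is in flight and none completed
since the snapshot:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-device mirror state; all names are illustrative. */
struct mirror {
	pthread_mutex_t lock;
	unsigned long invalidate_seq;	/* bumped when an invalidation ends */
	unsigned int invalidate_count;	/* invalidations currently in flight */
	bool mapped;			/* stand-in for the device page table */
};

/* range_start(): mark a range active and tear down the device mapping. */
static void mirror_range_start(struct mirror *m)
{
	pthread_mutex_lock(&m->lock);
	m->invalidate_count++;
	m->mapped = false;
	pthread_mutex_unlock(&m->lock);
}

/* range_end(): bump the sequence first, then drop the active count. */
static void mirror_range_end(struct mirror *m)
{
	pthread_mutex_lock(&m->lock);
	m->invalidate_seq++;
	assert(m->invalidate_count > 0);
	m->invalidate_count--;
	pthread_mutex_unlock(&m->lock);
}

/* Fault path, step 1: snapshot the sequence before looking up the page. */
static unsigned long mirror_fault_begin(struct mirror *m)
{
	unsigned long seq;

	pthread_mutex_lock(&m->lock);
	seq = m->invalidate_seq;
	pthread_mutex_unlock(&m->lock);
	return seq;
}

/*
 * Fault path, step 2: install the mapping only if no invalidation is in
 * flight and none completed since the snapshot. Returns false when the
 * caller has to retry the whole fault.
 */
static bool mirror_fault_commit(struct mirror *m, unsigned long seq)
{
	bool ok;

	pthread_mutex_lock(&m->lock);
	ok = m->invalidate_count == 0 && m->invalidate_seq == seq;
	if (ok)
		m->mapped = true;
	pthread_mutex_unlock(&m->lock);
	return ok;
}
```

A fault that raced with range_start()/range_end() fails its commit and has
to start over with a fresh snapshot, so it cannot re-establish a mapping
that the range_start() callback just tore down.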

If you do not care about keeping fully in sync with the CPU page table
(i.e. you can live with the CPU page table pointing to a new, different
page for a given virtual address), then you can take a reference on the
pages inside the range_start callback and drop it in range_end or when
your driver is done with those pages.

The last alternative is to use invalidate_range() if you can do the
invalidation without sleeping, as the invalidate_range() callback happens
under the CPU page table spinlock right after the page table is updated.


Note this is barely tested. I intend to do more testing over the next few
days, but I do not have access to all the hardware that makes use of the
mmu_notifier API.


The first 2 patches convert existing calls of mmu_notifier_invalidate_page()
to mmu_notifier_invalidate_range() and bracket those calls with calls to
mmu_notifier_invalidate_range_start()/end().
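
The resulting call ordering can be sketched with userspace stand-ins that
only record the order in which they run. Nothing below is kernel code: the
stub_* functions and sketch_unmap_one() are hypothetical names, and the
single-letter log is just a way to make the bracketing visible.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-ins that only record call order; not kernel code. */
static char call_log[64];

static void log_call(const char *tag)
{
	strncat(call_log, tag, sizeof(call_log) - strlen(call_log) - 1);
}

/* Sleepable notification before any page table update. */
static void stub_range_start(unsigned long start, unsigned long end)
{
	(void)start; (void)end;
	log_call("S");
}

static void stub_pt_lock(void)   { log_call("["); }
static void stub_pt_unlock(void) { log_call("]"); }

/* The CPU page table update itself. */
static void stub_clear_pte(unsigned long addr)
{
	(void)addr;
	log_call("P");
}

/* Non-sleeping notification, right after the update, under the lock. */
static void stub_invalidate_range(unsigned long start, unsigned long end)
{
	(void)start; (void)end;
	log_call("I");
}

/* Sleepable notification once the update is finished. */
static void stub_range_end(unsigned long start, unsigned long end)
{
	(void)start; (void)end;
	log_call("E");
}

/* Former invalidate_page() call site, converted to the range API. */
static void sketch_unmap_one(unsigned long start, unsigned long end)
{
	stub_range_start(start, end);
	stub_pt_lock();
	stub_clear_pte(start);
	stub_invalidate_range(start, end);
	stub_pt_unlock();
	stub_range_end(start, end);
}
```

Calling sketch_unmap_one() leaves "S[PI]E" in call_log: start, lock, page
table update, invalidate_range, unlock, end.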

The next 10 patches remove the existing invalidate_page() callbacks, as
they can no longer be called.

Finally, the last patch removes it completely so it can RIP.

Jérôme Glisse (13):
  dax: update to new mmu_notifier semantic
  mm/rmap: update to new mmu_notifier semantic
  powerpc/powernv: update to new mmu_notifier semantic
  drm/amdgpu: update to new mmu_notifier semantic
  IB/umem: update to new mmu_notifier semantic
  IB/hfi1: update to new mmu_notifier semantic
  iommu/amd: update to new mmu_notifier semantic
  iommu/intel: update to new mmu_notifier semantic
  misc/mic/scif: update to new mmu_notifier semantic
  sgi-gru: update to new mmu_notifier semantic
  xen/gntdev: update to new mmu_notifier semantic
  KVM: update to new mmu_notifier semantic
  mm/mmu_notifier: kill invalidate_page

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>

Cc: linuxppc-dev@lists.ozlabs.org
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: kvm@vger.kernel.org


 arch/powerpc/platforms/powernv/npu-dma.c | 10 --------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c   | 31 ----------------------
 drivers/infiniband/core/umem_odp.c       | 19 --------------
 drivers/infiniband/hw/hfi1/mmu_rb.c      |  9 -------
 drivers/iommu/amd_iommu_v2.c             |  8 ------
 drivers/iommu/intel-svm.c                |  9 -------
 drivers/misc/mic/scif/scif_dma.c         | 11 --------
 drivers/misc/sgi-gru/grutlbpurge.c       | 12 ---------
 drivers/xen/gntdev.c                     |  8 ------
 fs/dax.c                                 | 19 ++++++++------
 include/linux/mm.h                       |  1 +
 include/linux/mmu_notifier.h             | 25 ------------------
 mm/memory.c                              | 26 +++++++++++++++----
 mm/mmu_notifier.c                        | 14 ----------
 mm/rmap.c                                | 44 +++++++++++++++++++++++++++++---
 virt/kvm/kvm_main.c                      | 42 ------------------------------
 16 files changed, 74 insertions(+), 214 deletions(-)

-- 
2.13.5

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-09-01 14:50 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-29 23:54 [PATCH 00/13] mmu_notifier kill invalidate_page callback Jérôme Glisse
  -- strict thread matches above, loose matches on Subject: below --
2017-08-29 23:54 Jérôme Glisse
2017-08-29 23:54 ` Jérôme Glisse
2017-08-29 23:54 ` Jérôme Glisse
     [not found] ` <20170829235447.10050-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-30  0:11   ` Linus Torvalds
2017-08-30  0:11     ` Linus Torvalds
2017-08-30  0:11     ` Linus Torvalds
2017-08-30  0:11     ` Linus Torvalds
2017-08-30  0:56     ` Jerome Glisse
2017-08-30  0:56     ` Jerome Glisse
2017-08-30  0:56       ` Jerome Glisse
2017-08-30  0:56       ` Jerome Glisse
2017-08-30  8:40       ` Mike Galbraith
2017-08-30  8:40       ` Mike Galbraith
2017-08-30  8:40         ` Mike Galbraith
2017-08-30  8:40         ` Mike Galbraith
2017-08-30  8:40         ` Mike Galbraith
2017-08-30 14:57       ` Adam Borowski
2017-08-30 14:57       ` Adam Borowski
2017-08-30 14:57         ` Adam Borowski
2017-08-30 14:57         ` Adam Borowski
2017-09-01 14:47         ` Jeff Cook
2017-09-01 14:47         ` Jeff Cook
2017-09-01 14:47           ` Jeff Cook
2017-09-01 14:47           ` Jeff Cook
2017-09-01 14:47           ` Jeff Cook
2017-09-01 14:50           ` taskboxtester
2017-09-01 14:50             ` taskboxtester
2017-08-30 21:51   ` Felix Kuehling
2017-08-31 13:59     ` Jerome Glisse
     [not found]       ` <20170831135953.GA9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-31 14:14         ` Christian König
2017-08-31 18:39         ` Felix Kuehling
2017-08-31 19:00           ` Jerome Glisse
     [not found]             ` <20170831190021.GG9227-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-31 23:19               ` Felix Kuehling
2017-08-31 23:29                 ` Jerome Glisse
2017-08-30  0:11 ` Linus Torvalds
