kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Rosato <mjrosato@linux.ibm.com>
To: Christian Borntraeger <borntraeger@linux.ibm.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Tony Krowiak <akrowiak@linux.ibm.com>,
	"Jason J . Herne" <jjherne@linux.ibm.com>,
	Marc Hartmayer <mhartmay@linux.ibm.com>,
	Eric Farman <farman@linux.ibm.com>,
	Cornelia Huck <cohuck@redhat.com>,
	kvm@vger.kernel.org, Qian Cai <cai@lca.pw>,
	Joerg Roedel <jroedel@suse.de>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	linux-s390 <linux-s390@vger.kernel.org>
Subject: Re: [PATCH v2] vfio: Follow a strict lifetime for struct iommu_group
Date: Wed, 5 Oct 2022 09:46:45 -0400	[thread overview]
Message-ID: <8982bc22-9afa-dde4-9f4e-38948db58789@linux.ibm.com> (raw)
In-Reply-To: <33bc5258-5c95-99ee-a952-5b0b2826da3a@linux.ibm.com>

On 10/4/22 2:22 PM, Matthew Rosato wrote:
> On 10/4/22 1:36 PM, Christian Borntraeger wrote:
>>
>>
>> Am 04.10.22 um 18:28 schrieb Jason Gunthorpe:
>>> On Tue, Oct 04, 2022 at 05:44:53PM +0200, Christian Borntraeger wrote:
>>>
>>>>> Does some userspace have the group FD open when it stucks like this,
>>>>> eg what does fuser say?
>>>>
>>>> /proc/<virtnodedevd>/fd
>>>> 51480 0 dr-x------. 2 root root  0  4. Okt 17:16 .
>>>> 43593 0 dr-xr-xr-x. 9 root root  0  4. Okt 17:16 ..
>>>> 65252 0 lr-x------. 1 root root 64  4. Okt 17:42 0 -> /dev/null
>>>> 65253 0 lrwx------. 1 root root 64  4. Okt 17:42 1 -> 'socket:[51479]'
>>>> 65261 0 lrwx------. 1 root root 64  4. Okt 17:42 10 -> 'anon_inode:[eventfd]'
>>>> 65262 0 lrwx------. 1 root root 64  4. Okt 17:42 11 -> 'socket:[51485]'
>>>> 65263 0 lrwx------. 1 root root 64  4. Okt 17:42 12 -> 'socket:[51487]'
>>>> 65264 0 lrwx------. 1 root root 64  4. Okt 17:42 13 -> 'socket:[51486]'
>>>> 65265 0 lrwx------. 1 root root 64  4. Okt 17:42 14 -> 'anon_inode:[eventfd]'
>>>> 65266 0 lrwx------. 1 root root 64  4. Okt 17:42 15 -> 'socket:[60421]'
>>>> 65267 0 lrwx------. 1 root root 64  4. Okt 17:42 16 -> 'anon_inode:[eventfd]'
>>>> 65268 0 lrwx------. 1 root root 64  4. Okt 17:42 17 -> 'socket:[28008]'
>>>> 65269 0 l-wx------. 1 root root 64  4. Okt 17:42 18 -> /run/libvirt/nodedev/driver.pid
>>>> 65270 0 lrwx------. 1 root root 64  4. Okt 17:42 19 -> 'socket:[28818]'
>>>> 65254 0 lrwx------. 1 root root 64  4. Okt 17:42 2 -> 'socket:[51479]'
>>>> 65271 0 lr-x------. 1 root root 64  4. Okt 17:42 20 -> '/dev/vfio/3 (deleted)'
>>>
>>> Seems like a userspace bug to keep the group FD open after the /dev/
>>> file has been deleted :|
>>>
>>> What do you think about this?
>>>
>>> commit a54a852b1484b1605917a8f4d80691db333b25ed
>>> Author: Jason Gunthorpe <jgg@ziepe.ca>
>>> Date:   Tue Oct 4 13:14:37 2022 -0300
>>>
>>>      vfio: Make the group FD disassociate from the iommu_group
>>>           Allow the vfio_group struct to exist with a NULL iommu_group pointer. When
>>>      the pointer is NULL the vfio_group users promise not to touch the
>>>      iommu_group. This allows a driver to be hot unplugged while userspace is
>>>      keeping the group FD open.
>>>           SPAPR mode is excluded from this behavior because of how it wrongly hacks
>>>      part of its iommu interface through KVM. Due to this we loose control over
>>>      what it is doing and cannot revoke the iommu_group usage in the IOMMU
>>>      layer via vfio_group_detach_container().
>>>           Thus, for SPAPR the group FDs must still be closed before a device can be
>>>      hot unplugged.
>>>           This fixes a userspace regression where we learned that virtnodedevd
>>>      leaves a group FD open even though the /dev/ node for it has been deleted
>>>      and all the drivers for it unplugged.
>>>           Fixes: ca5f21b25749 ("vfio: Follow a strict lifetime for struct iommu_group")
>>>      Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
>>>      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>>
>> Almost :-)
>>
>> drivers/vfio/vfio_main.c: In function 'vfio_file_is_group':
>> drivers/vfio/vfio_main.c:1606:47: error: expected ')' before ';' token
>>  1606 |         return (file->f_op == &vfio_group_fops;
>>       |                ~                              ^
>>       |                                               )
>> drivers/vfio/vfio_main.c:1606:48: error: expected ';' before '}' token
>>  1606 |         return (file->f_op == &vfio_group_fops;
>>       |                                                ^
>>       |                                                ;
>>  1607 | }
>>       | ~
>>
>>
>> With that fixed I get:
>>
>> ERROR: modpost: "vfio_file_is_group" [drivers/vfio/pci/vfio-pci-core.ko] undefined!
>>
>> With that worked around (m -> y)
> 
> 
> Looks like this can be solved with EXPORT_SYMBOL_GPL(vfio_file_is_group);
> 
> Also:
> 
> arch/s390/kvm/../../../virt/kvm/vfio.c:64:28: warning: ‘kvm_vfio_file_iommu_group’ defined but not used [-Wunused-function]
>    64 | static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
>       |                            ^~~~~~~~~~~~~~~~~~~~~~~~~
> 
> kvm_vfio_file_iommu_group looks like it is now SPAPR-only
> 
>>
>>
>> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
>>
>> At least the vfio-ap part
> 
> Nope, with this s390 vfio-pci at least breaks:
> 
> [  132.943389] kernel BUG at lib/list_debug.c:53!
> [  132.943406] monitor event: 0040 ilc:2 [#1] SMP 
> [  132.943410] Modules linked in: vfio_pci kvm vfio_pci_core irqbypass vfio_virqfd vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defr
> ag_ipv4 ip_set nf_tables nfnetlink sunrpc mlx5_ib ism smc ib_uverbs ib_core uvdevice s390_trng tape_3590 tape tape_class eadm_sch vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha mlx5_core aes_s390 des_s390 libdes sha3_512_s390 nvme sha3_256_s390 sha512_s390 sh
> a256_s390 nvme_core sha1_s390 sha_common zfcp scsi_transport_fc pkey zcrypt rng_core autofs4 [last unloaded: vfio_pci]
> [  132.943457] CPU: 12 PID: 4991 Comm: nose2 Tainted: G        W          6.0.0-rc4 #40
> [  132.943460] Hardware name: IBM 3931 A01 782 (LPAR)
> [  132.943462] Krnl PSW : 0704c00180000000 00000000cbc90568 (__list_del_entry_valid+0xd8/0xf0)
> [  132.943469]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [  132.943474] Krnl GPRS: 8000000000000001 0000000900000027 000000000000004e 00000000ccc1ffe0
> [  132.943477]            00000000fffeffff 00000009fc290000 0000000000000000 0000000000000080
> [  132.943480]            00000000acc86438 0000000000000000 00000000acc86420 00000000a1492800
> [  132.943483]            00000000922a0000 000003ffb9dce260 00000000cbc90564 0000038004a6b9f8
> [  132.943489] Krnl Code: 00000000cbc90558: c0200045eff3        larl    %r2,00000000cc54e53e
> [  132.943489]            00000000cbc9055e: c0e50022c7d9        brasl   %r14,00000000cc0e9510
> [  132.943489]           #00000000cbc90564: af000000            mc      0,0
> [  132.943489]           >00000000cbc90568: b9040032            lgr     %r3,%r2
> [  132.943489]            00000000cbc9056c: c0200045efd4        larl    %r2,00000000cc54e514
> [  132.943489]            00000000cbc90572: c0e50022c7cf        brasl   %r14,00000000cc0e9510
> [  132.943489]            00000000cbc90578: af000000            mc      0,0
> [  132.943489]            00000000cbc9057c: 0707                bcr     0,%r7
> [  132.943510] Call Trace:
> [  132.943512]  [<00000000cbc90568>] __list_del_entry_valid+0xd8/0xf0 
> [  132.943515] ([<00000000cbc90564>] __list_del_entry_valid+0xd4/0xf0)
> [  132.943518]  [<000003ff8011a1b8>] vfio_group_detach_container+0x88/0x170 [vfio] 
> [  132.943524]  [<000003ff801176c0>] vfio_device_remove_group.isra.0+0xb0/0x1e0 [vfio] 
> [  132.943529]  [<000003ff804f9e54>] vfio_pci_core_unregister_device+0x34/0x80 [vfio_pci_core] 
> [  132.943535]  [<000003ff804ae1c4>] vfio_pci_remove+0x2c/0x40 [vfio_pci] 
> [  132.943539]  [<00000000cbd58c3c>] pci_device_remove+0x3c/0x98 
> [  132.943542]  [<00000000cbdbdbce>] device_release_driver_internal+0x1c6/0x288 
> [  132.943545]  [<00000000cbd4e284>] pci_stop_bus_device+0x94/0xc0 
> [  132.943549]  [<00000000cbd4e570>] pci_stop_and_remove_bus_device_locked+0x30/0x48 
> [  132.943552]  [<00000000cb55d980>] zpci_bus_remove_device+0x68/0xa8 
> [  132.943555]  [<00000000cb556e82>] zpci_deconfigure_device+0x3a/0xe0 
> [  132.943558]  [<00000000cbd65d04>] power_write_file+0x7c/0x130 
> [  132.943561]  [<00000000cb8fbc90>] kernfs_fop_write_iter+0x138/0x210 
> [  132.943565]  [<00000000cb837344>] vfs_write+0x194/0x2e0 "
> [  132.943568]  [<00000000cb8376fa>] ksys_write+0x6a/0xf8 
> [  132.943571]  [<00000000cc0f918c>] __do_syscall+0x1d4/0x200 
> [  132.943575]  [<00000000cc107e42>] system_call+0x82/0xb0 
> [  132.943577] Last Breaking-Event-Address:
> [  132.943579]  [<00000000cc0e955c>] _printk+0x4c/0x58
> [  132.943585] Kernel panic - not syncing: Fatal exception: panic_on_oops

(again, with the follow-up applied) Besides the panic above I just noticed there is also this warning that immediately precedes and is perhaps more useful.  Re: what triggers the WARN, both group->owner and group->owner_cnt are already 0:

[  375.262923] WARNING: CPU: 8 PID: 5182 at drivers/iommu/iommu.c:3211 iommu_group_release_dma_owner+0x38/0x90
[  375.262932] Modules linked in: vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb macvtap macvlan tap vfio_pci vfio_pci_core irqbypass vfio_virqfd kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf
_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc mlx5_ib ib_uverbs ism smc ib_core uvdevice s390_trng eadm_sch tape_3590 tape tape_class vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 mlx5_core libdes sha3_512_s390 sha3_256_s390
 nvme sha512_s390 sha256_s390 sha1_s390 sha_common nvme_core zfcp scsi_transport_fc pkey zcrypt rng_core autofs4
[  375.262969] CPU: 8 PID: 5182 Comm: nose2 Not tainted 6.0.0-rc4 #50
[  375.262971] Hardware name: IBM 3931 A01 782 (LPAR)
[  375.262972] Krnl PSW : 0704c00180000000 00000001d1cd8b34 (iommu_group_release_dma_owner+0x3c/0x90)
[  375.262976]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  375.262979] Krnl GPRS: 8000000000000001 0000000000000000 00000000844dd058 00000000ba60c200
[  375.262981]            00000000fffeffff 00000009fc290000 0000000000000000 0000000000000080
[  375.262984]            00000000b1a6d840 00000000b1a6d858 00000000844dd058 00000000844dd000
[  375.262986]            00000000ba60c200 000003ff962ce260 00000001d1cd8b26 0000038003d0f9d8
[  375.262994] Krnl Code: 00000001d1cd8b26: e310b0c00012        lt      %r1,192(%r11)
[  375.262994]            00000001d1cd8b2c: a774000c            brc     7,00000001d1cd8b44
[  375.262994]           #00000001d1cd8b30: af000000            mc      0,0
[  375.262994]           >00000001d1cd8b34: b904002a            lgr     %r2,%r10
[  375.262994]            00000001d1cd8b38: ebaff0a00004        lmg     %r10,%r15,160(%r15)
[  375.262994]            00000001d1cd8b3e: c0f4001aa84d        brcl    15,00000001d202dbd8
[  375.262994]            00000001d1cd8b44: e310b0c80002        ltg     %r1,200(%r11)
[  375.262994]            00000001d1cd8b4a: a784fff3            brc     8,00000001d1cd8b30
[  375.263048] Call Trace:
[  375.263051]  [<00000001d1cd8b34>] iommu_group_release_dma_owner+0x3c/0x90 
[  375.263058]  [<000003ff801431c8>] vfio_group_detach_container+0x98/0x1a0 [vfio] 
[  375.263067]  [<000003ff801406c0>] vfio_device_remove_group.isra.0+0xb0/0x1e0 [vfio] 
[  375.263071]  [<000003ff80540e54>] vfio_pci_core_unregister_device+0x34/0x80 [vfio_pci_core] 
[  375.263079]  [<000003ff804f31c4>] vfio_pci_remove+0x2c/0x40 [vfio_pci] 
[  375.263082]  [<00000001d1c84c3c>] pci_device_remove+0x3c/0x98 
[  375.263085]  [<00000001d1ce9bce>] device_release_driver_internal+0x1c6/0x288 
[  375.263090]  [<00000001d1c7a284>] pci_stop_bus_device+0x94/0xc0 
[  375.263093]  [<00000001d1c7a570>] pci_stop_and_remove_bus_device_locked+0x30/0x48 
[  375.263096]  [<00000001d1489980>] zpci_bus_remove_device+0x68/0xa8 
[  375.263100]  [<00000001d1482e82>] zpci_deconfigure_device+0x3a/0xe0 
[  375.263104]  [<00000001d1c91d04>] power_write_file+0x7c/0x130 
[  375.263108]  [<00000001d1827c90>] kernfs_fop_write_iter+0x138/0x210 
[  375.263114]  [<00000001d1763344>] vfs_write+0x194/0x2e0 
[  375.263119]  [<00000001d17636fa>] ksys_write+0x6a/0xf8 
[  375.263121]  [<00000001d202518c>] __do_syscall+0x1d4/0x200 
[  375.263127]  [<00000001d2033e42>] system_call+0x82/0xb0 
[  375.263132] Last Breaking-Event-Address:
[  375.263132]  [<00000001d202f394>] mutex_lock+0x1c/0x28

  parent reply	other threads:[~2022-10-05 13:47 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-23  0:06 [PATCH v2] vfio: Follow a strict lifetime for struct iommu_group Jason Gunthorpe
2022-09-26 17:03 ` Matthew Rosato
2022-09-27 20:05   ` Alex Williamson
2022-10-04 15:19     ` Christian Borntraeger
2022-10-04 15:40       ` Jason Gunthorpe
2022-10-04 15:44         ` Christian Borntraeger
2022-10-04 16:28           ` Jason Gunthorpe
2022-10-04 17:15             ` Christian Borntraeger
2022-10-04 17:22               ` Jason Gunthorpe
2022-10-04 17:36             ` Christian Borntraeger
2022-10-04 17:48               ` Christian Borntraeger
2022-10-04 18:22               ` Matthew Rosato
2022-10-04 18:56                 ` Eric Farman
2022-10-05 13:46                 ` Matthew Rosato [this message]
2022-10-05 13:57                   ` Jason Gunthorpe
2022-10-05 14:00                     ` Christian Borntraeger
2022-10-05 14:01                     ` Jason Gunthorpe
2022-10-05 14:19                       ` Christian Borntraeger
2022-10-06 11:55                         ` Christian Borntraeger
2022-10-05 14:21                       ` Matthew Rosato
2022-10-05 15:40                         ` Matthew Rosato
2022-10-05 14:01                     ` Matthew Rosato
2022-10-04 19:59   ` Matthew Rosato
2022-10-04 20:19     ` Alex Williamson
2022-10-04 22:30       ` Jason Gunthorpe
2022-09-27  6:34 ` Yi Liu
2022-09-27 13:12   ` Jason Gunthorpe
2022-09-28  3:51 ` Tian, Kevin
2022-09-28 15:26   ` Jason Gunthorpe
2022-09-28 23:54     ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8982bc22-9afa-dde4-9f4e-38948db58789@linux.ibm.com \
    --to=mjrosato@linux.ibm.com \
    --cc=akrowiak@linux.ibm.com \
    --cc=alex.williamson@redhat.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=cai@lca.pw \
    --cc=cohuck@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=jgg@nvidia.com \
    --cc=jjherne@linux.ibm.com \
    --cc=jroedel@suse.de \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mhartmay@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).