linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
@ 2020-07-22 11:57 WeitaoWangoc
  2020-07-22 12:44 ` Greg KH
  2020-07-23  3:53 ` Alex Williamson
  0 siblings, 2 replies; 17+ messages in thread
From: WeitaoWangoc @ 2020-07-22 11:57 UTC (permalink / raw)
  To: gregkh, mathias.nyman, ulf.hansson, vkoul, hslester96, linux-usb,
	linux-kernel, Carsten_Schmid, efremov
  Cc: tonywwang, weitaowang, CobeChen, TimGuo, wwt8723

This bug is found in Zhaoxin platform, but it's a commom code bug.
Fail sequence:
step1: Unbind UHCI controller from native driver;
step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one vfio
       group's device list and set UHCI's dev->driver_data to struct vfio-pci(for UHCI)
step3: Unbind EHCI controller from native driver, will try to tell UHCI native driver
       that "I'm removed by set companion_hcd->self.hs_companion to NULL. However,
       companion_hcd get from UHCI's dev->driver_data that has modified by vfio-pci
       already.So, the vfio-pci structure will be damaged!
step4: Bind EHCI controller to vfio-pci driver, which will put EHCI controller in the
       same vfio group as UHCI controller;
     ... ...
step5: Unbind UHCI controller from vfio-pci, which will delete UHCI from vfio group'
       device list that has been damaged in step 3. So,delete operation can random
       result into a NULL pointer dereference with the below stack dump.
step6: Bind UHCI controller to native driver;
step7: Unbind EHCI controller from vfio-pci, which will try to remove EHCI controller
       from the vfio group;
step8: Bind EHCI controller to native driver;

[  929.114641] uhci_hcd 0000:00:10.0: remove, state 1
[  929.114652] usb usb1: USB disconnect, device number 1
[  929.114655] usb 1-1: USB disconnect, device number 2
[  929.270313] usb 1-2: USB disconnect, device number 3
[  929.318404] uhci_hcd 0000:00:10.0: USB bus 1 deregistered
[  929.343029] uhci_hcd 0000:00:10.1: remove, state 4
[  929.343045] usb usb3: USB disconnect, device number 1
[  929.343685] uhci_hcd 0000:00:10.1: USB bus 3 deregistered
[  929.369087] ehci-pci 0000:00:10.7: remove, state 4
[  929.369102] usb usb4: USB disconnect, device number 1
[  929.370325] ehci-pci 0000:00:10.7: USB bus 4 deregistered
[  932.398494] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  932.398496] PGD 42a67d067 P4D 42a67d067 PUD 42a65f067 PMD 0
[  932.398502] Oops: 0002 [#2] SMP NOPTI
[  932.398505] CPU: 2 PID: 7824 Comm: vfio_unbind.sh Tainted: P   D  4.19.65-2020051917-rainos #1
[  932.398506] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EH/HX002EH,
               BIOS HX002EH0_01_R480_R_200408 04/08/2020
[  932.398513] RIP: 0010:vfio_device_put+0x31/0xa0 [vfio]
[  932.398515] Code: 89 e5 41 54 53 4c 8b 67 18 48 89 fb 49 8d 74 24 30 e8 e3 0e f3 de
                84 c0 74 67 48 8b 53 20 48 8b 43 28 48 8b 7b 18 48 89 42 08 <48> 89 10
                48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 b8 00 02 00
[  932.398516] RSP: 0018:ffffbbfd04cffc18 EFLAGS: 00010202
[  932.398518] RAX: 0000000000000000 RBX: ffff92c7ea717880 RCX: 0000000000000000
[  932.398519] RDX: ffff92c7ea713620 RSI: ffff92c7ea713630 RDI: ffff92c7ea713600
[  932.398521] RBP: ffffbbfd04cffc28 R08: ffff92c7f02a8080 R09: ffff92c7efc03980
[  932.398522] R10: ffffbbfd04cff9a8 R11: 0000000000000000 R12: ffff92c7ea713600
[  932.398523] R13: ffff92c7ed8bb0a8 R14: ffff92c7ea717880 R15: 0000000000000000
[  932.398525] FS:  00007f3031500740(0000) GS:ffff92c7f0280000(0000) knlGS:0000000000000000
[  932.398526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  932.398527] CR2: 0000000000000000 CR3: 0000000428626004 CR4: 0000000000160ee0
[  932.398528] Call Trace:
[  932.398534]  vfio_del_group_dev+0xe8/0x2a0 [vfio]
[  932.398539]  ? __blocking_notifier_call_chain+0x52/0x60
[  932.398542]  ? do_wait_intr_irq+0x90/0x90
[  932.398546]  ? iommu_bus_notifier+0x75/0x100
[  932.398551]  vfio_pci_remove+0x20/0xa0 [vfio_pci]
[  932.398554]  pci_device_remove+0x3e/0xc0
[  932.398557]  device_release_driver_internal+0x17a/0x240
[  932.398560]  device_release_driver+0x12/0x20
[  932.398561]  unbind_store+0xee/0x180
[  932.398564]  drv_attr_store+0x27/0x40
[  932.398567]  sysfs_kf_write+0x3c/0x50
[  932.398568]  kernfs_fop_write+0x125/0x1a0
[  932.398572]  __vfs_write+0x3a/0x190
[  932.398575]  ? apparmor_file_permission+0x1a/0x20
[  932.398577]  ? security_file_permission+0x3b/0xc0
[  932.398581]  ? _cond_resched+0x1a/0x50
[  932.398582]  vfs_write+0xb8/0x1b0
[  932.398584]  ksys_write+0x5c/0xe0
[  932.398586]  __x64_sys_write+0x1a/0x20
[  932.398589]  do_syscall_64+0x5a/0x110
[  932.398592]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Using virt-manager/qemu to boot guest os, we can see the same fail sequence!

Fix this by check for UHCI driver loaded or not before modifiy UHCI's dev->driver_data,
which will happen in ehci native driver probe/remove.

Signed-off-by: WeitaoWangoc <WeitaoWang-oc@zhaoxin.com>
---
 drivers/usb/core/hcd-pci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index 1547aa6..484f2a0 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem);
 #define CL_OHCI                PCI_CLASS_SERIAL_USB_OHCI
 #define CL_EHCI                PCI_CLASS_SERIAL_USB_EHCI

+#define PCI_DEV_DRV_FLAG       2
 static inline int is_ohci_or_uhci(struct pci_dev *pdev)
 {
        return pdev->class == CL_OHCI || pdev->class == CL_UHCI;
@@ -68,6 +69,8 @@ static void for_each_companion(struct pci_dev *pdev, struct usb_hcd *hcd,
                if (companion->class != CL_UHCI && companion->class != CL_OHCI &&
                                companion->class != CL_EHCI)
                        continue;
+               if (!(companion->priv_flags & PCI_DEV_DRV_FLAG))
+                       continue;

                companion_hcd = pci_get_drvdata(companion);
                if (!companion_hcd || !companion_hcd->self.root_hub)
@@ -254,6 +257,7 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct pci_device_id *id,
        }

        pci_set_master(dev);
+       dev->priv_flags | = PCI_DEV_DRV_FLAG;

        /* Note: dev_set_drvdata must be called while holding the rwsem */
        if (dev->class == CL_EHCI) {
@@ -326,6 +330,7 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
        local_irq_disable();
        usb_hcd_irq(0, hcd);
        local_irq_enable();
+       dev->priv_flags & = ~PCI_DEV_DRV_FLAG;

        /* Note: dev_set_drvdata must be called while holding the rwsem */
        if (dev->class == CL_EHCI) {
--
2.7.4



保密声明:
本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。
CONFIDENTIAL NOTE:
This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-22 11:57 [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci WeitaoWangoc
@ 2020-07-22 12:44 ` Greg KH
  2020-07-22 14:59   ` Alan Stern
  2020-07-23  2:58   ` Weitao Wang(BJ-RD)
  2020-07-23  3:53 ` Alex Williamson
  1 sibling, 2 replies; 17+ messages in thread
From: Greg KH @ 2020-07-22 12:44 UTC (permalink / raw)
  To: WeitaoWangoc
  Cc: mathias.nyman, ulf.hansson, vkoul, hslester96, linux-usb,
	linux-kernel, Carsten_Schmid, efremov, tonywwang, weitaowang,
	CobeChen, TimGuo, wwt8723

On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:
> This bug is found in Zhaoxin platform, but it's a commom code bug.
> Fail sequence:
> step1: Unbind UHCI controller from native driver;
> step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one vfio
>        group's device list and set UHCI's dev->driver_data to struct vfio-pci(for UHCI)

Hah, that works?  How do you do that properly?  What code does that?

> step3: Unbind EHCI controller from native driver, will try to tell UHCI native driver
>        that "I'm removed by set companion_hcd->self.hs_companion to NULL. However,
>        companion_hcd get from UHCI's dev->driver_data that has modified by vfio-pci
>        already.So, the vfio-pci structure will be damaged!

Because you are messing around with bind/unbind "by hand", which is
never guaranteed to work properly.

It's amazing any of that works at all...

> step4: Bind EHCI controller to vfio-pci driver, which will put EHCI controller in the
>        same vfio group as UHCI controller;
>      ... ...
> step5: Unbind UHCI controller from vfio-pci, which will delete UHCI from vfio group'
>        device list that has been damaged in step 3. So,delete operation can random
>        result into a NULL pointer dereference with the below stack dump.
> step6: Bind UHCI controller to native driver;
> step7: Unbind EHCI controller from vfio-pci, which will try to remove EHCI controller
>        from the vfio group;
> step8: Bind EHCI controller to native driver;
> 
> [  929.114641] uhci_hcd 0000:00:10.0: remove, state 1
> [  929.114652] usb usb1: USB disconnect, device number 1
> [  929.114655] usb 1-1: USB disconnect, device number 2
> [  929.270313] usb 1-2: USB disconnect, device number 3
> [  929.318404] uhci_hcd 0000:00:10.0: USB bus 1 deregistered
> [  929.343029] uhci_hcd 0000:00:10.1: remove, state 4
> [  929.343045] usb usb3: USB disconnect, device number 1
> [  929.343685] uhci_hcd 0000:00:10.1: USB bus 3 deregistered
> [  929.369087] ehci-pci 0000:00:10.7: remove, state 4
> [  929.369102] usb usb4: USB disconnect, device number 1
> [  929.370325] ehci-pci 0000:00:10.7: USB bus 4 deregistered
> [  932.398494] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [  932.398496] PGD 42a67d067 P4D 42a67d067 PUD 42a65f067 PMD 0
> [  932.398502] Oops: 0002 [#2] SMP NOPTI
> [  932.398505] CPU: 2 PID: 7824 Comm: vfio_unbind.sh Tainted: P   D  4.19.65-2020051917-rainos #1

4.19 is a bit old, what about 5.7?

> [  932.398506] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EH/HX002EH,
>                BIOS HX002EH0_01_R480_R_200408 04/08/2020
> [  932.398513] RIP: 0010:vfio_device_put+0x31/0xa0 [vfio]
> [  932.398515] Code: 89 e5 41 54 53 4c 8b 67 18 48 89 fb 49 8d 74 24 30 e8 e3 0e f3 de
>                 84 c0 74 67 48 8b 53 20 48 8b 43 28 48 8b 7b 18 48 89 42 08 <48> 89 10
>                 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 b8 00 02 00
> [  932.398516] RSP: 0018:ffffbbfd04cffc18 EFLAGS: 00010202
> [  932.398518] RAX: 0000000000000000 RBX: ffff92c7ea717880 RCX: 0000000000000000
> [  932.398519] RDX: ffff92c7ea713620 RSI: ffff92c7ea713630 RDI: ffff92c7ea713600
> [  932.398521] RBP: ffffbbfd04cffc28 R08: ffff92c7f02a8080 R09: ffff92c7efc03980
> [  932.398522] R10: ffffbbfd04cff9a8 R11: 0000000000000000 R12: ffff92c7ea713600
> [  932.398523] R13: ffff92c7ed8bb0a8 R14: ffff92c7ea717880 R15: 0000000000000000
> [  932.398525] FS:  00007f3031500740(0000) GS:ffff92c7f0280000(0000) knlGS:0000000000000000
> [  932.398526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  932.398527] CR2: 0000000000000000 CR3: 0000000428626004 CR4: 0000000000160ee0
> [  932.398528] Call Trace:
> [  932.398534]  vfio_del_group_dev+0xe8/0x2a0 [vfio]
> [  932.398539]  ? __blocking_notifier_call_chain+0x52/0x60
> [  932.398542]  ? do_wait_intr_irq+0x90/0x90
> [  932.398546]  ? iommu_bus_notifier+0x75/0x100
> [  932.398551]  vfio_pci_remove+0x20/0xa0 [vfio_pci]
> [  932.398554]  pci_device_remove+0x3e/0xc0
> [  932.398557]  device_release_driver_internal+0x17a/0x240
> [  932.398560]  device_release_driver+0x12/0x20
> [  932.398561]  unbind_store+0xee/0x180
> [  932.398564]  drv_attr_store+0x27/0x40
> [  932.398567]  sysfs_kf_write+0x3c/0x50
> [  932.398568]  kernfs_fop_write+0x125/0x1a0
> [  932.398572]  __vfs_write+0x3a/0x190
> [  932.398575]  ? apparmor_file_permission+0x1a/0x20
> [  932.398577]  ? security_file_permission+0x3b/0xc0
> [  932.398581]  ? _cond_resched+0x1a/0x50
> [  932.398582]  vfs_write+0xb8/0x1b0
> [  932.398584]  ksys_write+0x5c/0xe0
> [  932.398586]  __x64_sys_write+0x1a/0x20
> [  932.398589]  do_syscall_64+0x5a/0x110
> [  932.398592]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Using virt-manager/qemu to boot guest os, we can see the same fail sequence!
> 
> Fix this by check for UHCI driver loaded or not before modifiy UHCI's dev->driver_data,
> which will happen in ehci native driver probe/remove.
> 
> Signed-off-by: WeitaoWangoc <WeitaoWang-oc@zhaoxin.com>
> ---
>  drivers/usb/core/hcd-pci.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> index 1547aa6..484f2a0 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem);
>  #define CL_OHCI                PCI_CLASS_SERIAL_USB_OHCI
>  #define CL_EHCI                PCI_CLASS_SERIAL_USB_EHCI
> 
> +#define PCI_DEV_DRV_FLAG       2

Why is the USB core allowed to touch a private PCI structure field?

I do not think this is allowed.

And why is this a USB host controller-specific issue, shouldn't this
type of thing be fixed in the PCI core?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-22 12:44 ` Greg KH
@ 2020-07-22 14:59   ` Alan Stern
  2020-07-23  2:59     ` 答复: " Weitao Wang(BJ-RD)
  2020-07-23  2:58   ` Weitao Wang(BJ-RD)
  1 sibling, 1 reply; 17+ messages in thread
From: Alan Stern @ 2020-07-22 14:59 UTC (permalink / raw)
  To: Greg KH
  Cc: WeitaoWangoc, mathias.nyman, ulf.hansson, vkoul, hslester96,
	linux-usb, linux-kernel, Carsten_Schmid, efremov, tonywwang,
	weitaowang, CobeChen, TimGuo, wwt8723

On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote:
> On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:
> > This bug is found in Zhaoxin platform, but it's a commom code bug.
> > Fail sequence:
> > step1: Unbind UHCI controller from native driver;
> > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one vfio
> >        group's device list and set UHCI's dev->driver_data to struct vfio-pci(for UHCI)
> 
> Hah, that works?  How do you do that properly?  What code does that?

Yeah, that can't possibly work.  The USB core expects that any host 
controller device (or at least, any PCI host controller device) has its 
driver_data set to point to a struct usb_hcd.  It doesn't expect a host 
controller to be bound to anything other than a host controller driver.

Things could easily go very wrong here.  For example, suppose at this 
point the ehci-hcd driver just happens to bind to the EHCI controller.  
When this happens, the EHCI controller hardware takes over all the USB 
connections that were routed to the UHCI controller.  How will vfio-pci 
deal with that?  Pretty ungracefully, I imagine.

The only way to make this work at all is to unbind both uhci-hcd and 
ehci-hcd first.  Then after both are finished you can safely bind 
vfio-pci to the EHCI controller and the UHCI controllers (in that 
order).

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* 答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-22 12:44 ` Greg KH
  2020-07-22 14:59   ` Alan Stern
@ 2020-07-23  2:58   ` Weitao Wang(BJ-RD)
  1 sibling, 0 replies; 17+ messages in thread
From: Weitao Wang(BJ-RD) @ 2020-07-23  2:58 UTC (permalink / raw)
  To: Greg KH, WeitaoWang-oc
  Cc: mathias.nyman, ulf.hansson, vkoul, hslester96, linux-usb,
	linux-kernel, Carsten_Schmid, efremov, Tony W. Wang(XA-RD),
	Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723

  
On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote:
> On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:
> > This bug is found in Zhaoxin platform, but it's a commom code bug.
> > Fail sequence:
> > step1: Unbind UHCI controller from native driver;
> > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one
> vfio
> >        group's device list and set UHCI's dev->driver_data to struct
> vfio-pci(for UHCI)
>
> Hah, that works?  How do you do that properly?  What code does that?

Drivers/vfio/vfio.c
The function vfio_group_create_device will set UHCI dev->driver_data
to vfio-device struct.

> > step3: Unbind EHCI controller from native driver, will try to tell UHCI native
> driver
> >        that "I'm removed by set companion_hcd->self.hs_companion to
> NULL. However,
> >        companion_hcd get from UHCI's dev->driver_data that has modified
> by vfio-pci
> >        already.So, the vfio-pci structure will be damaged!
>
> Because you are messing around with bind/unbind "by hand", which is
> never guaranteed to work properly.
>
> It's amazing any of that works at all...

If adjust the sequence of UHCI/EHCI unbind form native driver, that
can avoid Null Pointer dereference. Eg. Step3-->stet4-->step1-->step2.
Our experiments prove that this sequence is indeed good as expected.
However, some application software such as virt-manager/qemu assign
/EHCI to guest OS has the same bind/unbind sequence as test “by hand”.

> 4.19.65-2020051917-rainos #1
>
> 4.19 is a bit old, what about 5.7?

5.7.7 has the same issue.

> > +#define PCI_DEV_DRV_FLAG       2
>
> Why is the USB core allowed to touch a private PCI structure field?
>
> I do not think this is allowed.
>
> And why is this a USB host controller-specific issue, shouldn't this
> type of thing be fixed in the PCI core?

When ehci hcd_driver bind or unbind to ehci, it will touch/modify UHCI usb_hcd
that get from UHCI's dev->driver_data. However, UHCI maybe bind to a
another driver(vfio-pci) and it's driver_data will be rewritten. what we
should do is to prevent ehci_hcd to modify UHCI's dev->driver_data when UHCI
has already bound to vfio-pci.

Thanks
weitaowang



保密声明:
本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。
CONFIDENTIAL NOTE:
This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* 答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-22 14:59   ` Alan Stern
@ 2020-07-23  2:59     ` Weitao Wang(BJ-RD)
  2020-07-23  4:18       ` Alex Williamson
  2020-07-23  5:25       ` Greg KH
  0 siblings, 2 replies; 17+ messages in thread
From: Weitao Wang(BJ-RD) @ 2020-07-23  2:59 UTC (permalink / raw)
  To: Alan Stern, Greg KH
  Cc: WeitaoWang-oc, mathias.nyman, ulf.hansson, vkoul, hslester96,
	linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723


On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote:
> On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote:
> > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:
> > > This bug is found in Zhaoxin platform, but it's a commom code bug.
> > > Fail sequence:
> > > step1: Unbind UHCI controller from native driver;
> > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one
> vfio
> > >        group's device list and set UHCI's dev->driver_data to struct
> vfio-pci(for UHCI)
> >
> > Hah, that works?  How do you do that properly?  What code does that?
>
> Yeah, that can't possibly work.  The USB core expects that any host
> controller device (or at least, any PCI host controller device) has its
> driver_data set to point to a struct usb_hcd.  It doesn't expect a host
> controller to be bound to anything other than a host controller driver.
>
> Things could easily go very wrong here.  For example, suppose at this
> point the ehci-hcd driver just happens to bind to the EHCI controller.
> When this happens, the EHCI controller hardware takes over all the USB
> connections that were routed to the UHCI controller.  How will vfio-pci
> deal with that?  Pretty ungracefully, I imagine.
>
> The only way to make this work at all is to unbind both uhci-hcd and
> ehci-hcd first.  Then after both are finished you can safely bind
> vfio-pci to the EHCI controller and the UHCI controllers (in that
> order).
>
I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to
vfio-pci is a more reasonable sequence. Our experiments prove that this
sequence is indeed good as expected.
However, I did not find a formal document to prescribe this order.
Unfortunately, some application software such as virt-manager/qemu assign
UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”.
Do we need to consider compatibility with this application scenario?

The following log is captured when starting then shutdown the
virtual machine.

/* starting virtual machine*/
[ 2785.250001] audit: type=1400 audit(1594375837.191:48): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2008 comm="apparmor_parser"
[ 2785.467510] uhci_hcd 0000:00:10.0: remove, state 4
[ 2785.472426] usb usb1: USB disconnect, device number 1
/*bind 0000:00:10.0 to vfio-pci*/
[ 2785.478798] uhci_hcd 0000:00:10.0: USB bus 1 deregistered
[ 2785.568741] uhci_hcd 0000:00:10.1: remove, state 1
[ 2785.573562] usb usb2: USB disconnect, device number 1
[ 2785.578793] usb 2-2: USB disconnect, device number 3
[ 2785.758016] uhci_hcd 0000:00:10.1: USB bus 2 deregistered
/*bind 0000:00:10.1 to vfio-pci*/
[ 2786.037448] ehci-pci 0000:00:10.7: remove, state 4
[ 2786.042460] usb usb3: USB disconnect, device number 1
[ 2786.048700] ehci-pci 0000:00:10.7: USB bus 3 deregistered
/*bind 0000:00:10.7 to vfio-pci*/
[ 2787.518041] audit: type=1400 audit(1594375839.459:49): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2034 comm="apparmor_parser"
[ 2788.290706] audit: type=1400 audit(1594375840.231:50): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2037 comm="apparmor_parser"
[ 2788.960070] audit: type=1400 audit(1594375840.899:51): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2040 comm="apparmor_parser"
[ 2788.968821] virbr0: port 2(vnet0) entered blocking state
[ 2788.988159] virbr0: port 2(vnet0) entered disabled state
[ 2788.993711] device vnet0 entered promiscuous mode
[ 2788.999453] virbr0: port 2(vnet0) entered blocking state
[ 2789.005053] virbr0: port 2(vnet0) entered listening state
[ 2789.098717] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 2789.564241] audit: type=1400 audit(1594375841.507:52): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2065 comm="apparmor_parser"
[ 2791.028028] virbr0: port 2(vnet0) entered learning state
[ 2793.047999] virbr0: port 2(vnet0) entered forwarding state
[ 2793.053449] virbr0: topology change detected, propagating
[ 2793.433604] vfio_cap_init: 0000:00:10.7 hiding cap 0xa

/* shutdown virtual machine*/
[ 3838.772058] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3838.815819] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3838.871002] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3838.884606] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3838.894514] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3838.896791] rfkill: input handler enabled
[ 3838.903896] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3838.907998] systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/packagekit_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=952 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
[ 3838.949500] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3839.002757] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3839.182053] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3839.191313] systemd-journald[286]: Successfully sent stream file descriptor to service manager.
[ 3838.302725] libvirt-guests.sh[2161]: Running guests on default URI: generic
[ 3838.306783] libvirt-guests.sh[2161]: Shutting down guests on default URI...
[ 3838.415103] libvirt-guests.sh[2161]: Starting shutdown on guest: generic
plymouth-poweroff.service
[  OK  ] Stopped Firmware update daemon.
[  OK  ] Stopped Session 1 of user wang.
[  OK  ] Removed slice User Slice of wang.
         Stopping Permit User Sessions...
         Stopping Login Service...
[  OK  ] Stopped Permit User Sessions.
[  OK  ] Unmounted Mount unit for core, revision 9289.
[  OK  ] Unmounted Mount unit for gnome-system-monitor, revision 148.
[  OK  ] Unmounted Mount unit for gnome-3-34-1804, revision 33.
[  OK  ] Unmounted Mount unit for gnome-logs, revision 81.
[  OK  ] Unmounted Mount unit for core18, revision 1754.
[  OK  ] Unmounted Mount unit for gnome-3-28-1804, revision 116.
[  OK  ] Unmounted Mount unit for gnome-characters, revision 550.
[  OK  ] Unmounted Mount unit for gnome-3-34-1804, revision 36.
[  OK  ] Unmounted Mount unit for core, revision 9436.
[  OK  ] Unmounted Mount unit for gnome-characters, revision 539.
[  OK  ] Unmounted Mount unit for gtk-common-themes, revision 1440.
[  OK  ] Unmounted Mount unit for gtk-common-themes, revision 1506.
[  OK  ] Unmounted Mount unit for core18, revision 1668.
[  OK  ] Unmounted Mount unit for gnome-calculator, revision 544.
[  OK  ] Unmounted Mount unit for gnome-calculator, revision 748.
[  OK  ] Unmounted Mount unit for gnome-logs, revision 100.
[  OK  ] Unmounted Mount unit for gnome-3-28-1804, revision 128.
[  OK  ] Stopped Login Service.
[  OK  ] Stopped target User and Group Name Lookups.
[ 3839.471635] libvirt-guests.sh[2161]: Waiting for 1 guests to shut down, 120 seconds left
[ 3841.824842] virbr0: port 2(vnet0) entered disabled state
[ 3841.832949] device vnet0 left promiscuous mode
[ 3841.837393] virbr0: port 2(vnet0) entered disabled state
[[ 3843.167495] audit: type=1400 audit(1594376895.107:53): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2301 comm="apparmor_parser"
[ 3843.246397] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 3843.254157] PGD 0 P4D 0
[ 3843.256671] Oops: 0002 [#1] SMP NOPTI
[ 3843.260301] CPU: 1 PID: 1812 Comm: libvirtd Not tainted 4.19.65 #10
[ 3843.266511] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EA/HX002EA, BIOS HX002EA0_03_R490_D_200707 07/07/2020
[ 3843.277804] RIP: 0010:vfio_device_put+0xa5/0x140 [vfio]
[ 3843.283129] Code: 1c 4e ce 48 8b 73 28 48 8b 53 20 48 c7 c7 48 54 61 c0 e8 51 1c 4e ce 48 8b 53 20 48 8b 43 28 48 c7 c7 80 54 61 c0 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 05 00 01 00
[ 3843.301726] RSP: 0018:ffff8fde2210bc28 EFLAGS: 00010282
[ 3843.306905] RAX: 0000000000000000 RBX: ffff8fde1a59ed00 RCX: 0000000000000000
[ 3843.313975] RDX: ffff8fdde27d6820 RSI: ffff8fde2fa96438 RDI: ffffffffc0615480
[ 3843.321118] RBP: ffff8fde2210bc48 R08: 0000000000000c90 R09: 3d7478656e2c303d
[ 3843.328190] R10: ffff8fdde2284520 R11: 3032383664373265 R12: ffff8fdde27d6800
[ 3843.335314] R13: ffff8fde1a59ed20 R14: ffff8fdde27d6800 R15: 0000000000000000
[ 3843.342390] FS:  00007f91fe3d6700(0000) GS:ffff8fde2fa80000(0000) knlGS:0000000000000000
[ 3843.350457] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3843.356221] CR2: 0000000000000000 CR3: 0000000425fec005 CR4: 0000000000160ee0
[ 3843.363312] Call Trace:
[ 3843.365778]  vfio_del_group_dev+0x105/0x2e0 [vfio]
[ 3843.370537]  ? do_wait_intr_irq+0x90/0x90
[ 3843.374550]  ? vprintk_func+0x47/0xc0
[ 3843.378202]  vfio_pci_remove+0x20/0xe0 [vfio_pci]
[ 3843.382900]  pci_device_remove+0x51/0xd0
[ 3843.386799]  device_release_driver_internal+0x18d/0x250
[ 3843.392042]  device_release_driver+0x12/0x20
[ 3843.396324]  unbind_store+0xbd/0x190
[ 3843.399869]  drv_attr_store+0x27/0x40
[ 3843.403541]  sysfs_kf_write+0x3c/0x50
[ 3843.407251]  kernfs_fop_write+0x125/0x1a0
[ 3843.411431]  __vfs_write+0x3a/0x190
[ 3843.414902]  ? apparmor_file_permission+0x1a/0x20
[ 3843.419648]  ? security_file_permission+0x31/0xc0
[ 3843.424648]  ? _cond_resched+0x19/0x40
[ 3843.428407]  vfs_write+0xb1/0x1a0
[ 3843.431696]  ksys_write+0x5c/0xe0
[ 3843.435048]  __x64_sys_write+0x1a/0x20
[ 3843.439056]  do_syscall_64+0x5a/0x120
[ 3843.442713]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3843.447757] RIP: 0033:0x7f924e2902b7
[ 3843.451385] Code: 44 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 5b fd ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 94 fd ff ff 48
[ 3843.470696] RSP: 002b:00007f91fe3d5810 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 3843.478508] RAX: ffffffffffffffda RBX: 0000000000000016 RCX: 00007f924e2902b7
[ 3843.485627] RDX: 000000000000000c RSI: 00007f921001c784 RDI: 0000000000000016
[ 3843.493001] RBP: 00007f921001c784 R08: 0000000000000000 R09: 000000000000000d
[ 3843.500120] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000c
[ 3843.507413] R13: 0000000000000000 R14: 0000000000000016 R15: 00007f91e8112e20
[ 3843.514541] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter amdgpu chash gpu_sched nls_iso8859_1 snd_hda_codec_hdmi radeon snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm via_cputemp hwmon_vid kvm_intel kvm snd_seq_midi ttm snd_seq_midi_event irqbypass snd_rawmidi crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel snd_seq pcbc aesni_intel aes_x86_64 crypto_simd drm snd_seq_device cryptd glue_helper joydev snd_timer input_leds serio_raw snd i2c_algo_bit fb_sys_fops syscopyarea video sysfillrect sysimgblt
[ 3843.586463]  soundcore mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic ahci psmouse r8169 usbhid hid libahci realtek
[ 3843.600637] CR2: 0000000000000000
[ 3843.604108] ---[ end trace 65d72623b84bf7a3 ]---
[ 3843.608839] RIP: 0010:vfio_device_put+0xa5/0x140 [vfio]
[ 3843.614209] Code: 1c 4e ce 48 8b 73 28 48 8b 53 20 48 c7 c7 48 54 61 c0 e8 51 1c 4e ce 48 8b 53 20 48 8b 43 28 48 c7 c7 80 54 61 c0 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 05 00 01 00
[ 3843.633419] RSP: 0018:ffff8fde2210bc28 EFLAGS: 00010282
[ 3843.638816] RAX: 0000000000000000 RBX: ffff8fde1a59ed00 RCX: 0000000000000000
[ 3843.646133] RDX: ffff8fdde27d6820 RSI: ffff8fde2fa96438 RDI: ffffffffc0615480
[ 3843.653551] RBP: ffff8fde2210bc48 R08: 0000000000000c90 R09: 3d7478656e2c303d
[ 3843.661304] R10: ffff8fdde2284520 R11: 3032383664373265 R12: ffff8fdde27d6800
[ 3843.668540] R13: ffff8fde1a59ed20 R14: ffff8fdde27d6800 R15: 0000000000000000
[ 3843.676029] FS:  00007f91fe3d6700(0000) GS:ffff8fde2fa80000(0000) knlGS:0000000000000000
[ 3843.684398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3843.690269] CR2: 0000000000000000 CR3: 0000000425fec005 CR4: 0000000000160ee0

Thanks
weitaowang


保密声明:
本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。
CONFIDENTIAL NOTE:
This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-22 11:57 [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci WeitaoWangoc
  2020-07-22 12:44 ` Greg KH
@ 2020-07-23  3:53 ` Alex Williamson
  2020-07-23  8:36   ` 答复: " WeitaoWang-oc
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2020-07-23  3:53 UTC (permalink / raw)
  To: WeitaoWangoc
  Cc: gregkh, mathias.nyman, ulf.hansson, vkoul, hslester96, linux-usb,
	linux-kernel, Carsten_Schmid, efremov, tonywwang, weitaowang,
	CobeChen, TimGuo, wwt8723

On Wed, 22 Jul 2020 19:57:48 +0800
WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> wrote:

> This bug is found in Zhaoxin platform, but it's a commom code bug.
> Fail sequence:
> step1: Unbind UHCI controller from native driver;
> step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one vfio
>        group's device list and set UHCI's dev->driver_data to struct vfio-pci(for UHCI)
> step3: Unbind EHCI controller from native driver, will try to tell UHCI native driver
>        that "I'm removed by set companion_hcd->self.hs_companion to NULL. However,
>        companion_hcd get from UHCI's dev->driver_data that has modified by vfio-pci
>        already.So, the vfio-pci structure will be damaged!
> step4: Bind EHCI controller to vfio-pci driver, which will put EHCI controller in the
>        same vfio group as UHCI controller;
>      ... ...
> step5: Unbind UHCI controller from vfio-pci, which will delete UHCI from vfio group'
>        device list that has been damaged in step 3. So,delete operation can random
>        result into a NULL pointer dereference with the below stack dump.
> step6: Bind UHCI controller to native driver;
> step7: Unbind EHCI controller from vfio-pci, which will try to remove EHCI controller
>        from the vfio group;
> step8: Bind EHCI controller to native driver;
> 
> [  929.114641] uhci_hcd 0000:00:10.0: remove, state 1
> [  929.114652] usb usb1: USB disconnect, device number 1
> [  929.114655] usb 1-1: USB disconnect, device number 2
> [  929.270313] usb 1-2: USB disconnect, device number 3
> [  929.318404] uhci_hcd 0000:00:10.0: USB bus 1 deregistered
> [  929.343029] uhci_hcd 0000:00:10.1: remove, state 4
> [  929.343045] usb usb3: USB disconnect, device number 1
> [  929.343685] uhci_hcd 0000:00:10.1: USB bus 3 deregistered
> [  929.369087] ehci-pci 0000:00:10.7: remove, state 4
> [  929.369102] usb usb4: USB disconnect, device number 1
> [  929.370325] ehci-pci 0000:00:10.7: USB bus 4 deregistered
> [  932.398494] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [  932.398496] PGD 42a67d067 P4D 42a67d067 PUD 42a65f067 PMD 0
> [  932.398502] Oops: 0002 [#2] SMP NOPTI
> [  932.398505] CPU: 2 PID: 7824 Comm: vfio_unbind.sh Tainted: P   D  4.19.65-2020051917-rainos #1
> [  932.398506] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EH/HX002EH,
>                BIOS HX002EH0_01_R480_R_200408 04/08/2020
> [  932.398513] RIP: 0010:vfio_device_put+0x31/0xa0 [vfio]
> [  932.398515] Code: 89 e5 41 54 53 4c 8b 67 18 48 89 fb 49 8d 74 24 30 e8 e3 0e f3 de
>                 84 c0 74 67 48 8b 53 20 48 8b 43 28 48 8b 7b 18 48 89 42 08 <48> 89 10
>                 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 b8 00 02 00
> [  932.398516] RSP: 0018:ffffbbfd04cffc18 EFLAGS: 00010202
> [  932.398518] RAX: 0000000000000000 RBX: ffff92c7ea717880 RCX: 0000000000000000
> [  932.398519] RDX: ffff92c7ea713620 RSI: ffff92c7ea713630 RDI: ffff92c7ea713600
> [  932.398521] RBP: ffffbbfd04cffc28 R08: ffff92c7f02a8080 R09: ffff92c7efc03980
> [  932.398522] R10: ffffbbfd04cff9a8 R11: 0000000000000000 R12: ffff92c7ea713600
> [  932.398523] R13: ffff92c7ed8bb0a8 R14: ffff92c7ea717880 R15: 0000000000000000
> [  932.398525] FS:  00007f3031500740(0000) GS:ffff92c7f0280000(0000) knlGS:0000000000000000
> [  932.398526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  932.398527] CR2: 0000000000000000 CR3: 0000000428626004 CR4: 0000000000160ee0
> [  932.398528] Call Trace:
> [  932.398534]  vfio_del_group_dev+0xe8/0x2a0 [vfio]
> [  932.398539]  ? __blocking_notifier_call_chain+0x52/0x60
> [  932.398542]  ? do_wait_intr_irq+0x90/0x90
> [  932.398546]  ? iommu_bus_notifier+0x75/0x100
> [  932.398551]  vfio_pci_remove+0x20/0xa0 [vfio_pci]
> [  932.398554]  pci_device_remove+0x3e/0xc0
> [  932.398557]  device_release_driver_internal+0x17a/0x240
> [  932.398560]  device_release_driver+0x12/0x20
> [  932.398561]  unbind_store+0xee/0x180
> [  932.398564]  drv_attr_store+0x27/0x40
> [  932.398567]  sysfs_kf_write+0x3c/0x50
> [  932.398568]  kernfs_fop_write+0x125/0x1a0
> [  932.398572]  __vfs_write+0x3a/0x190
> [  932.398575]  ? apparmor_file_permission+0x1a/0x20
> [  932.398577]  ? security_file_permission+0x3b/0xc0
> [  932.398581]  ? _cond_resched+0x1a/0x50
> [  932.398582]  vfs_write+0xb8/0x1b0
> [  932.398584]  ksys_write+0x5c/0xe0
> [  932.398586]  __x64_sys_write+0x1a/0x20
> [  932.398589]  do_syscall_64+0x5a/0x110
> [  932.398592]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Using virt-manager/qemu to boot guest os, we can see the same fail sequence!
> 
> Fix this by check for UHCI driver loaded or not before modifiy UHCI's dev->driver_data,
> which will happen in ehci native driver probe/remove.
> 
> Signed-off-by: WeitaoWangoc <WeitaoWang-oc@zhaoxin.com>
> ---
>  drivers/usb/core/hcd-pci.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> index 1547aa6..484f2a0 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem);
>  #define CL_OHCI                PCI_CLASS_SERIAL_USB_OHCI
>  #define CL_EHCI                PCI_CLASS_SERIAL_USB_EHCI
> 
> +#define PCI_DEV_DRV_FLAG       2
>  static inline int is_ohci_or_uhci(struct pci_dev *pdev)
>  {
>         return pdev->class == CL_OHCI || pdev->class == CL_UHCI;
> @@ -68,6 +69,8 @@ static void for_each_companion(struct pci_dev *pdev, struct usb_hcd *hcd,
>                 if (companion->class != CL_UHCI && companion->class != CL_OHCI &&
>                                 companion->class != CL_EHCI)
>                         continue;
> +               if (!(companion->priv_flags & PCI_DEV_DRV_FLAG))

But pci_dev.priv_flags is private data for the driver that currently
owns the device, which could be vfio-pci.  This is really no different
than assuming the structure at device.driver_data.  If vfio-pci were to
make legitimate use of pci_dev.priv_flags, this could simply blow up
again.  Should there instead be some sort of registration interface
where hcd complaint drivers register their devices and only those
registered devices can have their driver private data arbitrarily poked
by another driver?  Thanks,

Alex

> +                       continue;
> 
>                 companion_hcd = pci_get_drvdata(companion);
>                 if (!companion_hcd || !companion_hcd->self.root_hub)
> @@ -254,6 +257,7 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct pci_device_id *id,
>         }
> 
>         pci_set_master(dev);
> +       dev->priv_flags | = PCI_DEV_DRV_FLAG;
> 
>         /* Note: dev_set_drvdata must be called while holding the rwsem */
>         if (dev->class == CL_EHCI) {
> @@ -326,6 +330,7 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
>         local_irq_disable();
>         usb_hcd_irq(0, hcd);
>         local_irq_enable();
> +       dev->priv_flags & = ~PCI_DEV_DRV_FLAG;
> 
>         /* Note: dev_set_drvdata must be called while holding the rwsem */
>         if (dev->class == CL_EHCI) {
> --
> 2.7.4
> 
> 
> 
> 保密声明:
> 本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。
> CONFIDENTIAL NOTE:
> This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23  2:59     ` 答复: " Weitao Wang(BJ-RD)
@ 2020-07-23  4:18       ` Alex Williamson
  2020-07-23 15:38         ` Alan Stern
  2020-07-23  5:25       ` Greg KH
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2020-07-23  4:18 UTC (permalink / raw)
  To: Weitao Wang(BJ-RD)
  Cc: Alan Stern, Greg KH, WeitaoWang-oc, mathias.nyman, ulf.hansson,
	vkoul, hslester96, linux-usb, linux-kernel, Carsten_Schmid,
	efremov, Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723

On Thu, 23 Jul 2020 02:59:55 +0000
"Weitao Wang(BJ-RD)" <WeitaoWang@zhaoxin.com> wrote:

> On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote:
> > On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote:  
> > > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:  
> > > > This bug is found in Zhaoxin platform, but it's a commom code bug.
> > > > Fail sequence:
> > > > step1: Unbind UHCI controller from native driver;
> > > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one  
> > vfio  
> > > >        group's device list and set UHCI's dev->driver_data to struct  
> > vfio-pci(for UHCI)  
> > >
> > > Hah, that works?  How do you do that properly?  What code does that?  
> >
> > Yeah, that can't possibly work.  The USB core expects that any host
> > controller device (or at least, any PCI host controller device) has its
> > driver_data set to point to a struct usb_hcd.  It doesn't expect a host
> > controller to be bound to anything other than a host controller driver.
> >
> > Things could easily go very wrong here.  For example, suppose at this
> > point the ehci-hcd driver just happens to bind to the EHCI controller.
> > When this happens, the EHCI controller hardware takes over all the USB
> > connections that were routed to the UHCI controller.  How will vfio-pci
> > deal with that?  Pretty ungracefully, I imagine.

The issue I believe we're seeing here is not with vfio-pci trying to do
anything with the device, the IOMMU grouping would prevent a user from
opening any device within the group while other devices within the
group are bound to host drivers.  So it should be fine if the EHCI
device takes over the other ports, but it's not ok that ehci-hcd
assumes the driver private data for any other UHCI/OHCI/EHCI device in
the same slot is something that it's free to modify.  It really seems
there should be some sort of companion device registration/opt-in
rather than modifying unvalidated data.

> > The only way to make this work at all is to unbind both uhci-hcd and
> > ehci-hcd first.  Then after both are finished you can safely bind
> > vfio-pci to the EHCI controller and the UHCI controllers (in that
> > order).
> >  
> I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to
> vfio-pci is a more reasonable sequence. Our experiments prove that this
> sequence is indeed good as expected.
> However, I did not find a formal document to prescribe this order.
> Unfortunately, some application software such as virt-manager/qemu assign
> UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”.
> Do we need to consider compatibility with this application scenario?

Unbinding all functions first, before binding any to vfio-pci should
indeed work, thanks to the for_each_companion() function at least
testing for null private data before going further.  I'd still argue
though that these hcd drivers are overstepping their authority by
walking the PCI bus and assuming any device in the same slot, matching
a set of class codes, is making use of a driver with a known data
structure that they're allowed to modify.  Even if we claim that the
user needs to know what they're doing when they change driver binding,
that's a pretty subtle interaction with no validation.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23  2:59     ` 答复: " Weitao Wang(BJ-RD)
  2020-07-23  4:18       ` Alex Williamson
@ 2020-07-23  5:25       ` Greg KH
  1 sibling, 0 replies; 17+ messages in thread
From: Greg KH @ 2020-07-23  5:25 UTC (permalink / raw)
  To: Weitao Wang(BJ-RD)
  Cc: Alan Stern, WeitaoWang-oc, mathias.nyman, ulf.hansson, vkoul,
	hslester96, linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723

On Thu, Jul 23, 2020 at 02:59:55AM +0000, Weitao Wang(BJ-RD) wrote:
> CONFIDENTIAL NOTE:
> This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.


This footer is not compatible with Linux mailing lists, sorry, I am not
allowed to respond to it.

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* 答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23  3:53 ` Alex Williamson
@ 2020-07-23  8:36   ` WeitaoWang-oc
  2020-07-23  8:41     ` gregkh
  0 siblings, 1 reply; 17+ messages in thread
From: WeitaoWang-oc @ 2020-07-23  8:36 UTC (permalink / raw)
  To: Alex Williamson, WeitaoWang-oc
  Cc: gregkh, mathias.nyman, ulf.hansson, vkoul, hslester96, linux-usb,
	linux-kernel, Carsten_Schmid, efremov, Tony W. Wang(XA-RD),
	Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723


On Thu,23 July 2020 04:18:00 +0000 Alex wrote:
> On Wed, 22 Jul 2020 19:57:48 +0800
> WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> wrote:
> 
> >  drivers/usb/core/hcd-pci.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> > index 1547aa6..484f2a0 100644
> > --- a/drivers/usb/core/hcd-pci.c
> > +++ b/drivers/usb/core/hcd-pci.c
> > @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem);
> >  #define CL_OHCI                PCI_CLASS_SERIAL_USB_OHCI
> >  #define CL_EHCI                PCI_CLASS_SERIAL_USB_EHCI
> >
> > +#define PCI_DEV_DRV_FLAG       2
> >  static inline int is_ohci_or_uhci(struct pci_dev *pdev)  {
> >         return pdev->class == CL_OHCI || pdev->class == CL_UHCI; @@
> > -68,6 +69,8 @@ static void for_each_companion(struct pci_dev *pdev, struct
> usb_hcd *hcd,
> >                 if (companion->class != CL_UHCI && companion->class !=
> CL_OHCI &&
> >                                 companion->class != CL_EHCI)
> >                         continue;
> > +               if (!(companion->priv_flags & PCI_DEV_DRV_FLAG))
> 
> But pci_dev.priv_flags is private data for the driver that currently
> owns the device, which could be vfio-pci.  This is really no different
> than assuming the structure at device.driver_data.  If vfio-pci were to
> make legitimate use of pci_dev.priv_flags, this could simply blow up
> again.  Should there instead be some sort of registration interface
> where hcd complaint drivers register their devices and only those
> registered devices can have their driver private data arbitrarily poked
> by another driver?  Thanks,

Thanks for your explanation. Set pci_dev.priv_flags is really not a 
reasonable approach. Are there any more detailed suggestions 
to patch this issue?

Thanks
Weitaowang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23  8:36   ` 答复: " WeitaoWang-oc
@ 2020-07-23  8:41     ` gregkh
  0 siblings, 0 replies; 17+ messages in thread
From: gregkh @ 2020-07-23  8:41 UTC (permalink / raw)
  To: WeitaoWang-oc
  Cc: Alex Williamson, mathias.nyman, ulf.hansson, vkoul, hslester96,
	linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723

On Thu, Jul 23, 2020 at 08:36:25AM +0000, WeitaoWang-oc wrote:
> 
> On Thu,23 July 2020 04:18:00 +0000 Alex wrote:
> > On Wed, 22 Jul 2020 19:57:48 +0800
> > WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> wrote:
> > 
> > >  drivers/usb/core/hcd-pci.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> > > index 1547aa6..484f2a0 100644
> > > --- a/drivers/usb/core/hcd-pci.c
> > > +++ b/drivers/usb/core/hcd-pci.c
> > > @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem);
> > >  #define CL_OHCI                PCI_CLASS_SERIAL_USB_OHCI
> > >  #define CL_EHCI                PCI_CLASS_SERIAL_USB_EHCI
> > >
> > > +#define PCI_DEV_DRV_FLAG       2
> > >  static inline int is_ohci_or_uhci(struct pci_dev *pdev)  {
> > >         return pdev->class == CL_OHCI || pdev->class == CL_UHCI; @@
> > > -68,6 +69,8 @@ static void for_each_companion(struct pci_dev *pdev, struct
> > usb_hcd *hcd,
> > >                 if (companion->class != CL_UHCI && companion->class !=
> > CL_OHCI &&
> > >                                 companion->class != CL_EHCI)
> > >                         continue;
> > > +               if (!(companion->priv_flags & PCI_DEV_DRV_FLAG))
> > 
> > But pci_dev.priv_flags is private data for the driver that currently
> > owns the device, which could be vfio-pci.  This is really no different
> > than assuming the structure at device.driver_data.  If vfio-pci were to
> > make legitimate use of pci_dev.priv_flags, this could simply blow up
> > again.  Should there instead be some sort of registration interface
> > where hcd complaint drivers register their devices and only those
> > registered devices can have their driver private data arbitrarily poked
> > by another driver?  Thanks,
> 
> Thanks for your explanation. Set pci_dev.priv_flags is really not a 
> reasonable approach. Are there any more detailed suggestions 
> to patch this issue?

This is not a kernel issue, it is a "do not do this in this way from
userspace" issue :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23  4:18       ` Alex Williamson
@ 2020-07-23 15:38         ` Alan Stern
  2020-07-23 16:17           ` Alex Williamson
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2020-07-23 15:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Weitao Wang(BJ-RD),
	Greg KH, WeitaoWang-oc, mathias.nyman, ulf.hansson, vkoul,
	hslester96, linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723

On Wed, Jul 22, 2020 at 10:18:17PM -0600, Alex Williamson wrote:
> On Thu, 23 Jul 2020 02:59:55 +0000
> "Weitao Wang(BJ-RD)" <WeitaoWang@zhaoxin.com> wrote:
> 
> > On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote:
> > > On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote:  
> > > > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:  
> > > > > This bug is found in Zhaoxin platform, but it's a commom code bug.
> > > > > Fail sequence:
> > > > > step1: Unbind UHCI controller from native driver;
> > > > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one  
> > > vfio  
> > > > >        group's device list and set UHCI's dev->driver_data to struct  
> > > vfio-pci(for UHCI)  
> > > >
> > > > Hah, that works?  How do you do that properly?  What code does that?  
> > >
> > > Yeah, that can't possibly work.  The USB core expects that any host
> > > controller device (or at least, any PCI host controller device) has its
> > > driver_data set to point to a struct usb_hcd.  It doesn't expect a host
> > > controller to be bound to anything other than a host controller driver.
> > >
> > > Things could easily go very wrong here.  For example, suppose at this
> > > point the ehci-hcd driver just happens to bind to the EHCI controller.
> > > When this happens, the EHCI controller hardware takes over all the USB
> > > connections that were routed to the UHCI controller.  How will vfio-pci
> > > deal with that?  Pretty ungracefully, I imagine.
> 
> The issue I believe we're seeing here is not with vfio-pci trying to do
> anything with the device, the IOMMU grouping would prevent a user from
> opening any device within the group while other devices within the
> group are bound to host drivers.

You've lost me.  (A) What is IOMMU grouping?  (B) How does it prevent 
users from opening devices?  (C) What do users have to do with the 
problem anyway (USB host controllers and drivers have to do things on 
their own even without user intervention)?

>  So it should be fine if the EHCI
> device takes over the other ports, but it's not ok that ehci-hcd
> assumes the driver private data for any other UHCI/OHCI/EHCI device in
> the same slot is something that it's free to modify.  It really seems
> there should be some sort of companion device registration/opt-in
> rather than modifying unvalidated data.

Until now that hasn't been necessary, since nobody wanted to bind a 
different driver to these devices.

> > > The only way to make this work at all is to unbind both uhci-hcd and
> > > ehci-hcd first.  Then after both are finished you can safely bind
> > > vfio-pci to the EHCI controller and the UHCI controllers (in that
> > > order).
> > >  
> > I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to
> > vfio-pci is a more reasonable sequence. Our experiments prove that this
> > sequence is indeed good as expected.
> > However, I did not find a formal document to prescribe this order.
> > Unfortunately, some application software such as virt-manager/qemu assign
> > UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”.
> > Do we need to consider compatibility with this application scenario?
> 
> Unbinding all functions first, before binding any to vfio-pci should
> indeed work, thanks to the for_each_companion() function at least
> testing for null private data before going further.  I'd still argue
> though that these hcd drivers are overstepping their authority by
> walking the PCI bus and assuming any device in the same slot, matching
> a set of class codes, is making use of a driver with a known data
> structure that they're allowed to modify.

Until recently that has been a valid assumption.

>  Even if we claim that the
> user needs to know what they're doing when they change driver binding,
> that's a pretty subtle interaction with no validation.  Thanks,

It's worse than that.  We're not just dealing with a software 
interaction issue -- the _hardware_ for these devices also interacts.  
That's the real reason why the driver for the device on one slot has to 
be aware of the driver for the device on a different slot.

Adding a mechanism for software registration or validation won't fix the 
hardware issues.  Relying on a protocol that requires all the devices to 
be unbound before any of them are bound to a new class of drivers, on 
the other hand, will fix the problem.

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23 15:38         ` Alan Stern
@ 2020-07-23 16:17           ` Alex Williamson
  2020-07-23 16:38             ` Alan Stern
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2020-07-23 16:17 UTC (permalink / raw)
  To: Alan Stern
  Cc: Weitao Wang(BJ-RD),
	Greg KH, WeitaoWang-oc, mathias.nyman, ulf.hansson, vkoul,
	hslester96, linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723

On Thu, 23 Jul 2020 11:38:21 -0400
Alan Stern <stern@rowland.harvard.edu> wrote:

> On Wed, Jul 22, 2020 at 10:18:17PM -0600, Alex Williamson wrote:
> > On Thu, 23 Jul 2020 02:59:55 +0000
> > "Weitao Wang(BJ-RD)" <WeitaoWang@zhaoxin.com> wrote:
> >   
> > > On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote:  
> > > > On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote:    
> > > > > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote:    
> > > > > > This bug is found in Zhaoxin platform, but it's a commom code bug.
> > > > > > Fail sequence:
> > > > > > step1: Unbind UHCI controller from native driver;
> > > > > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one    
> > > > vfio    
> > > > > >        group's device list and set UHCI's dev->driver_data to struct    
> > > > vfio-pci(for UHCI)    
> > > > >
> > > > > Hah, that works?  How do you do that properly?  What code does that?    
> > > >
> > > > Yeah, that can't possibly work.  The USB core expects that any host
> > > > controller device (or at least, any PCI host controller device) has its
> > > > driver_data set to point to a struct usb_hcd.  It doesn't expect a host
> > > > controller to be bound to anything other than a host controller driver.
> > > >
> > > > Things could easily go very wrong here.  For example, suppose at this
> > > > point the ehci-hcd driver just happens to bind to the EHCI controller.
> > > > When this happens, the EHCI controller hardware takes over all the USB
> > > > connections that were routed to the UHCI controller.  How will vfio-pci
> > > > deal with that?  Pretty ungracefully, I imagine.  
> > 
> > The issue I believe we're seeing here is not with vfio-pci trying to do
> > anything with the device, the IOMMU grouping would prevent a user from
> > opening any device within the group while other devices within the
> > group are bound to host drivers.  
> 
> You've lost me.  (A) What is IOMMU grouping?  (B) How does it prevent 
> users from opening devices?  (C) What do users have to do with the 
> problem anyway (USB host controllers and drivers have to do things on 
> their own even without user intervention)?

The alternate driver in question here is vfio-pci, which allows IOMMU
protected userspace driver access to a device.  A primary use case of
vfio is to assign PCI devices to a VM, where QEMU is the userspace
driver.  An IOMMU group is a set of one or more devices that are
considered to be DMA isolated from other groups of devices.  DMA
isolation includes, for instance, the potential for peer-to-peer
between devices which cannot be managed by the IOMMU.  DMA between PCIe
functions within a multifunction slot are generally considered to be
non-isolated from one another unless they implement PCIe Access Control
Services (ACS) to indicate isolation.  I've never seen a USB controller
implement ACS, nor should they due to the interactions between
functions, therefore all the USB functions within a slot will be
grouped together.

The vfio framework will not allow users to access groups for which some
of the devices within the group are bound to active host drivers,
therefore in this scenario where we have one USB function bound to a
host driver and the other bound to vfio-pci, the user would not be able
to access the device and the vfio-pci usage of the device is
essentially nothing more than a stub driver until driver binding of the
other devices within the group changes.  IOW, vfio-pci is not trying to
make use of the device with this split binding, the report here is only
a fault seen in the process of moving all of the functions to vfio-pci,
which would then make the devices accessible to the user.

> >  So it should be fine if the EHCI
> > device takes over the other ports, but it's not ok that ehci-hcd
> > assumes the driver private data for any other UHCI/OHCI/EHCI device in
> > the same slot is something that it's free to modify.  It really seems
> > there should be some sort of companion device registration/opt-in
> > rather than modifying unvalidated data.  
> 
> Until now that hasn't been necessary, since nobody wanted to bind a 
> different driver to these devices.
> 
> > > > The only way to make this work at all is to unbind both uhci-hcd and
> > > > ehci-hcd first.  Then after both are finished you can safely bind
> > > > vfio-pci to the EHCI controller and the UHCI controllers (in that
> > > > order).
> > > >    
> > > I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to
> > > vfio-pci is a more reasonable sequence. Our experiments prove that this
> > > sequence is indeed good as expected.
> > > However, I did not find a formal document to prescribe this order.
> > > Unfortunately, some application software such as virt-manager/qemu assign
> > > UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”.
> > > Do we need to consider compatibility with this application scenario?  
> > 
> > Unbinding all functions first, before binding any to vfio-pci should
> > indeed work, thanks to the for_each_companion() function at least
> > testing for null private data before going further.  I'd still argue
> > though that these hcd drivers are overstepping their authority by
> > walking the PCI bus and assuming any device in the same slot, matching
> > a set of class codes, is making use of a driver with a known data
> > structure that they're allowed to modify.  
> 
> Until recently that has been a valid assumption.
> 
> >  Even if we claim that the
> > user needs to know what they're doing when they change driver binding,
> > that's a pretty subtle interaction with no validation.  Thanks,  
> 
> It's worse than that.  We're not just dealing with a software 
> interaction issue -- the _hardware_ for these devices also interacts.  
> That's the real reason why the driver for the device on one slot has to 
> be aware of the driver for the device on a different slot.
> 
> Adding a mechanism for software registration or validation won't fix the 
> hardware issues.  Relying on a protocol that requires all the devices to 
> be unbound before any of them are bound to a new class of drivers, on 
> the other hand, will fix the problem.

The IOMMU grouping restriction does solve the hardware issue, so long
as one driver doesn't blindly assume the driver private data for
another device and modify it.  I do agree that your solution would
work, requiring all devices are unbound before any can be bound, but it
also seems difficult to manage.  The issue is largely unique to USB
AFAIK.  On the other hand, drivers coordinating with each other to
register their _private_ data as share-able within a set of drivers
seems like a much more direct and explicit interaction between the
drivers.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23 16:17           ` Alex Williamson
@ 2020-07-23 16:38             ` Alan Stern
  2020-07-24 12:57               ` 答复: " WeitaoWang-oc
  2020-08-19  3:23               ` TimGuo-oc
  0 siblings, 2 replies; 17+ messages in thread
From: Alan Stern @ 2020-07-23 16:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Weitao Wang(BJ-RD),
	Greg KH, WeitaoWang-oc, mathias.nyman, ulf.hansson, vkoul,
	hslester96, linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723

On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote:
> The IOMMU grouping restriction does solve the hardware issue, so long
> as one driver doesn't blindly assume the driver private data for
> another device and modify it.

Correction: The IOMMU grouping restriction solves the hardware issue for 
vfio-pci.  It won't necessarily help if some other driver comes along 
and wants to bind to this hardware.

>   I do agree that your solution would
> work, requiring all devices are unbound before any can be bound, but it
> also seems difficult to manage.  The issue is largely unique to USB
> AFAIK.  On the other hand, drivers coordinating with each other to
> register their _private_ data as share-able within a set of drivers
> seems like a much more direct and explicit interaction between the
> drivers.  Thanks,

Yes, that makes sense.  But it would have to be implemented in the 
driver core, not in particular subsystems like USB or PCI.  And it might 
be seen as overkill, given that only UHCI/OHCI/EHCI devices require this 
sort of sharing AFAIK.

Also, when you think about it, what form would such coordination among 
drivers take?  From your description, it sounds like the drivers would 
agree to avoid accessing each other's private data if the proper 
registration wasn't in place.

On the other hand, a stronger and perhaps more robust approach would be 
to enforce the condition that non-cooperating drivers are never bound to 
devices in the same group at the same time.  That's basically what I'm 
proposing here -- the question is whether the enforcement should be 
instituted in the kernel or should merely be part of a standard protocol 
followed by userspace drivers.

Given that it's currently needed in only one place, it seems reasonable 
to leave this as a "gentlemen's agreement" in userspace for the time 
being instead of adding it to the kernel.

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* 答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23 16:38             ` Alan Stern
@ 2020-07-24 12:57               ` WeitaoWang-oc
  2020-07-24 19:17                 ` Alex Williamson
  2020-08-19  3:23               ` TimGuo-oc
  1 sibling, 1 reply; 17+ messages in thread
From: WeitaoWang-oc @ 2020-07-24 12:57 UTC (permalink / raw)
  To: Alan Stern, Alex Williamson
  Cc: Greg KH, WeitaoWang-oc, mathias.nyman, ulf.hansson, vkoul,
	hslester96, linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723


On Thu, 23 Jul 2020 12:38:21 -0400, Alan wrote:
> On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote:
> > The IOMMU grouping restriction does solve the hardware issue, so long
> > as one driver doesn't blindly assume the driver private data for
> > another device and modify it.
> 
> Correction: The IOMMU grouping restriction solves the hardware issue for
> vfio-pci.  It won't necessarily help if some other driver comes along
> and wants to bind to this hardware.
> 
> >   I do agree that your solution would
> > work, requiring all devices are unbound before any can be bound, but it
> > also seems difficult to manage.  The issue is largely unique to USB
> > AFAIK.  On the other hand, drivers coordinating with each other to
> > register their _private_ data as share-able within a set of drivers
> > seems like a much more direct and explicit interaction between the
> > drivers.  Thanks,
> 
> Yes, that makes sense.  But it would have to be implemented in the
> driver core, not in particular subsystems like USB or PCI.  And it might
> be seen as overkill, given that only UHCI/OHCI/EHCI devices require this
> sort of sharing AFAIK.
> 
> Also, when you think about it, what form would such coordination among
> drivers take?  From your description, it sounds like the drivers would
> agree to avoid accessing each other's private data if the proper
> registration wasn't in place.
> 
> On the other hand, a stronger and perhaps more robust approach would be
> to enforce the condition that non-cooperating drivers are never bound to
> devices in the same group at the same time.  That's basically what I'm
> proposing here -- the question is whether the enforcement should be
> instituted in the kernel or should merely be part of a standard protocol
> followed by userspace drivers.
> 
> Given that it's currently needed in only one place, it seems reasonable
> to leave this as a "gentlemen's agreement" in userspace for the time
> being instead of adding it to the kernel.
> 	

Provided that EHCI and UHCI host controller declare not support P2P and
ACS. So, we can assign EHCI and UHCI host controller to different IOMMU 
group separately. We assign EHCI host controller to host and assign UHCI
host controller to VM. Then, ehci_hcd driver load/unload operation in host
will cause the same issue as discussed

Thanks
Weitao


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-24 12:57               ` 答复: " WeitaoWang-oc
@ 2020-07-24 19:17                 ` Alex Williamson
  2020-07-28  5:53                   ` 答复: " WeitaoWang-oc
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2020-07-24 19:17 UTC (permalink / raw)
  To: WeitaoWang-oc
  Cc: Alan Stern, Greg KH, mathias.nyman, ulf.hansson, vkoul,
	hslester96, linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723

On Fri, 24 Jul 2020 12:57:49 +0000
WeitaoWang-oc <WeitaoWang-oc@zhaoxin.com> wrote:

> On Thu, 23 Jul 2020 12:38:21 -0400, Alan wrote:
> > On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote:  
> > > The IOMMU grouping restriction does solve the hardware issue, so long
> > > as one driver doesn't blindly assume the driver private data for
> > > another device and modify it.  
> > 
> > Correction: The IOMMU grouping restriction solves the hardware issue for
> > vfio-pci.  It won't necessarily help if some other driver comes along
> > and wants to bind to this hardware.
> >   
> > >   I do agree that your solution would
> > > work, requiring all devices are unbound before any can be bound, but it
> > > also seems difficult to manage.  The issue is largely unique to USB
> > > AFAIK.  On the other hand, drivers coordinating with each other to
> > > register their _private_ data as share-able within a set of drivers
> > > seems like a much more direct and explicit interaction between the
> > > drivers.  Thanks,  
> > 
> > Yes, that makes sense.  But it would have to be implemented in the
> > driver core, not in particular subsystems like USB or PCI.  And it might
> > be seen as overkill, given that only UHCI/OHCI/EHCI devices require this
> > sort of sharing AFAIK.
> > 
> > Also, when you think about it, what form would such coordination among
> > drivers take?  From your description, it sounds like the drivers would
> > agree to avoid accessing each other's private data if the proper
> > registration wasn't in place.
> > 
> > On the other hand, a stronger and perhaps more robust approach would be
> > to enforce the condition that non-cooperating drivers are never bound to
> > devices in the same group at the same time.  That's basically what I'm
> > proposing here -- the question is whether the enforcement should be
> > instituted in the kernel or should merely be part of a standard protocol
> > followed by userspace drivers.
> > 
> > Given that it's currently needed in only one place, it seems reasonable
> > to leave this as a "gentlemen's agreement" in userspace for the time
> > being instead of adding it to the kernel.
> > 	  
> 
> Provided that EHCI and UHCI host controller declare not support P2P and
> ACS. So, we can assign EHCI and UHCI host controller to different IOMMU 
> group separately. We assign EHCI host controller to host and assign UHCI
> host controller to VM. Then, ehci_hcd driver load/unload operation in host
> will cause the same issue as discussed

And you have an example of such a device?  I expect these do not exist,
nor should they.  It seems like it would be an improper use of ACS.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 17+ messages in thread

* 答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-24 19:17                 ` Alex Williamson
@ 2020-07-28  5:53                   ` WeitaoWang-oc
  0 siblings, 0 replies; 17+ messages in thread
From: WeitaoWang-oc @ 2020-07-28  5:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Alan Stern, Greg KH, mathias.nyman, ulf.hansson, vkoul,
	hslester96, linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD), Tim Guo(BJ-RD),
	wwt8723


On Fri, 24 Jul 2020 19:17:49 +0000 Alex wrote:
> On Fri, 24 Jul 2020 12:57:49 +0000
> WeitaoWang-oc <WeitaoWang-oc@zhaoxin.com> wrote:
> 
> > On Thu, 23 Jul 2020 12:38:21 -0400, Alan wrote:
> > > On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote:
> > > > The IOMMU grouping restriction does solve the hardware issue, so long
> > > > as one driver doesn't blindly assume the driver private data for
> > > > another device and modify it.
> > >
> > > Correction: The IOMMU grouping restriction solves the hardware issue for
> > > vfio-pci.  It won't necessarily help if some other driver comes along
> > > and wants to bind to this hardware.
> > >
> > > >   I do agree that your solution would
> > > > work, requiring all devices are unbound before any can be bound, but it
> > > > also seems difficult to manage.  The issue is largely unique to USB
> > > > AFAIK.  On the other hand, drivers coordinating with each other to
> > > > register their _private_ data as share-able within a set of drivers
> > > > seems like a much more direct and explicit interaction between the
> > > > drivers.  Thanks,
> > >
> > > Yes, that makes sense.  But it would have to be implemented in the
> > > driver core, not in particular subsystems like USB or PCI.  And it might
> > > be seen as overkill, given that only UHCI/OHCI/EHCI devices require this
> > > sort of sharing AFAIK.
> > >
> > > Also, when you think about it, what form would such coordination among
> > > drivers take?  From your description, it sounds like the drivers would
> > > agree to avoid accessing each other's private data if the proper
> > > registration wasn't in place.
> > >
> > > On the other hand, a stronger and perhaps more robust approach would be
> > > to enforce the condition that non-cooperating drivers are never bound to
> > > devices in the same group at the same time.  That's basically what I'm
> > > proposing here -- the question is whether the enforcement should be
> > > instituted in the kernel or should merely be part of a standard protocol
> > > followed by userspace drivers.
> > >
> > > Given that it's currently needed in only one place, it seems reasonable
> > > to leave this as a "gentlemen's agreement" in userspace for the time
> > > being instead of adding it to the kernel.
> > >
> >
> > Provided that EHCI and UHCI host controller declare not support P2P and
> > ACS. So, we can assign EHCI and UHCI host controller to different IOMMU
> > group separately. We assign EHCI host controller to host and assign UHCI
> > host controller to VM. Then, ehci_hcd driver load/unload operation in host
> > will cause the same issue as discussed
> 
> And you have an example of such a device?  I expect these do not exist,
> nor should they.  It seems like it would be an improper use of ACS.
> Thanks,


In kernel source code tree drivers/pci/quirks.c,
There is a device list declared by pci_dev_acs_enabled. In which list, such as 
multi-function device without acs capability not allow peer-to-peer bewteen 
functions. Those device can be assign to to different IOMMU group separately.

Thnaks
weitao


^ permalink raw reply	[flat|nested] 17+ messages in thread

* 答复: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
  2020-07-23 16:38             ` Alan Stern
  2020-07-24 12:57               ` 答复: " WeitaoWang-oc
@ 2020-08-19  3:23               ` TimGuo-oc
  1 sibling, 0 replies; 17+ messages in thread
From: TimGuo-oc @ 2020-08-19  3:23 UTC (permalink / raw)
  To: Alan Stern, Alex Williamson
  Cc: Weitao Wang(BJ-RD),
	Greg KH, WeitaoWang-oc, mathias.nyman, ulf.hansson, vkoul,
	hslester96, linux-usb, linux-kernel, Carsten_Schmid, efremov,
	Tony W. Wang(XA-RD), Cobe Chen(BJ-RD),
	wwt8723

> -----Original Mail-----
> Sender : Alan Stern <stern@rowland.harvard.edu>
> Time : 2020/7/24  0:39
> Receiver : Alex Williamson <alex.williamson@redhat.com>
> CC : Weitao Wang(BJ-RD) <WeitaoWang@zhaoxin.com>; Greg KH
> <gregkh@linuxfoundation.org>; WeitaoWang-oc
> <WeitaoWang-oc@zhaoxin.com>; mathias.nyman@linux.intel.com;
> ulf.hansson@linaro.org; vkoul@kernel.org; hslester96@gmail.com;
> linux-usb@vger.kernel.org; linux-kernel@vger.kernel.org;
> Carsten_Schmid@mentor.com; efremov@linux.com; Tony W. Wang(XA-RD)
> <TonyWWang@zhaoxin.com>; Cobe Chen(BJ-RD) <CobeChen@zhaoxin.com>;
> Tim Guo(BJ-RD) <TimGuo@zhaoxin.com>; wwt8723@163.com
> Subject : Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci
> 
> On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote:
> > The IOMMU grouping restriction does solve the hardware issue, so long
> > as one driver doesn't blindly assume the driver private data for
> > another device and modify it.
> 
> Correction: The IOMMU grouping restriction solves the hardware issue for
> vfio-pci.  It won't necessarily help if some other driver comes along and wants
> to bind to this hardware.
> 
> >   I do agree that your solution would
> > work, requiring all devices are unbound before any can be bound, but
> > it also seems difficult to manage.  The issue is largely unique to USB
> > AFAIK.  On the other hand, drivers coordinating with each other to
> > register their _private_ data as share-able within a set of drivers
> > seems like a much more direct and explicit interaction between the
> > drivers.  Thanks,
> 
> Yes, that makes sense.  But it would have to be implemented in the driver core,
> not in particular subsystems like USB or PCI.  And it might be seen as overkill,
> given that only UHCI/OHCI/EHCI devices require this sort of sharing AFAIK.
> 
> Also, when you think about it, what form would such coordination among drivers
> take?  From your description, it sounds like the drivers would agree to avoid
> accessing each other's private data if the proper registration wasn't in place.
> 
> On the other hand, a stronger and perhaps more robust approach would be to
> enforce the condition that non-cooperating drivers are never bound to devices
> in the same group at the same time.  That's basically what I'm proposing here
> -- the question is whether the enforcement should be instituted in the kernel or
> should merely be part of a standard protocol followed by userspace drivers.
> 
> Given that it's currently needed in only one place, it seems reasonable to leave
> this as a "gentlemen's agreement" in userspace for the time being instead of
> adding it to the kernel.
But Qemu and Virt-manager don't comply with this "gentlemen's agreement" even if the EHCI and UHCI sit in the same IOMMU group. 
And the current behavior of Qemu and linux USB driver will cause a catastrophe. 
Maybe the hackers can use this BUG to attack the computer.
We shouldn't believe the VMM always comply with the "gentlemen's agreement", I think.
> 
> Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-08-19  3:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-22 11:57 [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci WeitaoWangoc
2020-07-22 12:44 ` Greg KH
2020-07-22 14:59   ` Alan Stern
2020-07-23  2:59     ` 答复: " Weitao Wang(BJ-RD)
2020-07-23  4:18       ` Alex Williamson
2020-07-23 15:38         ` Alan Stern
2020-07-23 16:17           ` Alex Williamson
2020-07-23 16:38             ` Alan Stern
2020-07-24 12:57               ` 答复: " WeitaoWang-oc
2020-07-24 19:17                 ` Alex Williamson
2020-07-28  5:53                   ` 答复: " WeitaoWang-oc
2020-08-19  3:23               ` TimGuo-oc
2020-07-23  5:25       ` Greg KH
2020-07-23  2:58   ` Weitao Wang(BJ-RD)
2020-07-23  3:53 ` Alex Williamson
2020-07-23  8:36   ` 答复: " WeitaoWang-oc
2020-07-23  8:41     ` gregkh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).