All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
@ 2021-10-11  8:40 Wang Hai
  2021-10-11  8:52 ` Barry Song
  0 siblings, 1 reply; 9+ messages in thread
From: Wang Hai @ 2021-10-11  8:40 UTC (permalink / raw)
  To: bhelgaas, maz, tglx, song.bao.hua, gregkh; +Cc: linux-pci, linux-kernel

I got a page fault report when doing fault injection test:

BUG: unable to handle page fault for address: fffffffffffffff4
...
RIP: 0010:sysfs_remove_groups+0x25/0x60
...
Call Trace:
 msi_destroy_sysfs+0x30/0xa0
 free_msi_irqs+0x11d/0x1b0
 __pci_enable_msix_range+0x67f/0x760
 pci_alloc_irq_vectors_affinity+0xe7/0x170
 vp_find_vqs_msix+0x129/0x560
 vp_find_vqs+0x52/0x230
 vp_modern_find_vqs+0x47/0xb0
 p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
 virtio_dev_probe+0x1ed/0x2e0
 really_probe+0x1c7/0x400
 __driver_probe_device+0xa4/0x120
 driver_probe_device+0x32/0xe0
 __driver_attach+0xbf/0x130
 bus_for_each_dev+0xbb/0x110
 driver_attach+0x27/0x30
 bus_add_driver+0x1d9/0x270
 driver_register+0xa9/0x180
 register_virtio_driver+0x31/0x50
 p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
 do_one_initcall+0x7b/0x380
 do_init_module+0x5f/0x21e
 load_module+0x265c/0x2c60
 __do_sys_finit_module+0xb0/0xf0
 __x64_sys_finit_module+0x1a/0x20
 do_syscall_64+0x34/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

When populating msi_irqs sysfs failed in msi_capability_init() or
msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
This will cause a page fault when destroying the wrong
dev->msi_irq_groups in free_msi_irqs().

Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
failed.

Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
---
 drivers/pci/msi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 0099a00af361..6f75db9f3be7 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -561,6 +561,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
 	dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
 	if (IS_ERR(dev->msi_irq_groups)) {
 		ret = PTR_ERR(dev->msi_irq_groups);
+		dev->msi_irq_groups = NULL;
 		goto err;
 	}
 
@@ -733,6 +734,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
 	if (IS_ERR(dev->msi_irq_groups)) {
 		ret = PTR_ERR(dev->msi_irq_groups);
+		dev->msi_irq_groups = NULL;
 		goto out_free;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
  2021-10-11  8:40 [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed Wang Hai
@ 2021-10-11  8:52 ` Barry Song
  2021-10-11  9:15   ` wanghai (M)
  0 siblings, 1 reply; 9+ messages in thread
From: Barry Song @ 2021-10-11  8:52 UTC (permalink / raw)
  To: Wang Hai
  Cc: Bjorn Helgaas, Marc Zyngier, Thomas Gleixner, Barry Song,
	Greg Kroah-Hartman, linux-pci, LKML

On Mon, Oct 11, 2021 at 9:24 PM Wang Hai <wanghai38@huawei.com> wrote:
>
> I got a page fault report when doing fault injection test:
>
> BUG: unable to handle page fault for address: fffffffffffffff4
> ...
> RIP: 0010:sysfs_remove_groups+0x25/0x60
> ...
> Call Trace:
>  msi_destroy_sysfs+0x30/0xa0
>  free_msi_irqs+0x11d/0x1b0
>  __pci_enable_msix_range+0x67f/0x760
>  pci_alloc_irq_vectors_affinity+0xe7/0x170
>  vp_find_vqs_msix+0x129/0x560
>  vp_find_vqs+0x52/0x230
>  vp_modern_find_vqs+0x47/0xb0
>  p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
>  virtio_dev_probe+0x1ed/0x2e0
>  really_probe+0x1c7/0x400
>  __driver_probe_device+0xa4/0x120
>  driver_probe_device+0x32/0xe0
>  __driver_attach+0xbf/0x130
>  bus_for_each_dev+0xbb/0x110
>  driver_attach+0x27/0x30
>  bus_add_driver+0x1d9/0x270
>  driver_register+0xa9/0x180
>  register_virtio_driver+0x31/0x50
>  p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
>  do_one_initcall+0x7b/0x380
>  do_init_module+0x5f/0x21e
>  load_module+0x265c/0x2c60
>  __do_sys_finit_module+0xb0/0xf0
>  __x64_sys_finit_module+0x1a/0x20
>  do_syscall_64+0x34/0xb0
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> When populating msi_irqs sysfs failed in msi_capability_init() or
> msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
> This will cause a page fault when destroying the wrong
> dev->msi_irq_groups in free_msi_irqs().
>
> Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
> failed.
>
> Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Wang Hai <wanghai38@huawei.com>

Acked-by: Barry Song <song.bao.hua@hisilicon.com>

> ---
>  drivers/pci/msi.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 0099a00af361..6f75db9f3be7 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -561,6 +561,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
>         dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
>         if (IS_ERR(dev->msi_irq_groups)) {
>                 ret = PTR_ERR(dev->msi_irq_groups);
> +               dev->msi_irq_groups = NULL;
>                 goto err;
>         }
>
> @@ -733,6 +734,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>         dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
>         if (IS_ERR(dev->msi_irq_groups)) {
>                 ret = PTR_ERR(dev->msi_irq_groups);
> +               dev->msi_irq_groups = NULL;

Can you define a temp variable and assign it to dev->msi_irq_groups if
the temp variable
is not PTR_ERR?

>                 goto out_free;
>         }
>
> --
> 2.17.1
>

Thanks
Barry

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
  2021-10-11  8:52 ` Barry Song
@ 2021-10-11  9:15   ` wanghai (M)
  2021-10-11 17:11     ` Bjorn Helgaas
  0 siblings, 1 reply; 9+ messages in thread
From: wanghai (M) @ 2021-10-11  9:15 UTC (permalink / raw)
  To: Barry Song
  Cc: Bjorn Helgaas, Marc Zyngier, Thomas Gleixner, Barry Song,
	Greg Kroah-Hartman, linux-pci, LKML


在 2021/10/11 16:52, Barry Song 写道:
> On Mon, Oct 11, 2021 at 9:24 PM Wang Hai <wanghai38@huawei.com> wrote:
>> I got a page fault report when doing fault injection test:
>>
>> BUG: unable to handle page fault for address: fffffffffffffff4
>> ...
>> RIP: 0010:sysfs_remove_groups+0x25/0x60
>> ...
>> Call Trace:
>>   msi_destroy_sysfs+0x30/0xa0
>>   free_msi_irqs+0x11d/0x1b0
>>   __pci_enable_msix_range+0x67f/0x760
>>   pci_alloc_irq_vectors_affinity+0xe7/0x170
>>   vp_find_vqs_msix+0x129/0x560
>>   vp_find_vqs+0x52/0x230
>>   vp_modern_find_vqs+0x47/0xb0
>>   p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
>>   virtio_dev_probe+0x1ed/0x2e0
>>   really_probe+0x1c7/0x400
>>   __driver_probe_device+0xa4/0x120
>>   driver_probe_device+0x32/0xe0
>>   __driver_attach+0xbf/0x130
>>   bus_for_each_dev+0xbb/0x110
>>   driver_attach+0x27/0x30
>>   bus_add_driver+0x1d9/0x270
>>   driver_register+0xa9/0x180
>>   register_virtio_driver+0x31/0x50
>>   p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
>>   do_one_initcall+0x7b/0x380
>>   do_init_module+0x5f/0x21e
>>   load_module+0x265c/0x2c60
>>   __do_sys_finit_module+0xb0/0xf0
>>   __x64_sys_finit_module+0x1a/0x20
>>   do_syscall_64+0x34/0xb0
>>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> When populating msi_irqs sysfs failed in msi_capability_init() or
>> msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
>> This will cause a page fault when destroying the wrong
>> dev->msi_irq_groups in free_msi_irqs().
>>
>> Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
>> failed.
>>
>> Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Signed-off-by: Wang Hai <wanghai38@huawei.com>
> Acked-by: Barry Song <song.bao.hua@hisilicon.com>
>
>> ---
>>   drivers/pci/msi.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index 0099a00af361..6f75db9f3be7 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -561,6 +561,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
>>          dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
>>          if (IS_ERR(dev->msi_irq_groups)) {
>>                  ret = PTR_ERR(dev->msi_irq_groups);
>> +               dev->msi_irq_groups = NULL;
>>                  goto err;
>>          }
>>
>> @@ -733,6 +734,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>>          dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
>>          if (IS_ERR(dev->msi_irq_groups)) {
>>                  ret = PTR_ERR(dev->msi_irq_groups);
>> +               dev->msi_irq_groups = NULL;
> Can you define a temp variable and assign it to dev->msi_irq_groups if
> the temp variable
> is not PTR_ERR?
Of course, I will send a v2 patch.
>
>>                  goto out_free;
>>          }
>>
>> --
>> 2.17.1
>>
> Thanks
> Barry
> .
>
-- 
Wang Hai


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
  2021-10-11  9:15   ` wanghai (M)
@ 2021-10-11 17:11     ` Bjorn Helgaas
  2021-10-12  1:59       ` wanghai (M)
  0 siblings, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2021-10-11 17:11 UTC (permalink / raw)
  To: wanghai (M)
  Cc: Barry Song, Bjorn Helgaas, Marc Zyngier, Thomas Gleixner,
	Barry Song, Greg Kroah-Hartman, linux-pci, LKML

For v2, please note "git log --oneline drivers/pci/msi.c" and make
your patch follow the style, including capitalization.

On Mon, Oct 11, 2021 at 05:15:28PM +0800, wanghai (M) wrote:
> 
> 在 2021/10/11 16:52, Barry Song 写道:
> > On Mon, Oct 11, 2021 at 9:24 PM Wang Hai <wanghai38@huawei.com> wrote:
> > > I got a page fault report when doing fault injection test:

When you send v2, can you include information about how you injected
the fault?  If it's easy, others can reproduce the failure that way.

> > > BUG: unable to handle page fault for address: fffffffffffffff4
> > > ...
> > > RIP: 0010:sysfs_remove_groups+0x25/0x60
> > > ...
> > > Call Trace:
> > >   msi_destroy_sysfs+0x30/0xa0
> > >   free_msi_irqs+0x11d/0x1b0
> > >   __pci_enable_msix_range+0x67f/0x760
> > >   pci_alloc_irq_vectors_affinity+0xe7/0x170
> > >   vp_find_vqs_msix+0x129/0x560
> > >   vp_find_vqs+0x52/0x230
> > >   vp_modern_find_vqs+0x47/0xb0
> > >   p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
> > >   virtio_dev_probe+0x1ed/0x2e0
> > >   really_probe+0x1c7/0x400
> > >   __driver_probe_device+0xa4/0x120
> > >   driver_probe_device+0x32/0xe0
> > >   __driver_attach+0xbf/0x130
> > >   bus_for_each_dev+0xbb/0x110
> > >   driver_attach+0x27/0x30
> > >   bus_add_driver+0x1d9/0x270
> > >   driver_register+0xa9/0x180
> > >   register_virtio_driver+0x31/0x50
> > >   p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
> > >   do_one_initcall+0x7b/0x380
> > >   do_init_module+0x5f/0x21e
> > >   load_module+0x265c/0x2c60
> > >   __do_sys_finit_module+0xb0/0xf0
> > >   __x64_sys_finit_module+0x1a/0x20
> > >   do_syscall_64+0x34/0xb0
> > >   entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > 
> > > When populating msi_irqs sysfs failed in msi_capability_init() or
> > > msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
> > > This will cause a page fault when destroying the wrong
> > > dev->msi_irq_groups in free_msi_irqs().
> > > 
> > > Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
> > > failed.
> > > 
> > > Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
> > > Reported-by: Hulk Robot <hulkci@huawei.com>

What exactly was reported by the Hulk Robot?  Did it really do the
fault injection and report the page fault?

> > > Signed-off-by: Wang Hai <wanghai38@huawei.com>
> > Acked-by: Barry Song <song.bao.hua@hisilicon.com>
> > 
> > > ---
> > >   drivers/pci/msi.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > index 0099a00af361..6f75db9f3be7 100644
> > > --- a/drivers/pci/msi.c
> > > +++ b/drivers/pci/msi.c
> > > @@ -561,6 +561,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > >          dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
> > >          if (IS_ERR(dev->msi_irq_groups)) {
> > >                  ret = PTR_ERR(dev->msi_irq_groups);
> > > +               dev->msi_irq_groups = NULL;
> > >                  goto err;
> > >          }
> > > 
> > > @@ -733,6 +734,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > >          dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
> > >          if (IS_ERR(dev->msi_irq_groups)) {
> > >                  ret = PTR_ERR(dev->msi_irq_groups);
> > > +               dev->msi_irq_groups = NULL;
> > Can you define a temp variable and assign it to dev->msi_irq_groups if
> > the temp variable
> > is not PTR_ERR?
> Of course, I will send a v2 patch.
> > 
> > >                  goto out_free;
> > >          }
> > > 
> > > --
> > > 2.17.1
> > > 
> > Thanks
> > Barry
> > .
> > 
> -- 
> Wang Hai
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
  2021-10-11 17:11     ` Bjorn Helgaas
@ 2021-10-12  1:59       ` wanghai (M)
  2021-10-12  2:09         ` Bjorn Helgaas
  0 siblings, 1 reply; 9+ messages in thread
From: wanghai (M) @ 2021-10-12  1:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Barry Song, Bjorn Helgaas, Marc Zyngier, Thomas Gleixner,
	Barry Song, Greg Kroah-Hartman, linux-pci, LKML


在 2021/10/12 1:11, Bjorn Helgaas 写道:
> For v2, please note "git log --oneline drivers/pci/msi.c" and make
> your patch follow the style, including capitalization.
>
> On Mon, Oct 11, 2021 at 05:15:28PM +0800, wanghai (M) wrote:
>> 在 2021/10/11 16:52, Barry Song 写道:
>>> On Mon, Oct 11, 2021 at 9:24 PM Wang Hai <wanghai38@huawei.com> wrote:
>>>> I got a page fault report when doing fault injection test:
> When you send v2, can you include information about how you injected
> the fault?  If it's easy, others can reproduce the failure that way.
Sorry, the reproduction needs to be based on the fault injection framework
provided by Hulk Robot. I don't know how the framework is implemented.

The way to reproduce this is to do a fault injection to make
'msi_attrs = kcalloc() in msi_populate_sysfs()' fail when insmod
9pnet_virtio.ko.

I sent v2 yesterday, can you help review it?
https://lore.kernel.org/linux-pci/20211011130837.766323-1-wanghai38@huawei.com/
>>>> BUG: unable to handle page fault for address: fffffffffffffff4
>>>> ...
>>>> RIP: 0010:sysfs_remove_groups+0x25/0x60
>>>> ...
>>>> Call Trace:
>>>>    msi_destroy_sysfs+0x30/0xa0
>>>>    free_msi_irqs+0x11d/0x1b0
>>>>    __pci_enable_msix_range+0x67f/0x760
>>>>    pci_alloc_irq_vectors_affinity+0xe7/0x170
>>>>    vp_find_vqs_msix+0x129/0x560
>>>>    vp_find_vqs+0x52/0x230
>>>>    vp_modern_find_vqs+0x47/0xb0
>>>>    p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
>>>>    virtio_dev_probe+0x1ed/0x2e0
>>>>    really_probe+0x1c7/0x400
>>>>    __driver_probe_device+0xa4/0x120
>>>>    driver_probe_device+0x32/0xe0
>>>>    __driver_attach+0xbf/0x130
>>>>    bus_for_each_dev+0xbb/0x110
>>>>    driver_attach+0x27/0x30
>>>>    bus_add_driver+0x1d9/0x270
>>>>    driver_register+0xa9/0x180
>>>>    register_virtio_driver+0x31/0x50
>>>>    p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
>>>>    do_one_initcall+0x7b/0x380
>>>>    do_init_module+0x5f/0x21e
>>>>    load_module+0x265c/0x2c60
>>>>    __do_sys_finit_module+0xb0/0xf0
>>>>    __x64_sys_finit_module+0x1a/0x20
>>>>    do_syscall_64+0x34/0xb0
>>>>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>
>>>> When populating msi_irqs sysfs failed in msi_capability_init() or
>>>> msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
>>>> This will cause a page fault when destroying the wrong
>>>> dev->msi_irq_groups in free_msi_irqs().
>>>>
>>>> Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
>>>> failed.
>>>>
>>>> Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
>>>> Reported-by: Hulk Robot <hulkci@huawei.com>
> What exactly was reported by the Hulk Robot?  Did it really do the
> fault injection and report the page fault?
Yes, it reported the error and provided a way to reproduce it
>>>> Signed-off-by: Wang Hai <wanghai38@huawei.com>
>>> Acked-by: Barry Song <song.bao.hua@hisilicon.com>
>>>
>>>> ---
>>>>    drivers/pci/msi.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>>> index 0099a00af361..6f75db9f3be7 100644
>>>> --- a/drivers/pci/msi.c
>>>> +++ b/drivers/pci/msi.c
>>>> @@ -561,6 +561,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
>>>>           dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
>>>>           if (IS_ERR(dev->msi_irq_groups)) {
>>>>                   ret = PTR_ERR(dev->msi_irq_groups);
>>>> +               dev->msi_irq_groups = NULL;
>>>>                   goto err;
>>>>           }
>>>>
>>>> @@ -733,6 +734,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>>>>           dev->msi_irq_groups = msi_populate_sysfs(&dev->dev);
>>>>           if (IS_ERR(dev->msi_irq_groups)) {
>>>>                   ret = PTR_ERR(dev->msi_irq_groups);
>>>> +               dev->msi_irq_groups = NULL;
>>> Can you define a temp variable and assign it to dev->msi_irq_groups if
>>> the temp variable
>>> is not PTR_ERR?
>> Of course, I will send a v2 patch.
>>>>                   goto out_free;
>>>>           }
>>>>
>>>> --
>>>> 2.17.1
>>>>
>>> Thanks
>>> Barry
>>> .
>>>
>> -- 
>> Wang Hai
>>
> .
>
-- 
Wang Hai


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
  2021-10-12  1:59       ` wanghai (M)
@ 2021-10-12  2:09         ` Bjorn Helgaas
  2021-10-12  2:25           ` wanghai (M)
  0 siblings, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2021-10-12  2:09 UTC (permalink / raw)
  To: wanghai (M)
  Cc: Barry Song, Bjorn Helgaas, Marc Zyngier, Thomas Gleixner,
	Barry Song, Greg Kroah-Hartman, linux-pci, LKML

On Tue, Oct 12, 2021 at 09:59:40AM +0800, wanghai (M) wrote:
> 
> 在 2021/10/12 1:11, Bjorn Helgaas 写道:
> > For v2, please note "git log --oneline drivers/pci/msi.c" and make
> > your patch follow the style, including capitalization.
> > 
> > On Mon, Oct 11, 2021 at 05:15:28PM +0800, wanghai (M) wrote:
> > > 在 2021/10/11 16:52, Barry Song 写道:
> > > > On Mon, Oct 11, 2021 at 9:24 PM Wang Hai <wanghai38@huawei.com> wrote:
> > > > > I got a page fault report when doing fault injection test:
> > When you send v2, can you include information about how you injected
> > the fault?  If it's easy, others can reproduce the failure that way.
> Sorry, the reproduction needs to be based on the fault injection framework
> provided by Hulk Robot. I don't know how the framework is implemented.
> 
> The way to reproduce this is to do a fault injection to make
> 'msi_attrs = kcalloc() in msi_populate_sysfs()' fail when insmod
> 9pnet_virtio.ko.
> 
> I sent v2 yesterday, can you help review it?
> https://lore.kernel.org/linux-pci/20211011130837.766323-1-wanghai38@huawei.com/
> > > > > BUG: unable to handle page fault for address: fffffffffffffff4
> > > > > ...
> > > > > RIP: 0010:sysfs_remove_groups+0x25/0x60
> > > > > ...
> > > > > Call Trace:
> > > > >    msi_destroy_sysfs+0x30/0xa0
> > > > >    free_msi_irqs+0x11d/0x1b0
> > > > >    __pci_enable_msix_range+0x67f/0x760
> > > > >    pci_alloc_irq_vectors_affinity+0xe7/0x170
> > > > >    vp_find_vqs_msix+0x129/0x560
> > > > >    vp_find_vqs+0x52/0x230
> > > > >    vp_modern_find_vqs+0x47/0xb0
> > > > >    p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
> > > > >    virtio_dev_probe+0x1ed/0x2e0
> > > > >    really_probe+0x1c7/0x400
> > > > >    __driver_probe_device+0xa4/0x120
> > > > >    driver_probe_device+0x32/0xe0
> > > > >    __driver_attach+0xbf/0x130
> > > > >    bus_for_each_dev+0xbb/0x110
> > > > >    driver_attach+0x27/0x30
> > > > >    bus_add_driver+0x1d9/0x270
> > > > >    driver_register+0xa9/0x180
> > > > >    register_virtio_driver+0x31/0x50
> > > > >    p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
> > > > >    do_one_initcall+0x7b/0x380
> > > > >    do_init_module+0x5f/0x21e
> > > > >    load_module+0x265c/0x2c60
> > > > >    __do_sys_finit_module+0xb0/0xf0
> > > > >    __x64_sys_finit_module+0x1a/0x20
> > > > >    do_syscall_64+0x34/0xb0
> > > > >    entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > 
> > > > > When populating msi_irqs sysfs failed in msi_capability_init() or
> > > > > msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
> > > > > This will cause a page fault when destroying the wrong
> > > > > dev->msi_irq_groups in free_msi_irqs().
> > > > > 
> > > > > Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
> > > > > failed.
> > > > > 
> > > > > Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
> > > > > Reported-by: Hulk Robot <hulkci@huawei.com>
> > What exactly was reported by the Hulk Robot?  Did it really do the
> > fault injection and report the page fault?
> Yes, it reported the error and provided a way to reproduce it

Great, can you include a link to that report then?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
  2021-10-12  2:09         ` Bjorn Helgaas
@ 2021-10-12  2:25           ` wanghai (M)
  2021-10-12  2:39             ` Barry Song
  0 siblings, 1 reply; 9+ messages in thread
From: wanghai (M) @ 2021-10-12  2:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Barry Song, Bjorn Helgaas, Marc Zyngier, Thomas Gleixner,
	Barry Song, Greg Kroah-Hartman, linux-pci, LKML


在 2021/10/12 10:09, Bjorn Helgaas 写道:
> On Tue, Oct 12, 2021 at 09:59:40AM +0800, wanghai (M) wrote:
>> 在 2021/10/12 1:11, Bjorn Helgaas 写道:
>>> For v2, please note "git log --oneline drivers/pci/msi.c" and make
>>> your patch follow the style, including capitalization.
>>>
>>> On Mon, Oct 11, 2021 at 05:15:28PM +0800, wanghai (M) wrote:
>>>> 在 2021/10/11 16:52, Barry Song 写道:
>>>>> On Mon, Oct 11, 2021 at 9:24 PM Wang Hai <wanghai38@huawei.com> wrote:
>>>>>> I got a page fault report when doing fault injection test:
>>> When you send v2, can you include information about how you injected
>>> the fault?  If it's easy, others can reproduce the failure that way.
>> Sorry, the reproduction needs to be based on the fault injection framework
>> provided by Hulk Robot. I don't know how the framework is implemented.
>>
>> The way to reproduce this is to do a fault injection to make
>> 'msi_attrs = kcalloc() in msi_populate_sysfs()' fail when insmod
>> 9pnet_virtio.ko.
>>
>> I sent v2 yesterday, can you help review it?
>> https://lore.kernel.org/linux-pci/20211011130837.766323-1-wanghai38@huawei.com/
>>>>>> BUG: unable to handle page fault for address: fffffffffffffff4
>>>>>> ...
>>>>>> RIP: 0010:sysfs_remove_groups+0x25/0x60
>>>>>> ...
>>>>>> Call Trace:
>>>>>>     msi_destroy_sysfs+0x30/0xa0
>>>>>>     free_msi_irqs+0x11d/0x1b0
>>>>>>     __pci_enable_msix_range+0x67f/0x760
>>>>>>     pci_alloc_irq_vectors_affinity+0xe7/0x170
>>>>>>     vp_find_vqs_msix+0x129/0x560
>>>>>>     vp_find_vqs+0x52/0x230
>>>>>>     vp_modern_find_vqs+0x47/0xb0
>>>>>>     p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
>>>>>>     virtio_dev_probe+0x1ed/0x2e0
>>>>>>     really_probe+0x1c7/0x400
>>>>>>     __driver_probe_device+0xa4/0x120
>>>>>>     driver_probe_device+0x32/0xe0
>>>>>>     __driver_attach+0xbf/0x130
>>>>>>     bus_for_each_dev+0xbb/0x110
>>>>>>     driver_attach+0x27/0x30
>>>>>>     bus_add_driver+0x1d9/0x270
>>>>>>     driver_register+0xa9/0x180
>>>>>>     register_virtio_driver+0x31/0x50
>>>>>>     p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
>>>>>>     do_one_initcall+0x7b/0x380
>>>>>>     do_init_module+0x5f/0x21e
>>>>>>     load_module+0x265c/0x2c60
>>>>>>     __do_sys_finit_module+0xb0/0xf0
>>>>>>     __x64_sys_finit_module+0x1a/0x20
>>>>>>     do_syscall_64+0x34/0xb0
>>>>>>     entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>>
>>>>>> When populating msi_irqs sysfs failed in msi_capability_init() or
>>>>>> msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
>>>>>> This will cause a page fault when destroying the wrong
>>>>>> dev->msi_irq_groups in free_msi_irqs().
>>>>>>
>>>>>> Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
>>>>>> failed.
>>>>>>
>>>>>> Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
>>>>>> Reported-by: Hulk Robot <hulkci@huawei.com>
>>> What exactly was reported by the Hulk Robot?  Did it really do the
>>> fault injection and report the page fault?
>> Yes, it reported the error and provided a way to reproduce it
> Great, can you include a link to that report then?
> .
Currently hulk robot is still in the process of continuous improvement and
is not open to the public for the time being, so you can not access our
links at the moment. We will open it in the future when it is perfected.

-- 
Wang Hai


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
  2021-10-12  2:25           ` wanghai (M)
@ 2021-10-12  2:39             ` Barry Song
  2021-10-12  2:47               ` wanghai (M)
  0 siblings, 1 reply; 9+ messages in thread
From: Barry Song @ 2021-10-12  2:39 UTC (permalink / raw)
  To: wanghai (M)
  Cc: Bjorn Helgaas, Bjorn Helgaas, Marc Zyngier, Thomas Gleixner,
	Barry Song, Greg Kroah-Hartman, linux-pci, LKML

On Tue, Oct 12, 2021 at 3:25 PM wanghai (M) <wanghai38@huawei.com> wrote:
>
>
> 在 2021/10/12 10:09, Bjorn Helgaas 写道:
> > On Tue, Oct 12, 2021 at 09:59:40AM +0800, wanghai (M) wrote:
> >> 在 2021/10/12 1:11, Bjorn Helgaas 写道:
> >>> For v2, please note "git log --oneline drivers/pci/msi.c" and make
> >>> your patch follow the style, including capitalization.
> >>>
> >>> On Mon, Oct 11, 2021 at 05:15:28PM +0800, wanghai (M) wrote:
> >>>> 在 2021/10/11 16:52, Barry Song 写道:
> >>>>> On Mon, Oct 11, 2021 at 9:24 PM Wang Hai <wanghai38@huawei.com> wrote:
> >>>>>> I got a page fault report when doing fault injection test:
> >>> When you send v2, can you include information about how you injected
> >>> the fault?  If it's easy, others can reproduce the failure that way.
> >> Sorry, the reproduction needs to be based on the fault injection framework
> >> provided by Hulk Robot. I don't know how the framework is implemented.
> >>
> >> The way to reproduce this is to do a fault injection to make
> >> 'msi_attrs = kcalloc() in msi_populate_sysfs()' fail when insmod
> >> 9pnet_virtio.ko.
> >>
> >> I sent v2 yesterday, can you help review it?
> >> https://lore.kernel.org/linux-pci/20211011130837.766323-1-wanghai38@huawei.com/
> >>>>>> BUG: unable to handle page fault for address: fffffffffffffff4
> >>>>>> ...
> >>>>>> RIP: 0010:sysfs_remove_groups+0x25/0x60
> >>>>>> ...
> >>>>>> Call Trace:
> >>>>>>     msi_destroy_sysfs+0x30/0xa0
> >>>>>>     free_msi_irqs+0x11d/0x1b0
> >>>>>>     __pci_enable_msix_range+0x67f/0x760
> >>>>>>     pci_alloc_irq_vectors_affinity+0xe7/0x170
> >>>>>>     vp_find_vqs_msix+0x129/0x560
> >>>>>>     vp_find_vqs+0x52/0x230
> >>>>>>     vp_modern_find_vqs+0x47/0xb0
> >>>>>>     p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
> >>>>>>     virtio_dev_probe+0x1ed/0x2e0
> >>>>>>     really_probe+0x1c7/0x400
> >>>>>>     __driver_probe_device+0xa4/0x120
> >>>>>>     driver_probe_device+0x32/0xe0
> >>>>>>     __driver_attach+0xbf/0x130
> >>>>>>     bus_for_each_dev+0xbb/0x110
> >>>>>>     driver_attach+0x27/0x30
> >>>>>>     bus_add_driver+0x1d9/0x270
> >>>>>>     driver_register+0xa9/0x180
> >>>>>>     register_virtio_driver+0x31/0x50
> >>>>>>     p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
> >>>>>>     do_one_initcall+0x7b/0x380
> >>>>>>     do_init_module+0x5f/0x21e
> >>>>>>     load_module+0x265c/0x2c60
> >>>>>>     __do_sys_finit_module+0xb0/0xf0
> >>>>>>     __x64_sys_finit_module+0x1a/0x20
> >>>>>>     do_syscall_64+0x34/0xb0
> >>>>>>     entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>>
> >>>>>> When populating msi_irqs sysfs failed in msi_capability_init() or
> >>>>>> msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
> >>>>>> This will cause a page fault when destroying the wrong
> >>>>>> dev->msi_irq_groups in free_msi_irqs().
> >>>>>>
> >>>>>> Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
> >>>>>> failed.
> >>>>>>
> >>>>>> Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
> >>>>>> Reported-by: Hulk Robot <hulkci@huawei.com>
> >>> What exactly was reported by the Hulk Robot?  Did it really do the
> >>> fault injection and report the page fault?
> >> Yes, it reported the error and provided a way to reproduce it
> > Great, can you include a link to that report then?
> > .
> Currently hulk robot is still in the process of continuous improvement and
> is not open to the public for the time being, so you can not access our
> links at the moment. We will open it in the future when it is perfected.

hi hai, would you like to put some information in the commit log like
if  'msi_attrs = kcalloc() in msi_populate_sysfs()' fails, blah, blah, blah...

It seems this can make things a bit clearer to me. Anyway, it doesn't matter
too much. The fix is correct.

>
> --
> Wang Hai
>
Thanks
barry

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed
  2021-10-12  2:39             ` Barry Song
@ 2021-10-12  2:47               ` wanghai (M)
  0 siblings, 0 replies; 9+ messages in thread
From: wanghai (M) @ 2021-10-12  2:47 UTC (permalink / raw)
  To: Barry Song
  Cc: Bjorn Helgaas, Bjorn Helgaas, Marc Zyngier, Thomas Gleixner,
	Barry Song, Greg Kroah-Hartman, linux-pci, LKML


在 2021/10/12 10:39, Barry Song 写道:
> On Tue, Oct 12, 2021 at 3:25 PM wanghai (M) <wanghai38@huawei.com> wrote:
>>
>> 在 2021/10/12 10:09, Bjorn Helgaas 写道:
>>> On Tue, Oct 12, 2021 at 09:59:40AM +0800, wanghai (M) wrote:
>>>> 在 2021/10/12 1:11, Bjorn Helgaas 写道:
>>>>> For v2, please note "git log --oneline drivers/pci/msi.c" and make
>>>>> your patch follow the style, including capitalization.
>>>>>
>>>>> On Mon, Oct 11, 2021 at 05:15:28PM +0800, wanghai (M) wrote:
>>>>>> 在 2021/10/11 16:52, Barry Song 写道:
>>>>>>> On Mon, Oct 11, 2021 at 9:24 PM Wang Hai <wanghai38@huawei.com> wrote:
>>>>>>>> I got a page fault report when doing fault injection test:
>>>>> When you send v2, can you include information about how you injected
>>>>> the fault?  If it's easy, others can reproduce the failure that way.
>>>> Sorry, the reproduction needs to be based on the fault injection framework
>>>> provided by Hulk Robot. I don't know how the framework is implemented.
>>>>
>>>> The way to reproduce this is to do a fault injection to make
>>>> 'msi_attrs = kcalloc() in msi_populate_sysfs()' fail when insmod
>>>> 9pnet_virtio.ko.
>>>>
>>>> I sent v2 yesterday, can you help review it?
>>>> https://lore.kernel.org/linux-pci/20211011130837.766323-1-wanghai38@huawei.com/
>>>>>>>> BUG: unable to handle page fault for address: fffffffffffffff4
>>>>>>>> ...
>>>>>>>> RIP: 0010:sysfs_remove_groups+0x25/0x60
>>>>>>>> ...
>>>>>>>> Call Trace:
>>>>>>>>      msi_destroy_sysfs+0x30/0xa0
>>>>>>>>      free_msi_irqs+0x11d/0x1b0
>>>>>>>>      __pci_enable_msix_range+0x67f/0x760
>>>>>>>>      pci_alloc_irq_vectors_affinity+0xe7/0x170
>>>>>>>>      vp_find_vqs_msix+0x129/0x560
>>>>>>>>      vp_find_vqs+0x52/0x230
>>>>>>>>      vp_modern_find_vqs+0x47/0xb0
>>>>>>>>      p9_virtio_probe+0xa1/0x460 [9pnet_virtio]
>>>>>>>>      virtio_dev_probe+0x1ed/0x2e0
>>>>>>>>      really_probe+0x1c7/0x400
>>>>>>>>      __driver_probe_device+0xa4/0x120
>>>>>>>>      driver_probe_device+0x32/0xe0
>>>>>>>>      __driver_attach+0xbf/0x130
>>>>>>>>      bus_for_each_dev+0xbb/0x110
>>>>>>>>      driver_attach+0x27/0x30
>>>>>>>>      bus_add_driver+0x1d9/0x270
>>>>>>>>      driver_register+0xa9/0x180
>>>>>>>>      register_virtio_driver+0x31/0x50
>>>>>>>>      p9_virtio_init+0x3c/0x1000 [9pnet_virtio]
>>>>>>>>      do_one_initcall+0x7b/0x380
>>>>>>>>      do_init_module+0x5f/0x21e
>>>>>>>>      load_module+0x265c/0x2c60
>>>>>>>>      __do_sys_finit_module+0xb0/0xf0
>>>>>>>>      __x64_sys_finit_module+0x1a/0x20
>>>>>>>>      do_syscall_64+0x34/0xb0
>>>>>>>>      entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>>>>
>>>>>>>> When populating msi_irqs sysfs failed in msi_capability_init() or
>>>>>>>> msix_capability_init(), dev->msi_irq_groups will point to ERR_PTR(...).
>>>>>>>> This will cause a page fault when destroying the wrong
>>>>>>>> dev->msi_irq_groups in free_msi_irqs().
>>>>>>>>
>>>>>>>> Fix this by setting dev->msi_irq_groups to NULL when msi_populate_sysfs()
>>>>>>>> failed.
>>>>>>>>
>>>>>>>> Fixes: 2f170814bdd2 ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
>>>>>>>> Reported-by: Hulk Robot <hulkci@huawei.com>
>>>>> What exactly was reported by the Hulk Robot?  Did it really do the
>>>>> fault injection and report the page fault?
>>>> Yes, it reported the error and provided a way to reproduce it
>>> Great, can you include a link to that report then?
>>> .
>> Currently hulk robot is still in the process of continuous improvement and
>> is not open to the public for the time being, so you can not access our
>> links at the moment. We will open it in the future when it is perfected.
> hi hai, would you like to put some information in the commit log like
> if  'msi_attrs = kcalloc() in msi_populate_sysfs()' fails, blah, blah, blah...
>
> It seems this can make things a bit clearer to me. Anyway, it doesn't matter
> too much. The fix is correct.
Okay, I'll refine the commit log and resend a patch
>> --
>> Wang Hai
>>
> Thanks
> barry
> .
>
-- 
Wang Hai


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-10-12  2:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-11  8:40 [PATCH] PCI/MSI: fix page fault when msi_populate_sysfs() failed Wang Hai
2021-10-11  8:52 ` Barry Song
2021-10-11  9:15   ` wanghai (M)
2021-10-11 17:11     ` Bjorn Helgaas
2021-10-12  1:59       ` wanghai (M)
2021-10-12  2:09         ` Bjorn Helgaas
2021-10-12  2:25           ` wanghai (M)
2021-10-12  2:39             ` Barry Song
2021-10-12  2:47               ` wanghai (M)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.