linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] msi: fix kobject/sysfs removal from msi_list
@ 2013-09-17  1:47 Veaceslav Falico
  2013-09-17  1:47 ` [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs() Veaceslav Falico
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-17  1:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Bjorn Helgaas, linux-pci, Veaceslav Falico

Currently, while removing msi_list's ->kobj, we just do kobject_put() on it
and after that free the entry itself. However, kobject_put() doesn't
guarantee that the kobject itself is freed - it can be used by someone else
and thus, when we'll free the entry, we'll free the kobject too - leading
to bugs in the other users (or when we'll finally release it).

Also, in some cases we might fail to register the kobjects, but we forget
to remove pdev->msi_kset, and this can lead to errors if we try to
re-register it - cause we already have that kset initialized.

Fix both issues by moving msi_kset/kobject deinitialization code completely
to free_msi_irqs(), which is called every time we fail and need to roll
back (and on the proper device irqs deinit). Also, move kfree-ing of the
msi_list entry to kobject->release (msi_kobj_release()), so that the entry
containing kobject will only be delisted in free_msi_irqs(), and free only
when there are no other users of its kobject.

CC: Bjorn Helgaas <bhelgaas@google.com>
CC: linux-pci@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>

---
 drivers/pci/msi.c | 38 +++++++++++++++-----------------------
 1 file changed, 15 insertions(+), 23 deletions(-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-17  1:47 [PATCH 0/3] msi: fix kobject/sysfs removal from msi_list Veaceslav Falico
@ 2013-09-17  1:47 ` Veaceslav Falico
  2013-09-25 21:08   ` Bjorn Helgaas
  2013-09-17  1:47 ` [PATCH 2/3] msi: always unregister ->msi_kset within free_msi_irqs() Veaceslav Falico
  2013-09-17  1:47 ` [PATCH 3/3] msi: free msi_desc entry only after we've released the kobject Veaceslav Falico
  2 siblings, 1 reply; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-17  1:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Veaceslav Falico, Bjorn Helgaas, linux-pci

Before trying to kobject_init_and_add(), we add a reference to pdev via
pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
don't return it back - even on out_unroll.

Fix this by adding pci_dev_put(pdev) before going to unrolling section.

CC: Bjorn Helgaas <bhelgaas@google.com>
CC: linux-pci@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/pci/msi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index d5f90d6..14bf578 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
 		pci_dev_get(pdev);
 		ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
 				     "%u", entry->irq);
-		if (ret)
+		if (ret) {
+			pci_dev_put(pdev);
 			goto out_unroll;
+		}
 
 		count++;
 	}
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] msi: always unregister ->msi_kset within free_msi_irqs()
  2013-09-17  1:47 [PATCH 0/3] msi: fix kobject/sysfs removal from msi_list Veaceslav Falico
  2013-09-17  1:47 ` [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs() Veaceslav Falico
@ 2013-09-17  1:47 ` Veaceslav Falico
  2013-09-17  1:47 ` [PATCH 3/3] msi: free msi_desc entry only after we've released the kobject Veaceslav Falico
  2 siblings, 0 replies; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-17  1:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Veaceslav Falico, Bjorn Helgaas, linux-pci

Currently we create and populate ->msi_kset while allocating irqs in
populate_msi_sysfs(), however if it fails and/or we want to free the
entries - we don't always remove it, and we might have problems if we've
failed to allocate irqs and try it again.

To fix that, move the unregister part to free_msi_irqs() and remove already
existing ones.

CC: Bjorn Helgaas <bhelgaas@google.com>
CC: linux-pci@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/pci/msi.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 14bf578..68da921 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -388,6 +388,9 @@ static void free_msi_irqs(struct pci_dev *dev)
 		list_del(&entry->list);
 		kfree(entry);
 	}
+
+	kset_unregister(dev->msi_kset);
+	dev->msi_kset = NULL;
 }
 
 static struct msi_desc *alloc_msi_entry(struct pci_dev *dev)
@@ -917,8 +920,6 @@ void pci_disable_msi(struct pci_dev *dev)
 
 	pci_msi_shutdown(dev);
 	free_msi_irqs(dev);
-	kset_unregister(dev->msi_kset);
-	dev->msi_kset = NULL;
 }
 EXPORT_SYMBOL(pci_disable_msi);
 
@@ -1015,8 +1016,6 @@ void pci_disable_msix(struct pci_dev *dev)
 
 	pci_msix_shutdown(dev);
 	free_msi_irqs(dev);
-	kset_unregister(dev->msi_kset);
-	dev->msi_kset = NULL;
 }
 EXPORT_SYMBOL(pci_disable_msix);
 
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] msi: free msi_desc entry only after we've released the kobject
  2013-09-17  1:47 [PATCH 0/3] msi: fix kobject/sysfs removal from msi_list Veaceslav Falico
  2013-09-17  1:47 ` [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs() Veaceslav Falico
  2013-09-17  1:47 ` [PATCH 2/3] msi: always unregister ->msi_kset within free_msi_irqs() Veaceslav Falico
@ 2013-09-17  1:47 ` Veaceslav Falico
  2013-09-25 21:34   ` Bjorn Helgaas
  2 siblings, 1 reply; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-17  1:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Veaceslav Falico, Bjorn Helgaas, linux-pci

Currently, we first do kobject_put(&entry->kobj) and the kfree(entry),
however kobject_put() doesn't guarantee us that it was the last reference
and that the kobj isn't used currently by someone else, so after we
kfree(entry) with the struct kobject - other users will begin using the
freed memory, instead of the actual kobject.

Fix this by using the kobject->release callback, which is called last when
the kobject is indeed not used and is cleaned up - it's msi_kobj_release(),
which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the
kobj itself after ->release() was called, so we're safe).

In case we've failed to create the sysfs directories - just kfree()
it - cause we don't have the kobjects attached.

Also, remove the same functionality from populate_msi_sysfs(), cause on
failure we anyway call free_msi_irqs(), which will take care of all the
kobjects properly.

CC: Bjorn Helgaas <bhelgaas@google.com>
CC: linux-pci@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/pci/msi.c | 27 +++++++++------------------
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 68da921..c9236e4 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -374,19 +374,22 @@ static void free_msi_irqs(struct pci_dev *dev)
 				iounmap(entry->mask_base);
 		}
 
+		list_del(&entry->list);
+
 		/*
 		 * Its possible that we get into this path
 		 * When populate_msi_sysfs fails, which means the entries
 		 * were not registered with sysfs.  In that case don't
-		 * unregister them.
+		 * unregister them, and just free. Otherwise the
+		 * kobject->release will take care of freeing the entry via
+		 * msi_kobj_release().
 		 */
 		if (entry->kobj.parent) {
 			kobject_del(&entry->kobj);
 			kobject_put(&entry->kobj);
+		} else {
+			kfree(entry);
 		}
-
-		list_del(&entry->list);
-		kfree(entry);
 	}
 
 	kset_unregister(dev->msi_kset);
@@ -512,6 +515,7 @@ static void msi_kobj_release(struct kobject *kobj)
 	struct msi_desc *entry = to_msi_desc(kobj);
 
 	pci_dev_put(entry->dev);
+	kfree(entry);
 }
 
 static struct kobj_type msi_irq_ktype = {
@@ -525,7 +529,6 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
 	struct msi_desc *entry;
 	struct kobject *kobj;
 	int ret;
-	int count = 0;
 
 	pdev->msi_kset = kset_create_and_add("msi_irqs", NULL, &pdev->dev.kobj);
 	if (!pdev->msi_kset)
@@ -539,23 +542,11 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
 				     "%u", entry->irq);
 		if (ret) {
 			pci_dev_put(pdev);
-			goto out_unroll;
+			return ret;
 		}
-
-		count++;
 	}
 
 	return 0;
-
-out_unroll:
-	list_for_each_entry(entry, &pdev->msi_list, list) {
-		if (!count)
-			break;
-		kobject_del(&entry->kobj);
-		kobject_put(&entry->kobj);
-		count--;
-	}
-	return ret;
 }
 
 /**
-- 
1.8.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-17  1:47 ` [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs() Veaceslav Falico
@ 2013-09-25 21:08   ` Bjorn Helgaas
  2013-09-25 21:30     ` Bjorn Helgaas
  2013-09-25 23:23     ` Neil Horman
  0 siblings, 2 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2013-09-25 21:08 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: linux-kernel, linux-pci, Neil Horman, Greg Kroah-Hartman

[+cc Neil (he added this code in da8d1c8ba4), Greg]

On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
> Before trying to kobject_init_and_add(), we add a reference to pdev via
> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
> don't return it back - even on out_unroll.
>
> Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>
> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: linux-pci@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
>  drivers/pci/msi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index d5f90d6..14bf578 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>                 pci_dev_get(pdev);
>                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>                                      "%u", entry->irq);
> -               if (ret)
> +               if (ret) {
> +                       pci_dev_put(pdev);
>                         goto out_unroll;
> +               }
>
>                 count++;
>         }

I don't understand why this code does the pci_dev_get() in the first
place.  The pdev->msi_list of msi_desc structs is private to the
pci_dev, and even without bumping the refcount, there should be no way
for the pci_dev to be freed before the msi_desc.

I also don't understand this nearby code (the same pattern appears in
free_msi_irqs()):

    out_unroll:
        list_for_each_entry(entry, &pdev->msi_list, list) {
                if (!count)
                        break;
                kobject_del(&entry->kobj);
                kobject_put(&entry->kobj);
                count--;
        }

Why do we call kobject_del() here?  The kobject_put() will call
kobject_del() anyway, so it looks redundant.
Documentation/kobject.txt says kobject_del() must be called explicitly
to break a circular reference, but I don't think we have that here.

Also, I think it is incorrect that free_msi_irqs() does this:

                if (entry->kobj.parent) {
                        kobject_del(&entry->kobj);
                        kobject_put(&entry->kobj);
                }

                list_del(&entry->list);
                kfree(entry);

I think the "kfree(entry)" should be in msi_kobj_release() instead.

Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-25 21:08   ` Bjorn Helgaas
@ 2013-09-25 21:30     ` Bjorn Helgaas
  2013-09-25 22:09       ` Veaceslav Falico
  2013-09-25 23:23     ` Neil Horman
  1 sibling, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2013-09-25 21:30 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: linux-kernel, linux-pci, Neil Horman, Greg Kroah-Hartman

On Wed, Sep 25, 2013 at 3:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Also, I think it is incorrect that free_msi_irqs() does this:
>
>                 if (entry->kobj.parent) {
>                         kobject_del(&entry->kobj);
>                         kobject_put(&entry->kobj);
>                 }
>
>                 list_del(&entry->list);
>                 kfree(entry);
>
> I think the "kfree(entry)" should be in msi_kobj_release() instead.

Oh, I see you fixed this in patch 3/3.  I hadn't read that far yet :)

Did you find these problems by inspection, or is there some easy way
to trigger bad behavior?  Just wondering if this is some way I can
reproduce the problem.

Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] msi: free msi_desc entry only after we've released the kobject
  2013-09-17  1:47 ` [PATCH 3/3] msi: free msi_desc entry only after we've released the kobject Veaceslav Falico
@ 2013-09-25 21:34   ` Bjorn Helgaas
  2013-09-25 22:12     ` Veaceslav Falico
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2013-09-25 21:34 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: linux-kernel, linux-pci

On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
> Currently, we first do kobject_put(&entry->kobj) and the kfree(entry),
> however kobject_put() doesn't guarantee us that it was the last reference
> and that the kobj isn't used currently by someone else, so after we
> kfree(entry) with the struct kobject - other users will begin using the
> freed memory, instead of the actual kobject.
>
> Fix this by using the kobject->release callback, which is called last when
> the kobject is indeed not used and is cleaned up - it's msi_kobj_release(),
> which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the
> kobj itself after ->release() was called, so we're safe).
>
> In case we've failed to create the sysfs directories - just kfree()
> it - cause we don't have the kobjects attached.
>
> Also, remove the same functionality from populate_msi_sysfs(), cause on
> failure we anyway call free_msi_irqs(), which will take care of all the
> kobjects properly.

I agree; it looks like populate_msi_sysfs() doesn't need to have the
cleanup in it.  Can you split that into a separate patch, since I
don't think it depends on the kfree() fix?

Bjorn

> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: linux-pci@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
>  drivers/pci/msi.c | 27 +++++++++------------------
>  1 file changed, 9 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 68da921..c9236e4 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -374,19 +374,22 @@ static void free_msi_irqs(struct pci_dev *dev)
>                                 iounmap(entry->mask_base);
>                 }
>
> +               list_del(&entry->list);
> +
>                 /*
>                  * Its possible that we get into this path
>                  * When populate_msi_sysfs fails, which means the entries
>                  * were not registered with sysfs.  In that case don't
> -                * unregister them.
> +                * unregister them, and just free. Otherwise the
> +                * kobject->release will take care of freeing the entry via
> +                * msi_kobj_release().
>                  */
>                 if (entry->kobj.parent) {
>                         kobject_del(&entry->kobj);
>                         kobject_put(&entry->kobj);
> +               } else {
> +                       kfree(entry);
>                 }
> -
> -               list_del(&entry->list);
> -               kfree(entry);
>         }
>
>         kset_unregister(dev->msi_kset);
> @@ -512,6 +515,7 @@ static void msi_kobj_release(struct kobject *kobj)
>         struct msi_desc *entry = to_msi_desc(kobj);
>
>         pci_dev_put(entry->dev);
> +       kfree(entry);
>  }
>
>  static struct kobj_type msi_irq_ktype = {
> @@ -525,7 +529,6 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>         struct msi_desc *entry;
>         struct kobject *kobj;
>         int ret;
> -       int count = 0;
>
>         pdev->msi_kset = kset_create_and_add("msi_irqs", NULL, &pdev->dev.kobj);
>         if (!pdev->msi_kset)
> @@ -539,23 +542,11 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>                                      "%u", entry->irq);
>                 if (ret) {
>                         pci_dev_put(pdev);
> -                       goto out_unroll;
> +                       return ret;
>                 }
> -
> -               count++;
>         }
>
>         return 0;
> -
> -out_unroll:
> -       list_for_each_entry(entry, &pdev->msi_list, list) {
> -               if (!count)
> -                       break;
> -               kobject_del(&entry->kobj);
> -               kobject_put(&entry->kobj);
> -               count--;
> -       }
> -       return ret;
>  }
>
>  /**
> --
> 1.8.4
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-25 21:30     ` Bjorn Helgaas
@ 2013-09-25 22:09       ` Veaceslav Falico
  0 siblings, 0 replies; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-25 22:09 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-kernel, linux-pci, Neil Horman, Greg Kroah-Hartman

On Wed, Sep 25, 2013 at 03:30:14PM -0600, Bjorn Helgaas wrote:
>On Wed, Sep 25, 2013 at 3:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Also, I think it is incorrect that free_msi_irqs() does this:
>>
>>                 if (entry->kobj.parent) {
>>                         kobject_del(&entry->kobj);
>>                         kobject_put(&entry->kobj);
>>                 }
>>
>>                 list_del(&entry->list);
>>                 kfree(entry);
>>
>> I think the "kfree(entry)" should be in msi_kobj_release() instead.
>
>Oh, I see you fixed this in patch 3/3.  I hadn't read that far yet :)
>
>Did you find these problems by inspection, or is there some easy way
>to trigger bad behavior?  Just wondering if this is some way I can
>reproduce the problem.

Hi,

I've first found it by building with CONFIG_DEBUG_KOBJECT_RELEASE and
CONFIG_DEBUG_OBJECTS - it shows that it's freeing an object in an active
state (I'm just running insmod/rmmod tg3 concurently, but I guess it's
reproducible with any driver that uses msi/x).

Without CONFIG_DEBUG_OBJECTS it's also reproducible, and without
CONFIG_DEBUG_KOBJECT_RELEASE it's really hard to reproduce, but still
reproducible (I've hit it with tg3 as a slave of bonding and concurently
running rmmod bonding/ifup bond0 - though it's really hard). It just panics
in kobject_put(), iirc.

So, with those CONFIG_DEBUG_* it's easily triggerable, without - quite
hard.

Hope that helps.

p.s. I'll adress your other comments tomorrow already, it's quite late here
and I don't want to do something stupid now :).

Thanks a lot!

>
>Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] msi: free msi_desc entry only after we've released the kobject
  2013-09-25 21:34   ` Bjorn Helgaas
@ 2013-09-25 22:12     ` Veaceslav Falico
  0 siblings, 0 replies; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-25 22:12 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-kernel, linux-pci

On Wed, Sep 25, 2013 at 03:34:58PM -0600, Bjorn Helgaas wrote:
>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>> Currently, we first do kobject_put(&entry->kobj) and the kfree(entry),
>> however kobject_put() doesn't guarantee us that it was the last reference
>> and that the kobj isn't used currently by someone else, so after we
>> kfree(entry) with the struct kobject - other users will begin using the
>> freed memory, instead of the actual kobject.
>>
>> Fix this by using the kobject->release callback, which is called last when
>> the kobject is indeed not used and is cleaned up - it's msi_kobj_release(),
>> which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the
>> kobj itself after ->release() was called, so we're safe).
>>
>> In case we've failed to create the sysfs directories - just kfree()
>> it - cause we don't have the kobjects attached.
>>
>> Also, remove the same functionality from populate_msi_sysfs(), cause on
>> failure we anyway call free_msi_irqs(), which will take care of all the
>> kobjects properly.
>
>I agree; it looks like populate_msi_sysfs() doesn't need to have the
>cleanup in it.  Can you split that into a separate patch, since I
>don't think it depends on the kfree() fix?

Yep, will do and re-send in two patchsets.

Thank you!

>
>Bjorn
>
>> CC: Bjorn Helgaas <bhelgaas@google.com>
>> CC: linux-pci@vger.kernel.org
>> CC: linux-kernel@vger.kernel.org
>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>> ---
>>  drivers/pci/msi.c | 27 +++++++++------------------
>>  1 file changed, 9 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index 68da921..c9236e4 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -374,19 +374,22 @@ static void free_msi_irqs(struct pci_dev *dev)
>>                                 iounmap(entry->mask_base);
>>                 }
>>
>> +               list_del(&entry->list);
>> +
>>                 /*
>>                  * Its possible that we get into this path
>>                  * When populate_msi_sysfs fails, which means the entries
>>                  * were not registered with sysfs.  In that case don't
>> -                * unregister them.
>> +                * unregister them, and just free. Otherwise the
>> +                * kobject->release will take care of freeing the entry via
>> +                * msi_kobj_release().
>>                  */
>>                 if (entry->kobj.parent) {
>>                         kobject_del(&entry->kobj);
>>                         kobject_put(&entry->kobj);
>> +               } else {
>> +                       kfree(entry);
>>                 }
>> -
>> -               list_del(&entry->list);
>> -               kfree(entry);
>>         }
>>
>>         kset_unregister(dev->msi_kset);
>> @@ -512,6 +515,7 @@ static void msi_kobj_release(struct kobject *kobj)
>>         struct msi_desc *entry = to_msi_desc(kobj);
>>
>>         pci_dev_put(entry->dev);
>> +       kfree(entry);
>>  }
>>
>>  static struct kobj_type msi_irq_ktype = {
>> @@ -525,7 +529,6 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>>         struct msi_desc *entry;
>>         struct kobject *kobj;
>>         int ret;
>> -       int count = 0;
>>
>>         pdev->msi_kset = kset_create_and_add("msi_irqs", NULL, &pdev->dev.kobj);
>>         if (!pdev->msi_kset)
>> @@ -539,23 +542,11 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>>                                      "%u", entry->irq);
>>                 if (ret) {
>>                         pci_dev_put(pdev);
>> -                       goto out_unroll;
>> +                       return ret;
>>                 }
>> -
>> -               count++;
>>         }
>>
>>         return 0;
>> -
>> -out_unroll:
>> -       list_for_each_entry(entry, &pdev->msi_list, list) {
>> -               if (!count)
>> -                       break;
>> -               kobject_del(&entry->kobj);
>> -               kobject_put(&entry->kobj);
>> -               count--;
>> -       }
>> -       return ret;
>>  }
>>
>>  /**
>> --
>> 1.8.4
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-25 21:08   ` Bjorn Helgaas
  2013-09-25 21:30     ` Bjorn Helgaas
@ 2013-09-25 23:23     ` Neil Horman
  2013-09-25 23:35       ` Bjorn Helgaas
  1 sibling, 1 reply; 17+ messages in thread
From: Neil Horman @ 2013-09-25 23:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Veaceslav Falico, linux-kernel, linux-pci, Greg Kroah-Hartman

On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
> [+cc Neil (he added this code in da8d1c8ba4), Greg]
> 
> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
> > Before trying to kobject_init_and_add(), we add a reference to pdev via
> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
> > don't return it back - even on out_unroll.
> >
> > Fix this by adding pci_dev_put(pdev) before going to unrolling section.
> >
> > CC: Bjorn Helgaas <bhelgaas@google.com>
> > CC: linux-pci@vger.kernel.org
> > CC: linux-kernel@vger.kernel.org
> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> > ---
> >  drivers/pci/msi.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index d5f90d6..14bf578 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
> >                 pci_dev_get(pdev);
> >                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
> >                                      "%u", entry->irq);
> > -               if (ret)
> > +               if (ret) {
> > +                       pci_dev_put(pdev);
> >                         goto out_unroll;
> > +               }
> >
> >                 count++;
> >         }
> 
> I don't understand why this code does the pci_dev_get() in the first
> place.  The pdev->msi_list of msi_desc structs is private to the
> pci_dev, and even without bumping the refcount, there should be no way
> for the pci_dev to be freed before the msi_desc.
> 
Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
people didn't try to remove the device prior to freeing all their interrupts
(i.e I didn't want a broken driver to go through its remove routine without
freeing all its irqs).  That might have been the wrong thing to do, but thats
what bubbles to the front of my head when looking at this.

> I also don't understand this nearby code (the same pattern appears in
> free_msi_irqs()):
> 
>     out_unroll:
>         list_for_each_entry(entry, &pdev->msi_list, list) {
>                 if (!count)
>                         break;
>                 kobject_del(&entry->kobj);
>                 kobject_put(&entry->kobj);
>                 count--;
>         }
> 
> Why do we call kobject_del() here?  The kobject_put() will call
> kobject_del() anyway, so it looks redundant.
> Documentation/kobject.txt says kobject_del() must be called explicitly
> to break a circular reference, but I don't think we have that here.
> 
 I think thats exactly why I did it, because of the documentation.  I agree
however, it does look redundant.  Harmless, but redundant.

> Also, I think it is incorrect that free_msi_irqs() does this:
> 
>                 if (entry->kobj.parent) {
>                         kobject_del(&entry->kobj);
>                         kobject_put(&entry->kobj);
>                 }
> 
>                 list_del(&entry->list);
>                 kfree(entry);
> 
> I think the "kfree(entry)" should be in msi_kobj_release() instead.
> 
As far as this change goes, I think it looks ok

Acked-by: Neil Horman <nhorman@tuxdriver.com>


> Bjorn
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-25 23:23     ` Neil Horman
@ 2013-09-25 23:35       ` Bjorn Helgaas
  2013-09-26  9:27         ` Veaceslav Falico
  2013-09-26 12:25         ` Veaceslav Falico
  0 siblings, 2 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2013-09-25 23:35 UTC (permalink / raw)
  To: Neil Horman; +Cc: Veaceslav Falico, linux-kernel, linux-pci, Greg Kroah-Hartman

On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>> [+cc Neil (he added this code in da8d1c8ba4), Greg]
>>
>> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>> > Before trying to kobject_init_and_add(), we add a reference to pdev via
>> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
>> > don't return it back - even on out_unroll.
>> >
>> > Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>> >
>> > CC: Bjorn Helgaas <bhelgaas@google.com>
>> > CC: linux-pci@vger.kernel.org
>> > CC: linux-kernel@vger.kernel.org
>> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>> > ---
>> >  drivers/pci/msi.c | 4 +++-
>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> > index d5f90d6..14bf578 100644
>> > --- a/drivers/pci/msi.c
>> > +++ b/drivers/pci/msi.c
>> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>> >                 pci_dev_get(pdev);
>> >                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>> >                                      "%u", entry->irq);
>> > -               if (ret)
>> > +               if (ret) {
>> > +                       pci_dev_put(pdev);
>> >                         goto out_unroll;
>> > +               }
>> >
>> >                 count++;
>> >         }
>>
>> I don't understand why this code does the pci_dev_get() in the first
>> place.  The pdev->msi_list of msi_desc structs is private to the
>> pci_dev, and even without bumping the refcount, there should be no way
>> for the pci_dev to be freed before the msi_desc.
>>
> Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
> people didn't try to remove the device prior to freeing all their interrupts
> (i.e I didn't want a broken driver to go through its remove routine without
> freeing all its irqs).  That might have been the wrong thing to do, but thats
> what bubbles to the front of my head when looking at this.

That sounds plausible, but I think I'd rather deal with that by having
the PCI core remove logic free all the interrupts.  I *think* that's
already in place, i.e., pci_free_resources() calls
msi_remove_pci_irq_vectors().  So I propose that we remove the
pci_dev_get()/put() unless we come up with a more compelling reason
for it.

>> I also don't understand this nearby code (the same pattern appears in
>> free_msi_irqs()):
>>
>>     out_unroll:
>>         list_for_each_entry(entry, &pdev->msi_list, list) {
>>                 if (!count)
>>                         break;
>>                 kobject_del(&entry->kobj);
>>                 kobject_put(&entry->kobj);
>>                 count--;
>>         }
>>
>> Why do we call kobject_del() here?  The kobject_put() will call
>> kobject_del() anyway, so it looks redundant.
>> Documentation/kobject.txt says kobject_del() must be called explicitly
>> to break a circular reference, but I don't think we have that here.
>>
>  I think thats exactly why I did it, because of the documentation.  I agree
> however, it does look redundant.  Harmless, but redundant.

OK, thanks.  I think we should remove it on the grounds that it's not
needed and removing it will make this code look more similar to other
callers of kobject_init_and_add(), which means bugs will have fewer
places to hide.

Thanks, Neil!

Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-25 23:35       ` Bjorn Helgaas
@ 2013-09-26  9:27         ` Veaceslav Falico
  2013-09-26 12:25         ` Veaceslav Falico
  1 sibling, 0 replies; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-26  9:27 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Neil Horman, linux-kernel, linux-pci, Greg Kroah-Hartman

On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>>> [+cc Neil (he added this code in da8d1c8ba4), Greg]
>>>
>>> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>>> > Before trying to kobject_init_and_add(), we add a reference to pdev via
>>> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
>>> > don't return it back - even on out_unroll.
>>> >
>>> > Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>>> >
>>> > CC: Bjorn Helgaas <bhelgaas@google.com>
>>> > CC: linux-pci@vger.kernel.org
>>> > CC: linux-kernel@vger.kernel.org
>>> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>> > ---
>>> >  drivers/pci/msi.c | 4 +++-
>>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>>> >
>>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>> > index d5f90d6..14bf578 100644
>>> > --- a/drivers/pci/msi.c
>>> > +++ b/drivers/pci/msi.c
>>> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>>> >                 pci_dev_get(pdev);
>>> >                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>>> >                                      "%u", entry->irq);
>>> > -               if (ret)
>>> > +               if (ret) {
>>> > +                       pci_dev_put(pdev);
>>> >                         goto out_unroll;
>>> > +               }
>>> >
>>> >                 count++;
>>> >         }
>>>
>>> I don't understand why this code does the pci_dev_get() in the first
>>> place.  The pdev->msi_list of msi_desc structs is private to the
>>> pci_dev, and even without bumping the refcount, there should be no way
>>> for the pci_dev to be freed before the msi_desc.
>>>
>> Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
>> people didn't try to remove the device prior to freeing all their interrupts
>> (i.e I didn't want a broken driver to go through its remove routine without
>> freeing all its irqs).  That might have been the wrong thing to do, but thats
>> what bubbles to the front of my head when looking at this.
>
>That sounds plausible, but I think I'd rather deal with that by having
>the PCI core remove logic free all the interrupts.  I *think* that's
>already in place, i.e., pci_free_resources() calls
>msi_remove_pci_irq_vectors().  So I propose that we remove the
>pci_dev_get()/put() unless we come up with a more compelling reason
>for it.
>
>>> I also don't understand this nearby code (the same pattern appears in
>>> free_msi_irqs()):
>>>
>>>     out_unroll:
>>>         list_for_each_entry(entry, &pdev->msi_list, list) {
>>>                 if (!count)
>>>                         break;
>>>                 kobject_del(&entry->kobj);
>>>                 kobject_put(&entry->kobj);
>>>                 count--;
>>>         }
>>>
>>> Why do we call kobject_del() here?  The kobject_put() will call
>>> kobject_del() anyway, so it looks redundant.
>>> Documentation/kobject.txt says kobject_del() must be called explicitly
>>> to break a circular reference, but I don't think we have that here.
>>>
>>  I think thats exactly why I did it, because of the documentation.  I agree
>> however, it does look redundant.  Harmless, but redundant.
>
>OK, thanks.  I think we should remove it on the grounds that it's not
>needed and removing it will make this code look more similar to other
>callers of kobject_init_and_add(), which means bugs will have fewer
>places to hide.
Hi,

Sounds great, I'll add this as new patches and send this in a separate
patchset with the unregister msi_kset in free_msi_irqs patch, so that it
won't depend on the fix (both removing kobject_del() and
pci_dev_get/put()).

Thank you!

>
>Thanks, Neil!
>
>Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-25 23:35       ` Bjorn Helgaas
  2013-09-26  9:27         ` Veaceslav Falico
@ 2013-09-26 12:25         ` Veaceslav Falico
  2013-09-26 14:07           ` Veaceslav Falico
  2013-09-26 14:40           ` Neil Horman
  1 sibling, 2 replies; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-26 12:25 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Neil Horman, linux-kernel, linux-pci, Greg Kroah-Hartman

On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>>> [+cc Neil (he added this code in da8d1c8ba4), Greg]
>>>
>>> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>>> > Before trying to kobject_init_and_add(), we add a reference to pdev via
>>> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
>>> > don't return it back - even on out_unroll.
>>> >
>>> > Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>>> >
>>> > CC: Bjorn Helgaas <bhelgaas@google.com>
>>> > CC: linux-pci@vger.kernel.org
>>> > CC: linux-kernel@vger.kernel.org
>>> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>> > ---
>>> >  drivers/pci/msi.c | 4 +++-
>>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>>> >
>>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>> > index d5f90d6..14bf578 100644
>>> > --- a/drivers/pci/msi.c
>>> > +++ b/drivers/pci/msi.c
>>> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>>> >                 pci_dev_get(pdev);
>>> >                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>>> >                                      "%u", entry->irq);
>>> > -               if (ret)
>>> > +               if (ret) {
>>> > +                       pci_dev_put(pdev);
>>> >                         goto out_unroll;
>>> > +               }
>>> >
>>> >                 count++;
>>> >         }
>>>
>>> I don't understand why this code does the pci_dev_get() in the first
>>> place.  The pdev->msi_list of msi_desc structs is private to the
>>> pci_dev, and even without bumping the refcount, there should be no way
>>> for the pci_dev to be freed before the msi_desc.
>>>
>> Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
>> people didn't try to remove the device prior to freeing all their interrupts
>> (i.e I didn't want a broken driver to go through its remove routine without
>> freeing all its irqs).  That might have been the wrong thing to do, but thats
>> what bubbles to the front of my head when looking at this.
>
>That sounds plausible, but I think I'd rather deal with that by having
>the PCI core remove logic free all the interrupts.  I *think* that's
>already in place, i.e., pci_free_resources() calls
>msi_remove_pci_irq_vectors().  So I propose that we remove the
>pci_dev_get()/put() unless we come up with a more compelling reason
>for it.

As an update - I've found an interesting case why exactly that
kobject_del() might be needed:

in kobject_del() it removes instantly the link to kset - via
kobj_kset_leave(), so that our kset remains without links and, thus, might
be instantly removed.

So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
without any links (i.e. other kobjects) and, when we call kset_unregister()
- it exits instantly (if it's not being hold somewhere elsewhere...).

Without it, kset_unregister() will wait till all the kobjects will be gone.

Now, the fun part starts - if we quickly call pci_disable_msi() and,
afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
still there, waiting to unregister, and the sysfs dir is still active.

It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
and are called on enslave/deslave in bonding.

What I get:
[   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
[   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'

I'll take a deeper look at the issue, though any feedback/advise is
welcome. And I'll hold on with the patchset that removes pci_dev_get/put
and kobject_del.


>
>>> I also don't understand this nearby code (the same pattern appears in
>>> free_msi_irqs()):
>>>
>>>     out_unroll:
>>>         list_for_each_entry(entry, &pdev->msi_list, list) {
>>>                 if (!count)
>>>                         break;
>>>                 kobject_del(&entry->kobj);
>>>                 kobject_put(&entry->kobj);
>>>                 count--;
>>>         }
>>>
>>> Why do we call kobject_del() here?  The kobject_put() will call
>>> kobject_del() anyway, so it looks redundant.
>>> Documentation/kobject.txt says kobject_del() must be called explicitly
>>> to break a circular reference, but I don't think we have that here.
>>>
>>  I think thats exactly why I did it, because of the documentation.  I agree
>> however, it does look redundant.  Harmless, but redundant.
>
>OK, thanks.  I think we should remove it on the grounds that it's not
>needed and removing it will make this code look more similar to other
>callers of kobject_init_and_add(), which means bugs will have fewer
>places to hide.
>
>Thanks, Neil!
>
>Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-26 12:25         ` Veaceslav Falico
@ 2013-09-26 14:07           ` Veaceslav Falico
  2013-09-26 22:16             ` Bjorn Helgaas
  2013-09-26 14:40           ` Neil Horman
  1 sibling, 1 reply; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-26 14:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Neil Horman, linux-kernel, linux-pci, Greg Kroah-Hartman

On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote:
>On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
>>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>>>>[+cc Neil (he added this code in da8d1c8ba4), Greg]
>>>>
>>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>>>>> Before trying to kobject_init_and_add(), we add a reference to pdev via
>>>>> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
>>>>> don't return it back - even on out_unroll.
>>>>>
>>>>> Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>>>>>
>>>>> CC: Bjorn Helgaas <bhelgaas@google.com>
>>>>> CC: linux-pci@vger.kernel.org
>>>>> CC: linux-kernel@vger.kernel.org
>>>>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>>>> ---
>>>>>  drivers/pci/msi.c | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>>>> index d5f90d6..14bf578 100644
>>>>> --- a/drivers/pci/msi.c
>>>>> +++ b/drivers/pci/msi.c
>>>>> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>>>>>                 pci_dev_get(pdev);
>>>>>                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>>>>>                                      "%u", entry->irq);
>>>>> -               if (ret)
>>>>> +               if (ret) {
>>>>> +                       pci_dev_put(pdev);
>>>>>                         goto out_unroll;
>>>>> +               }
>>>>>
>>>>>                 count++;
>>>>>         }
>>>>
>>>>I don't understand why this code does the pci_dev_get() in the first
>>>>place.  The pdev->msi_list of msi_desc structs is private to the
>>>>pci_dev, and even without bumping the refcount, there should be no way
>>>>for the pci_dev to be freed before the msi_desc.
>>>>
>>>Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
>>>people didn't try to remove the device prior to freeing all their interrupts
>>>(i.e I didn't want a broken driver to go through its remove routine without
>>>freeing all its irqs).  That might have been the wrong thing to do, but thats
>>>what bubbles to the front of my head when looking at this.
>>
>>That sounds plausible, but I think I'd rather deal with that by having
>>the PCI core remove logic free all the interrupts.  I *think* that's
>>already in place, i.e., pci_free_resources() calls
>>msi_remove_pci_irq_vectors().  So I propose that we remove the
>>pci_dev_get()/put() unless we come up with a more compelling reason
>>for it.
>
>As an update - I've found an interesting case why exactly that
>kobject_del() might be needed:
>
>in kobject_del() it removes instantly the link to kset - via
>kobj_kset_leave(), so that our kset remains without links and, thus, might
>be instantly removed.
>
>So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
>without any links (i.e. other kobjects) and, when we call kset_unregister()
>- it exits instantly (if it's not being hold somewhere elsewhere...).
>
>Without it, kset_unregister() will wait till all the kobjects will be gone.
>
>Now, the fun part starts - if we quickly call pci_disable_msi() and,
>afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
>still there, waiting to unregister, and the sysfs dir is still active.
>
>It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
>and are called on enslave/deslave in bonding.
>
>What I get:
>[   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
>[   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'
>
>I'll take a deeper look at the issue, though any feedback/advise is
>welcome. And I'll hold on with the patchset that removes pci_dev_get/put
>and kobject_del.

Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or
without removing kobject_del() (though it's harder to reproduce). I could
not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly
poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's,
obviously, possible, with and without cleanup and my previous bugfix.

I'll then send now the cleanup, however this theoretical issue was, is and
won't be fixed by it :-/. And I don't know how can we possible fix it
without something like kobject_put_wait().

>
>
>>
>>>>I also don't understand this nearby code (the same pattern appears in
>>>>free_msi_irqs()):
>>>>
>>>>    out_unroll:
>>>>        list_for_each_entry(entry, &pdev->msi_list, list) {
>>>>                if (!count)
>>>>                        break;
>>>>                kobject_del(&entry->kobj);
>>>>                kobject_put(&entry->kobj);
>>>>                count--;
>>>>        }
>>>>
>>>>Why do we call kobject_del() here?  The kobject_put() will call
>>>>kobject_del() anyway, so it looks redundant.
>>>>Documentation/kobject.txt says kobject_del() must be called explicitly
>>>>to break a circular reference, but I don't think we have that here.
>>>>
>>> I think thats exactly why I did it, because of the documentation.  I agree
>>>however, it does look redundant.  Harmless, but redundant.
>>
>>OK, thanks.  I think we should remove it on the grounds that it's not
>>needed and removing it will make this code look more similar to other
>>callers of kobject_init_and_add(), which means bugs will have fewer
>>places to hide.
>>
>>Thanks, Neil!
>>
>>Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-26 12:25         ` Veaceslav Falico
  2013-09-26 14:07           ` Veaceslav Falico
@ 2013-09-26 14:40           ` Neil Horman
  1 sibling, 0 replies; 17+ messages in thread
From: Neil Horman @ 2013-09-26 14:40 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: Bjorn Helgaas, linux-kernel, linux-pci, Greg Kroah-Hartman

On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote:
> On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
> >On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> >>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
> >>>[+cc Neil (he added this code in da8d1c8ba4), Greg]
> >>>
> >>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
> >>>> Before trying to kobject_init_and_add(), we add a reference to pdev via
> >>>> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
> >>>> don't return it back - even on out_unroll.
> >>>>
> >>>> Fix this by adding pci_dev_put(pdev) before going to unrolling section.
> >>>>
> >>>> CC: Bjorn Helgaas <bhelgaas@google.com>
> >>>> CC: linux-pci@vger.kernel.org
> >>>> CC: linux-kernel@vger.kernel.org
> >>>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> >>>> ---
> >>>>  drivers/pci/msi.c | 4 +++-
> >>>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> >>>> index d5f90d6..14bf578 100644
> >>>> --- a/drivers/pci/msi.c
> >>>> +++ b/drivers/pci/msi.c
> >>>> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
> >>>>                 pci_dev_get(pdev);
> >>>>                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
> >>>>                                      "%u", entry->irq);
> >>>> -               if (ret)
> >>>> +               if (ret) {
> >>>> +                       pci_dev_put(pdev);
> >>>>                         goto out_unroll;
> >>>> +               }
> >>>>
> >>>>                 count++;
> >>>>         }
> >>>
> >>>I don't understand why this code does the pci_dev_get() in the first
> >>>place.  The pdev->msi_list of msi_desc structs is private to the
> >>>pci_dev, and even without bumping the refcount, there should be no way
> >>>for the pci_dev to be freed before the msi_desc.
> >>>
> >>Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
> >>people didn't try to remove the device prior to freeing all their interrupts
> >>(i.e I didn't want a broken driver to go through its remove routine without
> >>freeing all its irqs).  That might have been the wrong thing to do, but thats
> >>what bubbles to the front of my head when looking at this.
> >
> >That sounds plausible, but I think I'd rather deal with that by having
> >the PCI core remove logic free all the interrupts.  I *think* that's
> >already in place, i.e., pci_free_resources() calls
> >msi_remove_pci_irq_vectors().  So I propose that we remove the
> >pci_dev_get()/put() unless we come up with a more compelling reason
> >for it.
> 
> As an update - I've found an interesting case why exactly that
> kobject_del() might be needed:
> 
> in kobject_del() it removes instantly the link to kset - via
> kobj_kset_leave(), so that our kset remains without links and, thus, might
> be instantly removed.
> 
> So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
> without any links (i.e. other kobjects) and, when we call kset_unregister()
> - it exits instantly (if it's not being hold somewhere elsewhere...).
> 
> Without it, kset_unregister() will wait till all the kobjects will be gone.
> 
> Now, the fun part starts - if we quickly call pci_disable_msi() and,
> afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
> still there, waiting to unregister, and the sysfs dir is still active.
> 
> It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
> and are called on enslave/deslave in bonding.
> 
> What I get:
> [   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
> [   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'
> 
> I'll take a deeper look at the issue, though any feedback/advise is
> welcome. And I'll hold on with the patchset that removes pci_dev_get/put
> and kobject_del.
> 
> 
The origional post may offer some guidance here:
https://lkml.org/lkml/2011/9/29/220

In particular the v3 update I think is relevant.
Neil

> >
> >>>I also don't understand this nearby code (the same pattern appears in
> >>>free_msi_irqs()):
> >>>
> >>>    out_unroll:
> >>>        list_for_each_entry(entry, &pdev->msi_list, list) {
> >>>                if (!count)
> >>>                        break;
> >>>                kobject_del(&entry->kobj);
> >>>                kobject_put(&entry->kobj);
> >>>                count--;
> >>>        }
> >>>
> >>>Why do we call kobject_del() here?  The kobject_put() will call
> >>>kobject_del() anyway, so it looks redundant.
> >>>Documentation/kobject.txt says kobject_del() must be called explicitly
> >>>to break a circular reference, but I don't think we have that here.
> >>>
> >> I think thats exactly why I did it, because of the documentation.  I agree
> >>however, it does look redundant.  Harmless, but redundant.
> >
> >OK, thanks.  I think we should remove it on the grounds that it's not
> >needed and removing it will make this code look more similar to other
> >callers of kobject_init_and_add(), which means bugs will have fewer
> >places to hide.
> >
> >Thanks, Neil!
> >
> >Bjorn
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-26 14:07           ` Veaceslav Falico
@ 2013-09-26 22:16             ` Bjorn Helgaas
  2013-09-26 23:05               ` Veaceslav Falico
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2013-09-26 22:16 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: Neil Horman, linux-kernel, linux-pci, Greg Kroah-Hartman, Russell King

[+cc Russell]

On Thu, Sep 26, 2013 at 04:07:51PM +0200, Veaceslav Falico wrote:
> On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote:
> >On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
> >>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> >>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
> >>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
...
> >As an update - I've found an interesting case why exactly that
> >kobject_del() might be needed:
> >
> >in kobject_del() it removes instantly the link to kset - via
> >kobj_kset_leave(), so that our kset remains without links and, thus, might
> >be instantly removed.
> >
> >So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
> >without any links (i.e. other kobjects) and, when we call kset_unregister()
> >- it exits instantly (if it's not being hold somewhere elsewhere...).
> >
> >Without it, kset_unregister() will wait till all the kobjects will be gone.

I don't see any waiting in kset_unregister(); all it does is a
kobject_put().

> >Now, the fun part starts - if we quickly call pci_disable_msi() and,
> >afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
> >still there, waiting to unregister, and the sysfs dir is still active.
> >
> >It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
> >and are called on enslave/deslave in bonding.
> >
> >What I get:
> >[   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
> >[   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'
> >
> >I'll take a deeper look at the issue, though any feedback/advise is
> >welcome. And I'll hold on with the patchset that removes pci_dev_get/put
> >and kobject_del.
> 
> Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or
> without removing kobject_del() (though it's harder to reproduce). I could
> not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly
> poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's,
> obviously, possible, with and without cleanup and my previous bugfix.
> 
> I'll then send now the cleanup, however this theoretical issue was, is and
> won't be fixed by it :-/. And I don't know how can we possible fix it
> without something like kobject_put_wait().

I still think we should remove the kobject_del().  We don't want to
make a race harder to hit; we want to remove it completely.  What we
really want is to make races *easier* to hit so we can find them,
which seems to be the point of KOBJECT_RELEASE :)

That said, I think I see why you see the warning in this case.
You're calling pci_enable_msi(), pci_disable_msi(), pci_enable_msi()
as in this call chain:

  pci_enable_msi
    msi_capability_init
      populate_msi_sysfs
        dev->msi_kset = kset_create_and_add("msi_irqs", ...)
        list_for_each_entry(entry, &dev->msi_list, ...)
          kobj->kset = dev->msi_kset
          kobject_init_and_add(kobj, &msi_irq_ktype, NULL, ...)
            kobject_add_internal
              kobject_get(&kobj->kset->kobj)    # dev->msi_kset

  pci_disable_msi
    free_msi_irqs
      list_for_each_entry_safe(entry, ..., &dev->msi_list, ...)
        kobject_put(&entry->kobj)
          kobject_release
            ... <delayed> ...
              kobject_cleanup
                kobject_del
                  kobj_kset_leave
                    kset_put(kobj->kset)        # dev->msi_kset
                      kobject_put	# happens AFTER pci_enable_msi() below
                t->release
        list_del(&entry->list)
    kset_unregister(dev->msi_kset)
      kobject_put
        kobject_release
          ... <delayed> ...
            kobject_cleanup		# happens AFTER pci_enable_msi() below
    dev->msi_kset = NULL

  pci_enable_msi
    msi_capability_init
      populate_msi_sysfs
        dev->msi_kset = kset_create_and_add("msi_irqs", ...)
          "sysfs: cannot create duplicate filename .../msi_irqs"

When we kset_create_and_add("msi_irqs") the second time, the delayed
kobject_cleanups() for the msi_desc entries and the msi_kset have
not yet occurred, so the msi_desc entries still hold references to
the msi_kset, etc.

I'm not sure if this is a design problem in the way PCI manages
msi_kset and msi_desc entries, or if there's something wrong in
the way KOBJECT_RELEASE is implemented.  I could imagine changing
it so the bulk of kobject_cleanup(), including the sysfs cleanup,
is executed immediately when the last reference is dropped, but
the kobj_type->release() function itself is delayed.

Calling kobject_del() explicitly sort of side-steps this problem
by doing the sysfs cleanup before the last put.  But it is quite
subtle, and it feels error-prone to rely on that.

Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
  2013-09-26 22:16             ` Bjorn Helgaas
@ 2013-09-26 23:05               ` Veaceslav Falico
  0 siblings, 0 replies; 17+ messages in thread
From: Veaceslav Falico @ 2013-09-26 23:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Neil Horman, linux-kernel, linux-pci, Greg Kroah-Hartman, Russell King

On Thu, Sep 26, 2013 at 04:16:13PM -0600, Bjorn Helgaas wrote:
>[+cc Russell]
>
>On Thu, Sep 26, 2013 at 04:07:51PM +0200, Veaceslav Falico wrote:
>> On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote:
>> >On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
>> >>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> >>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>> >>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>...
>> >As an update - I've found an interesting case why exactly that
>> >kobject_del() might be needed:
>> >
>> >in kobject_del() it removes instantly the link to kset - via
>> >kobj_kset_leave(), so that our kset remains without links and, thus, might
>> >be instantly removed.
>> >
>> >So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
>> >without any links (i.e. other kobjects) and, when we call kset_unregister()
>> >- it exits instantly (if it's not being hold somewhere elsewhere...).
>> >
>> >Without it, kset_unregister() will wait till all the kobjects will be gone.
>
>I don't see any waiting in kset_unregister(); all it does is a
>kobject_put().

Sorry, didn't word it correctly - my thought was that kset_unregister()
(which is, basicly, kobject_put()) will NOT unregister it instantly without
kobject_del() in place, because the 'child' kobjects are still tied to this
kset.

>
>> >Now, the fun part starts - if we quickly call pci_disable_msi() and,
>> >afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
>> >still there, waiting to unregister, and the sysfs dir is still active.
>> >
>> >It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
>> >and are called on enslave/deslave in bonding.
>> >
>> >What I get:
>> >[   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
>> >[   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'
>> >
>> >I'll take a deeper look at the issue, though any feedback/advise is
>> >welcome. And I'll hold on with the patchset that removes pci_dev_get/put
>> >and kobject_del.
>>
>> Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or
>> without removing kobject_del() (though it's harder to reproduce). I could
>> not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly
>> poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's,
>> obviously, possible, with and without cleanup and my previous bugfix.
>>
>> I'll then send now the cleanup, however this theoretical issue was, is and
>> won't be fixed by it :-/. And I don't know how can we possible fix it
>> without something like kobject_put_wait().
>
>I still think we should remove the kobject_del().  We don't want to
>make a race harder to hit; we want to remove it completely.  What we
>really want is to make races *easier* to hit so we can find them,
>which seems to be the point of KOBJECT_RELEASE :)

Agree, that's what I've done in the v2 patchset :).

>
>That said, I think I see why you see the warning in this case.
>You're calling pci_enable_msi(), pci_disable_msi(), pci_enable_msi()
>as in this call chain:
>
>  pci_enable_msi
>    msi_capability_init
>      populate_msi_sysfs
>        dev->msi_kset = kset_create_and_add("msi_irqs", ...)
>        list_for_each_entry(entry, &dev->msi_list, ...)
>          kobj->kset = dev->msi_kset
>          kobject_init_and_add(kobj, &msi_irq_ktype, NULL, ...)
>            kobject_add_internal
>              kobject_get(&kobj->kset->kobj)    # dev->msi_kset
>
>  pci_disable_msi
>    free_msi_irqs
>      list_for_each_entry_safe(entry, ..., &dev->msi_list, ...)
>        kobject_put(&entry->kobj)
>          kobject_release
>            ... <delayed> ...
>              kobject_cleanup
>                kobject_del
>                  kobj_kset_leave
>                    kset_put(kobj->kset)        # dev->msi_kset
>                      kobject_put	# happens AFTER pci_enable_msi() below
>                t->release
>        list_del(&entry->list)
>    kset_unregister(dev->msi_kset)
>      kobject_put
>        kobject_release
>          ... <delayed> ...
>            kobject_cleanup		# happens AFTER pci_enable_msi() below
>    dev->msi_kset = NULL
>
>  pci_enable_msi
>    msi_capability_init
>      populate_msi_sysfs
>        dev->msi_kset = kset_create_and_add("msi_irqs", ...)
>          "sysfs: cannot create duplicate filename .../msi_irqs"
>
>When we kset_create_and_add("msi_irqs") the second time, the delayed
>kobject_cleanups() for the msi_desc entries and the msi_kset have
>not yet occurred, so the msi_desc entries still hold references to
>the msi_kset, etc.

Exactly!

>
>I'm not sure if this is a design problem in the way PCI manages
>msi_kset and msi_desc entries, or if there's something wrong in
>the way KOBJECT_RELEASE is implemented.  I could imagine changing
>it so the bulk of kobject_cleanup(), including the sysfs cleanup,
>is executed immediately when the last reference is dropped, but
>the kobj_type->release() function itself is delayed.

Yep, could be one of the possibilities - and it actually resembles what
kobject_del() does :). But anyway, I agree that we shouldn't leave
kobject_del(), cause it only hides the real problem, and not completely (if
there are some other users who kobject_get(kset/kobject) - we're in
trouble).

>
>Calling kobject_del() explicitly sort of side-steps this problem
>by doing the sysfs cleanup before the last put.  But it is quite
>subtle, and it feels error-prone to rely on that.

So, in my v2 patchset, I've removed the kobject_del(). It's really hard to
trigger the bug without KOBJECT_DEBUG_RELEASE, still.

>
>Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-09-26 23:07 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-17  1:47 [PATCH 0/3] msi: fix kobject/sysfs removal from msi_list Veaceslav Falico
2013-09-17  1:47 ` [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs() Veaceslav Falico
2013-09-25 21:08   ` Bjorn Helgaas
2013-09-25 21:30     ` Bjorn Helgaas
2013-09-25 22:09       ` Veaceslav Falico
2013-09-25 23:23     ` Neil Horman
2013-09-25 23:35       ` Bjorn Helgaas
2013-09-26  9:27         ` Veaceslav Falico
2013-09-26 12:25         ` Veaceslav Falico
2013-09-26 14:07           ` Veaceslav Falico
2013-09-26 22:16             ` Bjorn Helgaas
2013-09-26 23:05               ` Veaceslav Falico
2013-09-26 14:40           ` Neil Horman
2013-09-17  1:47 ` [PATCH 2/3] msi: always unregister ->msi_kset within free_msi_irqs() Veaceslav Falico
2013-09-17  1:47 ` [PATCH 3/3] msi: free msi_desc entry only after we've released the kobject Veaceslav Falico
2013-09-25 21:34   ` Bjorn Helgaas
2013-09-25 22:12     ` Veaceslav Falico

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).