linux-kernel.vger.kernel.org archive mirror
From: Guenter Roeck <linux@roeck-us.net>
To: Francesco Ruggeri <fruggeri@arista.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Hannes Reinecke <hare@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: pci: kernel crash in bus_find_device
Date: Thu, 22 May 2014 10:57:00 -0700
Message-ID: <20140522175700.GA14814@roeck-us.net>
In-Reply-To: <CA+HUmGhVhLoehT68=2fErGeYSjAC87Ria=5pXUNSpYLwdpXc2w@mail.gmail.com>

On Thu, May 22, 2014 at 09:19:40AM -0700, Francesco Ruggeri wrote:
> Aborting a search does not sound like a correct solution.
> How does a higher level user (eg for_each_pci_dev) know that a search
> was aborted and decide whether it should try again, assuming it would
> be ok repeating the action on the devices visited the first time?
> 
Agreed, it is less than desirable.

I would consider this a secondary problem, though; the immediate
problem is the crash. One possible solution might be to have the various
functions return error codes (ERR_PTR), but that would be quite invasive as
well. I really think we need input from Greg and, if the solution touches
the PCI subsystem, from Bjorn Helgaas to find an acceptable solution
to that problem.
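To make the ERR_PTR idea concrete: a lookup could return one of three distinguishable results (a match, NULL for "walked everything, no match", or an encoded error for "the walk was cut short"), so a caller can tell an aborted search from an unsuccessful one. The user-space sketch below assumes a lot: err_ptr()/is_err()/ptr_err() are local re-implementations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() from include/linux/err.h, struct fake_dev and find_dev() are hypothetical, and using -EAGAIN as the "aborted" code is only an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR() (include/linux/err.h):
 * small negative errno values are encoded in the top 4095 addresses,
 * which never hold a valid object. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical device record; 'deleted' models a device that was
 * unregistered while a lookup was walking the list. */
struct fake_dev {
	int id;
	int deleted;
};

/*
 * Hypothetical lookup with three distinguishable outcomes:
 *   - pointer to the matching device,
 *   - NULL if the whole list was walked without a match,
 *   - err_ptr(-EAGAIN) if the walk had to stop at a deleted device,
 *     so the caller knows the answer is incomplete and may retry.
 */
static struct fake_dev *find_dev(struct fake_dev *devs, int n, int id)
{
	assert(devs != NULL || n == 0);
	for (int i = 0; i < n; i++) {
		if (devs[i].deleted)
			return err_ptr(-EAGAIN);
		if (devs[i].id == id)
			return &devs[i];
	}
	return NULL;
}
```

A caller would test is_err() before anything else, then NULL, then use the device. This answers "how does the caller know the search was aborted", but not whether repeating work on already-visited devices is safe; that still has to be decided per caller, which is part of why such a change would be invasive.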

Guenter

> Francesco
> 
> 
> On Thu, May 22, 2014 at 12:22 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> > On 05/22/2014 12:14 AM, Greg Kroah-Hartman wrote:
> >>
> >> On Wed, May 21, 2014 at 03:59:58PM -0700, Guenter Roeck wrote:
> >>>
> >>> On Wed, May 21, 2014 at 01:04:04PM -0700, Francesco Ruggeri wrote:
> >>>>
> >>>> I have been using an x86 platform.
> >>>> When I started working on it I got early crashes until I added the
> >>>> check for p not NULL in
> >>>>
> >>>> +void bus_release_device(struct device *dev)
> >>>> +{
> >>>> +	struct device_private *p = dev->p;
> >>>> +
> >>>> +	if (p && klist_node_attached(&p->knode_bus))
> >>>> +		klist_put_last(&p->knode_bus);
> >>>> +}
> >>>> +
> >>>>
> >>>> Maybe on powerpc *p is overridden between device_del and device_release?
> >>>>
> >>>> Or maybe it is one of the BUG_ONs in the patch? The ones on knode_dead are
> >>>> treated as WARN_ONs in the current klist code.
> >>>> The one in BUG_ON(!klist_dec_and_del(n)); is new, and in my tests I
> >>>> ran into it without the second patch (but only when I ran my module
> >>>> and tests).
> >>>>
> >>> Hi Francesco,
> >>>
> >>> I replaced the BUG_ON with WARN_ON; still crashes.
> >>>
> >>> Anyway, the problem seems to be known. I found two related exchanges.
> >>>
> >>> [1] describes pretty much the same problem. I don't see if/where it was
> >>> ever fixed, though.
> >>>
> >>> [2] is a patch to fix the problem. It did not apply cleanly to 3.14,
> >>> so I had to make some adjustments in klist_iter_init_node. Resulting
> >>> patch is below. With this patch, the problem is gone. It is not perfect,
> >>> as it aborts the loop if it encounters a deleted kobject, but it is
> >>> better than nothing. Unfortunately, the patch never made it upstream;
> >>> no idea why.
> >>> Copying the author and Greg to get additional feedback.
> >>>
> >>> Guenter
> >>>
> >>> [1] https://lkml.org/lkml/2008/10/26/79
> >>> [2] https://lkml.org/lkml/2012/4/16/218
> >>
> >>
> >> 2 years ago?  I have no idea what was up with that, sorry...
> >>
> >
> > Ok, but do you have comments on the patch itself in its current version?
> >
> > Guenter
> >
> 
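The crash in this thread involves bus_find_device() walking a bus's device klist while devices are being removed. For readers less familiar with klist, here is a simplified user-space model of the idea it is built on: nodes are reference counted, deleting a node only marks it dead and drops the list's reference, the node is unlinked when its last reference goes away, and an iterator pins its current node and skips dead nodes instead of aborting. This is a sketch under loud assumptions: single-threaded, no locking, all names (struct node, iter_next(), and so on) are hypothetical, and it is neither the code in lib/klist.c nor any of the patches discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified, single-threaded model of a klist-style list (no locking).
 * The list holds one reference on each node, and an iterator holds one
 * on the node it is currently parked on. */
struct node {
	struct node *prev, *next;
	int refs;
	bool dead;	/* set on delete; node stays linked while referenced */
	int value;
};

struct list {
	struct node head;	/* sentinel: never dead, never freed */
};

struct iter {
	struct list *l;
	struct node *cur;	/* pinned node, or NULL when not parked */
};

static void list_init(struct list *l)
{
	l->head.prev = l->head.next = &l->head;
}

static struct node *list_add_tail(struct list *l, int value)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->value = value;
	n->refs = 1;	/* the list's own reference */
	n->prev = l->head.prev;
	n->next = &l->head;
	l->head.prev->next = n;
	l->head.prev = n;
	return n;
}

/* Drop one reference; unlink and free the node when the last one goes. */
static void node_put(struct node *n)
{
	assert(n->refs > 0);
	if (--n->refs == 0) {
		n->prev->next = n->next;
		n->next->prev = n->prev;
		free(n);
	}
}

/* Delete: mark the node dead and drop the list's reference. If an
 * iterator still pins it, the iterator's final put unlinks it later. */
static void list_del(struct node *n)
{
	n->dead = true;
	node_put(n);
}

/* Step to the next live node, skipping dead ones. The successor is
 * pinned before the reference on the previous node is released, so
 * the walk never follows a pointer out of freed memory. */
static struct node *iter_next(struct iter *it)
{
	struct node *prev = it->cur;
	struct node *n = prev ? prev->next : it->l->head.next;

	while (n != &it->l->head && n->dead)
		n = n->next;
	if (n == &it->l->head)
		n = NULL;
	else
		n->refs++;
	it->cur = n;
	if (prev)
		node_put(prev);
	return n;
}

/* Release the iterator's reference, if any, when abandoning a walk. */
static void iter_exit(struct iter *it)
{
	if (it->cur)
		node_put(it->cur);
	it->cur = NULL;
}
```

In this model, deleting the node an iterator is parked on is safe: the node stays linked (but dead) until iter_next() moves on, and the walk continues with the next live node rather than stopping.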
