From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750987AbaEVQTm (ORCPT ); Thu, 22 May 2014 12:19:42 -0400 Received: from mail-vc0-f179.google.com ([209.85.220.179]:46639 "EHLO mail-vc0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750747AbaEVQTl (ORCPT ); Thu, 22 May 2014 12:19:41 -0400 MIME-Version: 1.0 In-Reply-To: <537DA5C0.2070609@roeck-us.net> References: <20140520195041.GA28913@roeck-us.net> <20140520233812.GA15640@roeck-us.net> <20140521193010.GA1721@roeck-us.net> <20140521225958.GB2467@roeck-us.net> <20140522071427.GA21230@kroah.com> <537DA5C0.2070609@roeck-us.net> Date: Thu, 22 May 2014 09:19:40 -0700 Message-ID: Subject: Re: pci: kernel crash in bus_find_device From: Francesco Ruggeri To: Guenter Roeck Cc: Greg Kroah-Hartmann , Hannes Reinecke , linux-kernel@vger.kernel.org Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Aborting a search does not sound like a correct solution. How does a higher level user (eg for_each_pci_dev) know that a search was aborted and decide whether it should try again, assuming it would be ok repeating the action on the devices visited the first time? Francesco On Thu, May 22, 2014 at 12:22 AM, Guenter Roeck wrote: > On 05/22/2014 12:14 AM, Greg Kroah-Hartmann wrote: >> >> On Wed, May 21, 2014 at 03:59:58PM -0700, Guenter Roeck wrote: >>> >>> On Wed, May 21, 2014 at 01:04:04PM -0700, Francesco Ruggeri wrote: >>>> >>>> I have been using an x86 platform. >>>> When I started working on it I got early crashes until I added the >>>> check for p not NULL in >>>> >>>> +void bus_release_device(struct device *dev) >>>> +{ >>>> + struct device_private *p = dev->p; >>>> + >>>> + if (p && klist_node_attached(&p->knode_bus)) >>>> + klist_put_last(&p->knode_bus); >>>> +} >>>> + >>>> >>>> Maybe on powerpc *p is overriden between device_del and device_release? >>>> >>>> Or maybe some of the BUG_ONs in the patch? The ones on knode_dead are >>>> treated as WARN_ONs in the current klist code. >>>> The one in BUG_ON(!klist_dec_and_del(n)); is new, and in my tests I >>>> ran into it without the second patch (but only when I ran my module >>>> and tests). >>>> >>> Hi Francesco, >>> >>> I replaced the BUG_ON with WARN_ON; still crashes. >>> >>> Anyway, the problem seems to be known. I found two related exchanges. >>> >>> [1] describes pretty much the same problem. I don't see if/where it was >>> ever fixed, though. >>> >>> [2] is a patch to fix the problem. It did not apply cleanly to 3.14, >>> so I had to make some adjustments in klist_iter_init_node. Resulting >>> patch is below. With this patch, the problem is gone. It is not perfect, >>> as it aborts the loop if it encounters a deleted kobject, but it is >>> better >>> than nothing. Unfortunately, the patch never made it upstream; no idea >>> why. >>> Copying the author and Greg to get additional feedback. >>> >>> Guenter >>> >>> [1] https://lkml.org/lkml/2008/10/26/79 >>> [2] https://lkml.org/lkml/2012/4/16/218 >> >> >> 2 years ago? I have no idea what was up with that, sorry... >> > > Ok, but do you have comments on the patch itself in its current version ? > > Guenter >