From: Alex Chiang <achiang@hp.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Oleg Nesterov <oleg@redhat.com>,
	jbarnes@virtuousgeek.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, kaneshige.kenji@jp.fujitsu.com,
	Lai Jiangshan <laijs@cn.fujitsu.com>
Subject: Re: [PATCH v5 09/13] PCI: Introduce /sys/bus/pci/devices/.../remove
Date: Tue, 24 Mar 2009 11:23:54 -0600
Message-ID: <20090324172354.GB17297@ldl.fc.hp.com>
In-Reply-To: <1237897972.4320.79.camel@johannes.local>

* Johannes Berg <johannes@sipsolutions.net>:
> On Tue, 2009-03-24 at 03:46 -0700, Andrew Morton wrote:
> 
> > But I don't think we've seen a coherent description of what's actually
> > _wrong_ with the current code.  flush_cpu_workqueue() has been handling
> > this case for many years with no problems reported as far as I know.
> > 
> > So what has caused this sudden flurry of reports?  Did something change in
> > lockdep?  What is this
> > 
> > [  537.380128]  (events){--..}, at: [<ffffffff80257fc0>] flush_workqueue+0x0/0xa0
> > [  537.380128]
> > [  537.380128] but task is already holding lock:
> > [  537.380128]  (events){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
> > 
> > supposed to mean?  "events" isn't a lock - it's the name of a kernel
> > thread, isn't it?  If this is supposed to be deadlockable then how?
> 
> events is indeed the schedule_work workqueue thread name -- I just used
> that for lack of a better name.
> 
> > Because I don't immediately see what's wrong with e1000_remove() calling
> > flush_work().  It's undesirable, and we can perhaps improve it via some
> > means, but where is the bug?
> 
> There is no bug -- it's a false positive in a way. I've pointed this out
> in the original thread, see
> http://thread.gmane.org/gmane.linux.kernel/550877/focus=550932

I'm actually a bit confused now.

Peter explained why flushing a workqueue from the same queue is
bad, and in general I agree, but what do you mean by "false
positive"?

By the way, this scenario:

	code path 1:
	  my_function() -> lock(L1); ...; flush_workqueue(); ...

	code path 2:
	  run_workqueue() -> my_work() -> ...; lock(L1); ...

is _not_ what is happening here.
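
For reference, the inversion in those two code paths can be modelled
in plain userspace C. This is only a sketch of the idea behind a
lock-order checker, not lockdep's implementation: it records "B was
acquired while A was held" edges and flags an edge that closes a
cycle. The names dep_graph, record_dep, CLASS_L1 and CLASS_WQ are
made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes: a real lock L1 and the workqueue pseudo-lock. */
enum { CLASS_L1, CLASS_WQ };

/* dep[a][b] is true once class b has been acquired while a was held. */
struct dep_graph {
	bool dep[MAX_CLASSES][MAX_CLASSES];
};

static void dep_graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired was taken while held was held".
 * Returns true if the new edge closes a cycle, i.e. the opposite
 * ordering has already been observed on some other path.
 */
static bool record_dep(struct dep_graph *g, int held, int acquired)
{
	bool seen[MAX_CLASSES] = { false };
	bool cycle;

	assert(held >= 0 && held < MAX_CLASSES);
	assert(acquired >= 0 && acquired < MAX_CLASSES);

	cycle = reachable(g, acquired, held, seen);
	g->dep[held][acquired] = true;
	return cycle;
}
```

Code path 1 corresponds to record_dep(&g, CLASS_L1, CLASS_WQ), which
is fine on its own. Code path 2 corresponds to
record_dep(&g, CLASS_WQ, CLASS_L1), which closes the cycle and is the
case a checker is right to complain about.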

sysfs_schedule_callback() is an ugly piece of code that exists
because a sysfs attribute cannot remove itself without
deadlocking. So the callback mechanism was created to allow a
different kernel thread to remove the sysfs attribute and avoid
deadlock.
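
A toy userspace model of why the deferral helps, under the assumption
that removing an attribute has to wait until no handler is still
running on it. Everything here (toy_attr, toy_remove, the one-slot
"pending" callback) is invented for the illustration and is not the
sysfs code; the model reports the would-block case instead of really
blocking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a sysfs attribute; not the kernel's data structure. */
struct toy_attr {
	int active;	/* handlers currently running on this attribute */
	bool removed;
};

/* One-slot deferred callback, standing in for a queued work item. */
struct toy_callback {
	void (*func)(struct toy_attr *);
	struct toy_attr *attr;
};

static struct toy_callback pending;

/*
 * Removal must wait for running handlers to finish. Return false
 * where the real thing would wait (possibly on the caller itself).
 */
static bool toy_remove(struct toy_attr *a)
{
	if (a->active > 0)
		return false;
	a->removed = true;
	return true;
}

static void toy_remove_cb(struct toy_attr *a)
{
	(void)toy_remove(a);
}

/* Handler that tries to remove its own attribute directly. */
static bool toy_store_direct(struct toy_attr *a)
{
	bool ok;

	a->active++;		/* the handler is now running */
	ok = toy_remove(a);	/* fails: we are the active handler */
	a->active--;
	return ok;
}

/* Handler that only schedules the removal and then returns. */
static void toy_store_deferred(struct toy_attr *a)
{
	a->active++;
	pending.func = toy_remove_cb;
	pending.attr = a;
	a->active--;		/* handler is done before the removal runs */
}

/* Later, from a different context: run the deferred callback. */
static void toy_run_pending(void)
{
	if (pending.func) {
		pending.func(pending.attr);
		pending.func = NULL;
	}
}
```

toy_store_direct() can never succeed, because the handler is waiting
on itself. toy_store_deferred() followed by toy_run_pending() does
succeed, since the removal runs only after the handler has returned.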

So what you really have going on is:

	sysfs callback -> add remove callback to global workqueue
	remove callback fires off (pci_remove_bus_device) and we do...
	    device_unregister
	    driver's ->remove method called
	    driver's ->remove method calls flush_scheduled_work

Yes, after reading the thread I agree that generically calling
flush_workqueue in the middle of run_workqueue is bad, but the
situation behind the lockdep warning that Kenji showed us really
won't deadlock.

This is because pci_remove_bus_device() will not acquire any lock
L1 that an individual device driver will attempt to acquire in
the remove path. If that were the case, we would deadlock every
time you rmmod'ed a device driver's module or every time you shut
your machine down.

I think from my end, there are 2 things I need to do:

	a) make sysfs_schedule_callback() use its own work queue
	   instead of the global work queue, because too many
	   drivers call flush_scheduled_work in their remove path

	b) give sysfs attributes the ability to commit suicide

(a) is short term work, 2.6.30 timeframe, since it doesn't
involve any large conceptual changes.

(b) is picking up Tejun Heo's existing work, but that was a bit
controversial last time, and I'm not sure it will make it during
this merge window.
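
To make (a) concrete, here is a small userspace model of what moving
the callback to its own queue would change. It assumes, as I
understand the annotations, that running a work item holds a
per-queue pseudo-lock and that flushing a queue briefly acquires the
same pseudo-lock. WQ_SYSFS is a hypothetical dedicated queue and all
toy_* names are invented; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HELD 8

/* WQ_EVENTS models the global queue, WQ_SYSFS a hypothetical dedicated one. */
enum toy_wq { WQ_EVENTS, WQ_SYSFS };

/* Stack of workqueue pseudo-locks the current context "holds". */
static enum toy_wq held[MAX_HELD];
static int nr_held;
static int nr_reports;

static bool is_held(enum toy_wq wq)
{
	for (int i = 0; i < nr_held; i++)
		if (held[i] == wq)
			return true;
	return false;
}

/* Flushing a queue counts as briefly acquiring its pseudo-lock. */
static void toy_flush(enum toy_wq wq)
{
	if (is_held(wq))
		nr_reports++;	/* same class acquired twice: report it */
}

/* Stand-in for a driver's ->remove that flushes the global queue. */
static void toy_driver_remove(void)
{
	toy_flush(WQ_EVENTS);
}

/* Run one work item on 'wq', holding that queue's pseudo-lock meanwhile. */
static void toy_run_work(enum toy_wq wq, void (*work)(void))
{
	assert(nr_held < MAX_HELD);
	held[nr_held++] = wq;
	work();
	nr_held--;
}

/* Number of reports produced when the remove callback runs on 'wq'. */
static int toy_remove_via(enum toy_wq wq)
{
	nr_reports = 0;
	toy_run_work(wq, toy_driver_remove);
	return nr_reports;
}
```

With the callback on the global queue, toy_remove_via(WQ_EVENTS)
produces one report. With the callback on its own queue,
toy_remove_via(WQ_SYSFS) produces none, because the driver's flush of
the global queue no longer touches a class that is already held.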

Question for the lockdep folks though -- given what I described,
do you agree that the warning we saw was a false positive? Or am
I off in left field?

Thanks.

/ac


