From: Dan Williams <dan.j.williams@intel.com>
To: alexander.h.duyck@linux.intel.com
Cc: "Brown, Len" <len.brown@intel.com>,
	bvanassche@acm.org,
	Linux-pm mailing list <linux-pm@vger.kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	jiangshanlai@gmail.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Pavel Machek <pavel@ucw.cz>,
	zwisler@kernel.org, Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [driver-core PATCH v6 4/9] driver core: Move async_synchronize_full call
Date: Tue, 27 Nov 2018 12:35:20 -0800
Message-ID: <CAPcyv4gKT1CDA-xVh5LCYEVUeXLB5ktCFqpFhPWzNK7+QbQdvw@mail.gmail.com>
In-Reply-To: <ec672fcf5924ef267f35b11c13ddc50c815b1a9f.camel@linux.intel.com>

On Tue, Nov 27, 2018 at 9:38 AM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> On Mon, 2018-11-26 at 18:11 -0800, Dan Williams wrote:
> > On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> > <alexander.h.duyck@linux.intel.com> wrote:
> > >
> > > Move the async_synchronize_full call out of __device_release_driver and
> > > into driver_detach.
> > >
> > > The idea behind this is that the async_synchronize_full call will only
> > > guarantee that any existing async operations are flushed. This doesn't do
> > > anything to guarantee that a hotplug event that may occur while we are
> > > doing the release of the driver will not be asynchronously scheduled.
> > >
> > > By moving this into the driver_detach path we can avoid potential deadlocks
> > > as we aren't holding the device lock at this point and we should not have
> > > the driver we want to flush loaded so the flush will take care of any
> > > asynchronous events the driver we are detaching might have scheduled.
> > >
> >
> > What problem is this patch solving in practice? If there were
> > drivers issuing async work from probe, they would need to be
> > responsible for flushing it themselves. That said, it seems broken that
> > the async probing infrastructure takes the device_lock inside
> > async_schedule and then holds the lock when calling
> > async_synchronize_full. Is it just luck that this hasn't caused
> > deadlocks in practice?
>
> My understanding is that it has caused some deadlocks. There was
> another patch set that Bart Van Assche had submitted that was
> addressing this. I just tweaked my patch set to address both the issues
> he had seen as well as the performance improvements included in my
> original patch set.

I tried to go find that discussion, but failed. It would help to
report an actual failure rather than a theoretical one.
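
For readers following the thread, the ordering problem under discussion can
be sketched with a toy, single-threaded user-space model. This is not kernel
code: struct toy_dev and every toy_* function are hypothetical stand-ins
(the "locked" flag for device_lock, toy_flush() for async_synchronize_full),
and a "false" return stands in for a flush that would block forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device; none of these names are kernel APIs. */
struct toy_dev {
	bool locked;        /* stands in for the device lock being held */
	bool probe_pending; /* an async probe was scheduled but has not run */
	bool bound;         /* a driver is currently bound */
};

/* Async probe body: it needs the device lock before it can bind. */
static bool toy_async_probe(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->locked)
		return false; /* would wait forever on the lock holder */
	dev->locked = true;
	dev->bound = true;
	dev->locked = false;
	dev->probe_pending = false;
	return true;
}

/* Stand-in for the flush: run any pending async work to completion. */
static bool toy_flush(struct toy_dev *dev)
{
	if (!dev->probe_pending)
		return true;
	return toy_async_probe(dev);
}

/* Ordering being moved away from: flush while holding the lock. */
static bool toy_release_flush_under_lock(struct toy_dev *dev)
{
	bool ok;

	dev->locked = true;
	ok = toy_flush(dev); /* pending probe cannot take the lock */
	if (ok)
		dev->bound = false;
	dev->locked = false;
	return ok;
}

/* Ordering proposed by the patch: flush first, then lock and unbind. */
static bool toy_detach_flush_first(struct toy_dev *dev)
{
	if (!toy_flush(dev))
		return false;
	dev->locked = true;
	dev->bound = false;
	dev->locked = false;
	return true;
}
```

In this model, toy_release_flush_under_lock() fails whenever a probe is still
pending, while toy_detach_flush_first() lets the probe finish before the lock
is taken. The model does not cover a new probe being scheduled after the
flush, which is the remove-versus-probe race raised later in this message.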

> > Given that the device_lock is hidden from lockdep I think it would be
> > helpful to have a custom lock_map_acquire() setup, similar to the
> > workqueue core, to try to keep the locking rules enforced /
> > documented.
> >
> > The only documentation I can find for async-probe deadlock avoidance
> > is the comment block in do_init_module() for async_probe_requested.
>
> Would it make sense to just add any lockdep or deadlock documentation
> as a separate patch? I can work on it but I am not sure it makes sense
> to add to this patch since there is a chance this one will need to be
> backported to stable at some point.

Yes, separate follow-on sounds ok.

> > Stepping back a bit, does this patch have anything to do with the
> > performance improvement, or is it a separate "by the way I also found
> > this" kind of patch?
>
> This is more of a separate "by the way" type of patch based on the
> discussion Bart and I had about how to best address the issue. There
> may be some improvement since we only call async_synchronize_full once
> and only when we are removing the driver, but I don't think it would be
> very noticeable.

Ok, might be worthwhile to submit this at the front of the series as a
fix that has implications for what comes later. The only concern is
whether this fix stands alone. It would seem to make the possibility
of ->remove() racing ->probe() worse, no? Can we make this change
without the new/proposed ->async_probe tracking infrastructure?
