All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: linux-nvdimm@lists.01.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Will Deacon <will.deacon@arm.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Erwin Tsaur <erwin.tsaur@oracle.com>
Subject: [PATCH v2 0/7] libnvdimm: Fix async operations and locking
Date: Wed, 17 Jul 2019 18:07:48 -0700	[thread overview]
Message-ID: <156341206785.292348.1660822720191643298.stgit@dwillia2-desk3.amr.corp.intel.com> (raw)

Changes since v1 [1]:

- Fix an ioctl command corruption regression that manifested as an
  intermittent failure of the monitor.sh unit test. This is handled in
  the patch4 prep patch that makes it safe for nd_ioctl() to be
  re-entrant. (Vishal)

- Update the changelog for the driver-core 'lockdep_lock' hack to
  indicate Greg's non-NAK.

[1]: https://lore.kernel.org/lkml/156029554317.419799.1324389595953183385.stgit@dwillia2-desk3.amr.corp.intel.com/

---

The libnvdimm subsystem uses async operations to parallelize device
probing operations and to allow sysfs to trigger device_unregister() on
deleted namepsaces. A multithreaded stress test of the libnvdimm sysfs
interface uncovered a case where device_unregister() is triggered
multiple times, and the subsequent investigation uncovered a broken
locking scenario.

The lack of lockdep coverage for device_lock() stymied the debug. That
is, until patch6 "driver-core, libnvdimm: Let device subsystems add
local lockdep coverage" solved that with a shadow lock, with lockdep
coverage, to mirror device_lock() operations. Given the time saved with
shadow-lock debug-hack, patch6 attempts to generalize device_lock()
debug facility that might be able to be carried upstream. Patch6 is
staged at the end of this fix series in case it is contentious and needs
to be dropped.

Patch1 "drivers/base: Introduce kill_device()" could be achieved with
local libnvdimm infrastructure. However, the existing 'dead' flag in
'struct device_private' aims to solve similar async register/unregister
races so the fix in patch2 "libnvdimm/bus: Prevent duplicate
device_unregister() calls" can be implemented with existing driver-core
infrastructure.

Patch3 is a rare lockdep warning that is intermittent based on
namespaces racing ahead of the completion of probe of their parent
region. It is not related to the other fixes, it just happened to
trigger as a result of the async stress test.

Patch5 and patch6 address an ABBA deadlock tripped by the stress test.

These patches pass the failing stress test and the existing libnvdimm
unit tests with CONFIG_PROVE_LOCKING=y and the new "dev->lockdep_mutex"
shadow lock with no lockdep warnings.

---

Dan Williams (7):
      drivers/base: Introduce kill_device()
      libnvdimm/bus: Prevent duplicate device_unregister() calls
      libnvdimm/region: Register badblocks before namespaces
      libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
      libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
      libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
      driver-core, libnvdimm: Let device subsystems add local lockdep coverage


 drivers/acpi/nfit/core.c        |   28 +++--
 drivers/acpi/nfit/nfit.h        |   24 ++++
 drivers/base/core.c             |   30 ++++--
 drivers/nvdimm/btt_devs.c       |   16 +--
 drivers/nvdimm/bus.c            |  210 ++++++++++++++++++++++++++-------------
 drivers/nvdimm/core.c           |   10 +-
 drivers/nvdimm/dimm_devs.c      |    4 -
 drivers/nvdimm/namespace_devs.c |   36 +++----
 drivers/nvdimm/nd-core.h        |   71 +++++++++++++
 drivers/nvdimm/pfn_devs.c       |   24 ++--
 drivers/nvdimm/pmem.c           |    4 -
 drivers/nvdimm/region.c         |   24 ++--
 drivers/nvdimm/region_devs.c    |   12 +-
 include/linux/device.h          |    6 +
 14 files changed, 343 insertions(+), 156 deletions(-)
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: linux-nvdimm@lists.01.org
Cc: Ira Weiny <ira.weiny@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Jane Chu <jane.chu@oracle.com>,
	stable@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will.deacon@arm.com>, Jane Chu <jane.chu@oracle.com>,
	Ingo Molnar <mingo@redhat.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Erwin Tsaur <erwin.tsaur@oracle.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Vishal Verma <vishal.l.verma@intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 0/7] libnvdimm: Fix async operations and locking
Date: Wed, 17 Jul 2019 18:07:48 -0700	[thread overview]
Message-ID: <156341206785.292348.1660822720191643298.stgit@dwillia2-desk3.amr.corp.intel.com> (raw)

Changes since v1 [1]:

- Fix an ioctl command corruption regression that manifested as an
  intermittent failure of the monitor.sh unit test. This is handled in
  the patch4 prep patch that makes it safe for nd_ioctl() to be
  re-entrant. (Vishal)

- Update the changelog for the driver-core 'lockdep_lock' hack to
  indicate Greg's non-NAK.

[1]: https://lore.kernel.org/lkml/156029554317.419799.1324389595953183385.stgit@dwillia2-desk3.amr.corp.intel.com/

---

The libnvdimm subsystem uses async operations to parallelize device
probing operations and to allow sysfs to trigger device_unregister() on
deleted namepsaces. A multithreaded stress test of the libnvdimm sysfs
interface uncovered a case where device_unregister() is triggered
multiple times, and the subsequent investigation uncovered a broken
locking scenario.

The lack of lockdep coverage for device_lock() stymied the debug. That
is, until patch6 "driver-core, libnvdimm: Let device subsystems add
local lockdep coverage" solved that with a shadow lock, with lockdep
coverage, to mirror device_lock() operations. Given the time saved with
shadow-lock debug-hack, patch6 attempts to generalize device_lock()
debug facility that might be able to be carried upstream. Patch6 is
staged at the end of this fix series in case it is contentious and needs
to be dropped.

Patch1 "drivers/base: Introduce kill_device()" could be achieved with
local libnvdimm infrastructure. However, the existing 'dead' flag in
'struct device_private' aims to solve similar async register/unregister
races so the fix in patch2 "libnvdimm/bus: Prevent duplicate
device_unregister() calls" can be implemented with existing driver-core
infrastructure.

Patch3 is a rare lockdep warning that is intermittent based on
namespaces racing ahead of the completion of probe of their parent
region. It is not related to the other fixes, it just happened to
trigger as a result of the async stress test.

Patch5 and patch6 address an ABBA deadlock tripped by the stress test.

These patches pass the failing stress test and the existing libnvdimm
unit tests with CONFIG_PROVE_LOCKING=y and the new "dev->lockdep_mutex"
shadow lock with no lockdep warnings.

---

Dan Williams (7):
      drivers/base: Introduce kill_device()
      libnvdimm/bus: Prevent duplicate device_unregister() calls
      libnvdimm/region: Register badblocks before namespaces
      libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
      libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
      libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
      driver-core, libnvdimm: Let device subsystems add local lockdep coverage


 drivers/acpi/nfit/core.c        |   28 +++--
 drivers/acpi/nfit/nfit.h        |   24 ++++
 drivers/base/core.c             |   30 ++++--
 drivers/nvdimm/btt_devs.c       |   16 +--
 drivers/nvdimm/bus.c            |  210 ++++++++++++++++++++++++++-------------
 drivers/nvdimm/core.c           |   10 +-
 drivers/nvdimm/dimm_devs.c      |    4 -
 drivers/nvdimm/namespace_devs.c |   36 +++----
 drivers/nvdimm/nd-core.h        |   71 +++++++++++++
 drivers/nvdimm/pfn_devs.c       |   24 ++--
 drivers/nvdimm/pmem.c           |    4 -
 drivers/nvdimm/region.c         |   24 ++--
 drivers/nvdimm/region_devs.c    |   12 +-
 include/linux/device.h          |    6 +
 14 files changed, 343 insertions(+), 156 deletions(-)

             reply	other threads:[~2019-07-18  1:24 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-18  1:07 Dan Williams [this message]
2019-07-18  1:07 ` [PATCH v2 0/7] libnvdimm: Fix async operations and locking Dan Williams
2019-07-18  1:07 ` [PATCH v2 1/7] drivers/base: Introduce kill_device() Dan Williams
2019-07-18  1:07   ` Dan Williams
2019-07-18  2:29   ` Greg Kroah-Hartman
2019-07-18  2:29     ` Greg Kroah-Hartman
     [not found]   ` <156341207332.292348.14959761496009347574.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2019-07-19  0:45     ` Sasha Levin
2019-07-18  1:07 ` [PATCH v2 2/7] libnvdimm/bus: Prevent duplicate device_unregister() calls Dan Williams
2019-07-18  1:07   ` Dan Williams
2019-07-18  1:08 ` [PATCH v2 3/7] libnvdimm/region: Register badblocks before namespaces Dan Williams
2019-07-18  1:08   ` Dan Williams
2019-07-18 18:16   ` Verma, Vishal L
2019-07-18 18:16     ` Verma, Vishal L
2019-07-18  1:08 ` [PATCH v2 4/7] libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant Dan Williams
2019-07-18  1:08   ` Dan Williams
2019-07-18 18:21   ` Verma, Vishal L
2019-07-18 18:21     ` Verma, Vishal L
2019-07-18  1:08 ` [PATCH v2 5/7] libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl() Dan Williams
2019-07-18  1:08   ` Dan Williams
2019-07-18  1:08 ` [PATCH v2 6/7] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock Dan Williams
2019-07-18  1:08   ` Dan Williams
2019-07-18  2:04   ` Sasha Levin
2019-07-18  2:04     ` Sasha Levin
2019-07-18  6:39     ` Dan Williams
2019-07-18  6:39       ` Dan Williams
2019-07-18  1:08 ` [PATCH v2 7/7] driver-core, libnvdimm: Let device subsystems add local lockdep coverage Dan Williams
2019-07-18  1:08   ` Dan Williams
2019-07-18  2:28   ` Greg Kroah-Hartman
2019-07-18  2:28     ` Greg Kroah-Hartman
2019-07-18 16:09   ` Ira Weiny
2019-07-18 16:09     ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=156341206785.292348.1660822720191643298.stgit@dwillia2-desk3.amr.corp.intel.com \
    --to=dan.j.williams@intel.com \
    --cc=erwin.tsaur@oracle.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=stable@vger.kernel.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.