From: Bjorn Helgaas <helgaas@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: rakesh@tuxera.com, linux-pci@vger.kernel.org,
	linux-nvme@lists.infradead.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
Date: Tue, 6 Jun 2017 16:14:43 -0500
Message-ID: <20170606211443.GB12672@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <20170606104836.GB24297@lst.de>

On Tue, Jun 06, 2017 at 12:48:36PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 06, 2017 at 12:31:42AM -0500, Bjorn Helgaas wrote:
> > OK, sorry to be dense; it's taking me a long time to work out the
> > details here.  It feels like there should be a general principle to
> > help figure out where we need locking, and it would be really awesome
> > if we could include that in the changelog.  But it's not obvious to me
> > what that principle would be.
> 
> The principle is very simple: every method in struct device_driver
> or structures derived from it like struct pci_driver MUST provide
> exclusion vs ->remove.  Usually by using device_lock().
> 
> If we don't provide such an exclusion the method call can race with
> a removal in one form or another.

So I guess the method here is
dev->driver->err_handler->reset_notify(), and the PCI core should be
holding device_lock() while calling it?  That makes sense to me;
thanks a lot for articulating that!

1) The current patch protects the err_handler->reset_notify() uses by
adding or expanding device_lock regions in the paths that lead to
pci_reset_notify().  Could we simplify it by doing the locking
directly in pci_reset_notify()?  Then it would be easy to verify the
locking, and we would be less likely to add new callers without the
proper locking.
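
As a rough illustration of what "locking directly in the helper" could
look like, here is a userspace sketch, not kernel code.  A pthread mutex
stands in for device_lock(), and every name in it (model_dev,
model_reset_notify(), model_remove(), count_notify()) is hypothetical
and exists only for this example.  The point is that the lock, the
"is a driver still bound?" check, and the callback all sit in one
place, so no caller can reach the callback unlocked.

```c
/* Userspace model only: a pthread mutex stands in for device_lock(). */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev;

/* Hypothetical stand-ins for the err_handler / driver structures. */
struct model_err_handler {
	void (*reset_notify)(struct model_dev *dev, bool prepare);
};

struct model_driver {
	const struct model_err_handler *err_handler;
};

struct model_dev {
	pthread_mutex_t lock;              /* models device_lock() */
	const struct model_driver *driver; /* NULL once "removed" */
	int notify_calls;                  /* bookkeeping for the example */
};

static void model_dev_init(struct model_dev *dev,
			   const struct model_driver *drv)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->driver = drv;
	dev->notify_calls = 0;
}

/* Models ->remove: unbinds the driver under the same lock. */
static void model_remove(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->driver = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * The helper takes the lock itself, so the callback can never run
 * concurrently with model_remove().  Returns true if the callback
 * was invoked.
 */
static bool model_reset_notify(struct model_dev *dev, bool prepare)
{
	bool called = false;

	assert(dev != NULL);
	pthread_mutex_lock(&dev->lock);
	if (dev->driver && dev->driver->err_handler &&
	    dev->driver->err_handler->reset_notify) {
		dev->driver->err_handler->reset_notify(dev, prepare);
		called = true;
	}
	pthread_mutex_unlock(&dev->lock);
	return called;
}

/* Example callback: just counts how often it was called. */
static void count_notify(struct model_dev *dev, bool prepare)
{
	(void)prepare;
	dev->notify_calls++;
}
```

One trade-off to keep in mind with this shape: with a non-recursive
lock, a caller that already holds the lock for a longer region cannot
call such a helper without deadlocking, so those callers would need a
variant that expects the lock to be held.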

2) Stating the rule explicitly helps look for other problems, and I
think we have a similar problem in all the pcie_portdrv_err_handler
methods.  These are all called in the AER do_recovery() path, and the
functions there, e.g., report_error_detected(), do hold device_lock().
But pcie_portdrv_error_detected() propagates this to all the children,
and we *don't* hold the lock for the children.
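
To make the concern in point 2 concrete, here is a userspace sketch of
propagating an error callback to children while holding each child's
own lock around its callback.  It is only a model: a pthread mutex
stands in for device_lock(), and model_child, model_child_init(),
model_propagate_error() and mark_seen() are hypothetical names, not
the portdrv code.  Whether the real fix should take this shape is an
open question in this thread.

```c
/* Userspace model only: one mutex per child models device_lock(). */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_child {
	pthread_mutex_t lock;   /* models the child's device_lock() */
	/* NULL models "no driver bound" (for example, after ->remove). */
	int (*error_detected)(struct model_child *child);
	int seen;               /* bookkeeping for the example */
};

static void model_child_init(struct model_child *c,
			     int (*cb)(struct model_child *child))
{
	pthread_mutex_init(&c->lock, NULL);
	c->error_detected = cb;
	c->seen = 0;
}

/*
 * Calls each child's callback with that child's lock held, so the
 * callback cannot race with an unbind that takes the same lock.
 * Returns the number of children that were notified.
 */
static int model_propagate_error(struct model_child *children, size_t n)
{
	int notified = 0;

	assert(children != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_child *c = &children[i];

		pthread_mutex_lock(&c->lock);
		if (c->error_detected) {
			c->error_detected(c);
			notified++;
		}
		pthread_mutex_unlock(&c->lock);
	}
	return notified;
}

/* Example callback: records that the child saw the error. */
static int mark_seen(struct model_child *c)
{
	c->seen = 1;
	return 0;
}
```

Without the per-child lock in the loop, the parent's lock alone says
nothing about whether a child's driver is still bound when its
callback runs, which is the gap described above.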

Bjorn
