linux-pci.vger.kernel.org archive mirror
From: Lukas Wunner <lukas@wunner.de>
To: gokul cg <gokuljnpr@gmail.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Ashok Raj <ashok.raj@intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Yinghai Lu <yinghai@kernel.org>, Sinan Kaya <okaya@kernel.org>,
	linux-pci@vger.kernel.org,
	Alexandru Gagniuc <mr.nuke.me@gmail.com>
Subject: Re: [PATCH] PCI: pciehp: Differentiate between surprise and safe removal
Date: Thu, 2 Aug 2018 10:46:57 +0200	[thread overview]
Message-ID: <20180802084657.GA21267@wunner.de> (raw)
In-Reply-To: <CAFP4jM8AYG7hmkC_rYgXAfLoJmkJuW0e1UbgiayGrCPbb_yw8A@mail.gmail.com>

On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote:
> I suspect a possible race condition in the kernel between the PCI driver
> and AER handling.
> 
> The reason is that the same kernel panic happens in the worker thread
> that handles the bottom half of the AER IRQ.
> 
> I am seeing this issue when I suddenly power off a PCI card which
> supports and has enabled PCIe AER error reporting.
> 
> While the PCI device is powering off, the AER driver gets an AER IRQ for
> the device; the AER IRQ handler caches the AER error code and schedules
> a worker thread to handle the error.
> 
> The PCIe device gets removed from the PCI tree before the worker thread
> completes its task, and the kernel panic happens when the worker thread
> tries to access the PCI device's config space.
> 
> #5 [ffff88027469fc70] general_protection at ffffffff8176cdf2
>     [exception RIP: pci_bus_read_config_dword+100]
> #6 [ffff88027469fd50] pci_find_next_ext_capability at ffffffff81345d7b
> #7 [ffff88027469fd90] pci_find_ext_capability at ffffffff81347225
> #8 [ffff88027469fda0] get_device_error_info at ffffffff81356c4d
> #9 [ffff88027469fdd0] aer_isr at ffffffff81357a38
> #10 [ffff88027469fe28] process_one_work at ffffffff8105d4c0
> #11 [ffff88027469fe70] worker_thread at ffffffff8105e251
> #12 [ffff88027469fed0] kthread at ffffffff81064260
> #13 [ffff88027469ff50] ret_from_fork at ffffffff81773a38
> 
> I have tested it on kernel 3.10, but from the source I can see that this
> case is still relevant for the latest Linux source.

I'm not really familiar with the AER driver, but the problem is
actually easy to spot:

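find_source_device() walks the hierarchy and saves pointers to the
pci_devs in an array without taking a reference on them.  That array is
later traversed and the pci_devs are accessed, so if a device is
hot-removed in between, the worker dereferences a stale pointer.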
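
To make the failure mode concrete, here is a minimal userspace model in C
of the collect-then-process pattern described above.  It is not the AER
driver's code: struct model_dev, struct model_err_info,
model_add_error_device(), model_remove(), model_process() and the array
size are hypothetical stand-ins.  Instead of actually freeing memory
(which would make the demo itself a use-after-free), removal just clears
an "alive" flag so that stale accesses can be counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev; "alive" models whether the
 * memory would still be valid in the kernel. */
struct model_dev {
	int id;
	bool alive;
};

/* Assumption: a small fixed-size array, as in the error info struct. */
#define MODEL_MAX_ERR_DEVS 5

struct model_err_info {
	struct model_dev *dev[MODEL_MAX_ERR_DEVS];
	int error_dev_num;
};

/* Phase 1: record an error source as a bare pointer, no reference. */
static bool model_add_error_device(struct model_err_info *e,
				   struct model_dev *d)
{
	if (e->error_dev_num >= MODEL_MAX_ERR_DEVS)
		return false;
	e->dev[e->error_dev_num++] = d;
	return true;
}

/* Hot-removal path: in the kernel the device could now be freed. */
static void model_remove(struct model_dev *d)
{
	d->alive = false;
}

/* Phase 2: traverse the saved pointers and count accesses to removed
 * devices; each one would be a use-after-free in the kernel. */
static int model_process(const struct model_err_info *e)
{
	int stale = 0;

	for (int i = 0; i < e->error_dev_num; i++)
		if (!e->dev[i]->alive)
			stale++;
	return stale;
}
```

Saving two devices, removing one and then processing yields one stale
access.  In the kernel that access would be a config space read through
freed memory, which would fit the general protection fault in
pci_bus_read_config_dword() in the trace above.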

The solution is to acquire a ref on each device in add_error_device():

-	e_info->dev[e_info->error_dev_num] = dev;
+	e_info->dev[e_info->error_dev_num] = pci_dev_get(dev);

Then release the ref in aer_process_err_devices() by calling pci_dev_put().
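
A sketch of the suggested get/put pairing, again as a self-contained
userspace model rather than kernel code.  struct rc_dev, rc_dev_get(),
rc_dev_put(), rc_add_error_device() and rc_process_err_devices() are
hypothetical; they only mimic the contract of pci_dev_get() and
pci_dev_put() (take a ref, drop a ref, free on the last put, tolerate
NULL).  The counter is a plain int for brevity, whereas the kernel's
device refcounting is atomic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted stand-in for struct pci_dev. */
struct rc_dev {
	int refcount;
	int id;
};

static int rc_freed; /* counts frees so the lifetime can be checked */

static struct rc_dev *rc_dev_alloc(int id)
{
	struct rc_dev *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1; /* initial ref, dropped by the removal path */
	d->id = id;
	return d;
}

static struct rc_dev *rc_dev_get(struct rc_dev *d)
{
	if (d)
		d->refcount++;
	return d;
}

static void rc_dev_put(struct rc_dev *d)
{
	if (d && --d->refcount == 0) {
		free(d);
		rc_freed++;
	}
}

#define RC_MAX_ERR_DEVS 5 /* assumption: small fixed-size array */

struct rc_err_info {
	struct rc_dev *dev[RC_MAX_ERR_DEVS];
	int error_dev_num;
};

/* Collect phase: store the pointer together with a reference. */
static int rc_add_error_device(struct rc_err_info *e, struct rc_dev *d)
{
	if (e->error_dev_num >= RC_MAX_ERR_DEVS)
		return -1;
	e->dev[e->error_dev_num++] = rc_dev_get(d);
	return 0;
}

/* Process phase: use each device, then drop the ref taken above. */
static int rc_process_err_devices(struct rc_err_info *e)
{
	int sum = 0;

	for (int i = 0; i < e->error_dev_num; i++) {
		sum += e->dev[i]->id; /* still valid: we hold a ref */
		rc_dev_put(e->dev[i]);
		e->dev[i] = NULL;
	}
	e->error_dev_num = 0;
	return sum;
}
```

With the ref held, the removal path dropping its own ref no longer frees
the device while the error info still points at it; the memory is only
released when rc_process_err_devices() puts the last ref.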

I believe there's an ongoing refactoring of the AER driver and the
issue may be addressed in the course of it, but as a quick fix for
an ancient v3.10 kernel, the above should do the trick.

HTH,

Lukas


Thread overview: 17+ messages
2018-07-31  5:50 [PATCH] PCI: pciehp: Differentiate between surprise and safe removal Lukas Wunner
2018-08-01 16:43 ` Mika Westerberg
2018-08-01 17:15   ` Lukas Wunner
2018-08-01 19:09     ` Alex G.
2018-08-02  7:20     ` Mika Westerberg
2018-08-02  7:29       ` gokul cg
2018-08-02  8:46         ` Lukas Wunner [this message]
2018-08-02 12:28           ` gokul cg
2018-08-02 15:07           ` Lukas Wunner
2018-08-02 17:09             ` Thomas Tai
2018-08-06 18:33               ` gokul cg
2018-08-07 14:26                 ` Thomas Tai
2018-08-07 15:30                 ` Thomas Tai
2018-08-08  9:59                   ` gokul cg
2018-08-08 11:21                   ` gokul cg
2018-08-08 20:49                     ` Thomas Tai
2018-09-04 17:53 ` Bjorn Helgaas
