linux-pci.vger.kernel.org archive mirror
From: Lukas Wunner <lukas@wunner.de>
To: "Hoyer, David" <David.Hoyer@netapp.com>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Keith Busch <kbusch@kernel.org>
Subject: Re: Kernel hangs when powering up/down drive using sysfs
Date: Mon, 16 Mar 2020 19:19:59 +0100
Message-ID: <20200316181959.wpzi4hkoyzpghwpw@wunner.de>
In-Reply-To: <DM5PR06MB313235E97731D97AB813F65D92FB0@DM5PR06MB3132.namprd06.prod.outlook.com>

On Sat, Mar 14, 2020 at 02:19:44PM +0000, Hoyer, David wrote:
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -637,6 +637,8 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>         events = atomic_xchg(&ctrl->pending_events, 0);
>         if (!events) {
>                 pci_config_pm_runtime_put(pdev);
> +               ctrl->ist_running = false;
> +               wake_up(&ctrl->requester);
>                 return IRQ_NONE;
>         }

Thanks David for the report and sorry for the breakage.

The above LGTM, please submit it as a proper patch and
feel free to add my Reviewed-by.  Please add the same
two lines before the "return ret" a little further up
in the function.
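
To illustrate why both early exits need the two lines, here is a
small user-space model of the control flow.  It is only a sketch,
not kernel code: struct model_ctrl, model_ist() and
model_requester_blocked() are hypothetical names, the flag being
set on entry is an assumption, and the wait condition of the sysfs
requester is assumed to be "no pending events and IRQ thread not
running".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_ret { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Hypothetical stand-in for the controller fields used in the diff. */
struct model_ctrl {
	unsigned int pending_events;
	bool ist_running;
	unsigned int wakeups;	/* counts simulated wake_up() calls */
};

/* Models "ctrl->ist_running = false; wake_up(&ctrl->requester);". */
static void model_ist_done(struct model_ctrl *ctrl)
{
	ctrl->ist_running = false;
	ctrl->wakeups++;
}

/*
 * Models the three exits of the IRQ thread.  With patched == false the
 * two early exits skip model_ist_done(), as in the reported behaviour.
 * The return value of the first early exit is not modelled faithfully.
 */
static enum model_ret model_ist(struct model_ctrl *ctrl, bool rerun_bails,
				bool patched)
{
	unsigned int events;

	assert(ctrl != NULL);
	ctrl->ist_running = true;	/* assumption: set on entry */

	if (rerun_bails) {		/* models the "return ret" exit */
		if (patched)
			model_ist_done(ctrl);
		return MODEL_IRQ_NONE;
	}

	events = ctrl->pending_events;	/* models atomic_xchg(..., 0) */
	ctrl->pending_events = 0;
	if (!events) {			/* models the IRQ_NONE exit */
		if (patched)
			model_ist_done(ctrl);
		return MODEL_IRQ_NONE;
	}

	/* event handling elided */
	model_ist_done(ctrl);
	return MODEL_IRQ_HANDLED;
}

/* Assumed wait condition of a sysfs requester, inverted. */
static bool model_requester_blocked(const struct model_ctrl *ctrl)
{
	return ctrl->pending_events || ctrl->ist_running;
}
```

In this model an unpatched run with no pending events leaves the
requester blocked until a later run takes the IRQ_HANDLED exit.
With patched == true, neither early exit leaves it blocked.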

If it's too cumbersome for you to submit a proper patch
I can do it for you.


> We've instrumented the code and we do see that pciehp_ist() runs
> twice, once exiting with IRQ_HANDLED and then again with IRQ_NONE.
> We believe that is due to the timing differences.  Adding debug in
> here changes the timings enough that the hang goes away, so we are
> having troubles proving this 100% at the moment.  But just based on
> code inspection, if pciehp_ist() exits with the IRQ_NONE case, then
> nothing will ever set ist_running=false until a subsequent hotplug
> event happens that causes the IRQ_HANDLED case to run.  (We were
> able to prove that will cause things to "unhang" and progress at
> that point - if you're hung and you remove a drive, the slot status
> change will then unstick things.)

The question is why pciehp_ist() runs once more.  Most likely
another event is signaled from the slot.  Try adding a printk()
at the top of pciehp_ist() which emits ctrl->pending_events
to understand what's going on.
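
When reading the value such a printk() emits, it may help to decode
it into Slot Status bit names.  The following user-space helper is
only a sketch: decode_pending_events() is a hypothetical name, and
the bit values are the PCI_EXP_SLTSTA_* definitions from
include/uapi/linux/pci_regs.h.  Any other bits are shown as raw hex
rather than interpreted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Slot Status bits, as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTSTA_ABP	0x0001	/* Attention Button Pressed */
#define PCI_EXP_SLTSTA_PFD	0x0002	/* Power Fault Detected */
#define PCI_EXP_SLTSTA_MRLSC	0x0004	/* MRL Sensor Changed */
#define PCI_EXP_SLTSTA_PDC	0x0008	/* Presence Detect Changed */
#define PCI_EXP_SLTSTA_CC	0x0010	/* Command Completed */
#define PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */

static const struct {
	unsigned int bit;
	const char *name;
} sltsta_names[] = {
	{ PCI_EXP_SLTSTA_ABP,   "ABP" },
	{ PCI_EXP_SLTSTA_PFD,   "PFD" },
	{ PCI_EXP_SLTSTA_MRLSC, "MRLSC" },
	{ PCI_EXP_SLTSTA_PDC,   "PDC" },
	{ PCI_EXP_SLTSTA_CC,    "CC" },
	{ PCI_EXP_SLTSTA_DLLSC, "DLLSC" },
};

/* Append s to the NUL-terminated string in buf, separated by '|'. */
static void append_name(char *buf, size_t len, const char *s)
{
	size_t used = strlen(buf);

	snprintf(buf + used, len - used, "%s%s", used ? "|" : "", s);
}

/* Hypothetical helper: render the set bits of events into buf. */
static char *decode_pending_events(unsigned int events, char *buf, size_t len)
{
	unsigned int known = 0;
	char other[32];
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(sltsta_names) / sizeof(sltsta_names[0]); i++) {
		known |= sltsta_names[i].bit;
		if (events & sltsta_names[i].bit)
			append_name(buf, len, sltsta_names[i].name);
	}

	if (events & ~known) {
		/* bits outside Slot Status are not interpreted here */
		snprintf(other, sizeof(other), "other:%#x", events & ~known);
		append_name(buf, len, other);
	} else if (!events) {
		append_name(buf, len, "none");
	}
	return buf;
}
```

For example, a logged value of 0x108 decodes to "PDC|DLLSC", which
would point at a presence or link change arriving from the slot.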

Thanks,

Lukas

Thread overview: 13+ messages
2020-03-14 14:19 Kernel hangs when powering up/down drive using sysfs Hoyer, David
2020-03-16 16:15 ` Keith Busch
2020-03-16 18:10   ` Lukas Wunner
2020-03-16 18:42     ` Keith Busch
2020-03-18 11:53       ` Lukas Wunner
2020-03-16 18:19 ` Lukas Wunner [this message]
2020-03-16 18:25   ` Hoyer, David
2020-03-16 21:35     ` Hoyer, David
2020-03-18 11:49     ` Lukas Wunner
2020-03-18 14:06       ` Hoyer, David
2020-03-18 11:33 ` [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests Lukas Wunner
2020-03-18 16:43   ` Keith Busch
2020-03-28 20:25   ` Bjorn Helgaas
