All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Karol Herbst <kherbst@redhat.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Lyude Paul <lyude@redhat.com>,
	Linux PCI <linux-pci@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	nouveau <nouveau@lists.freedesktop.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM <linux-pm@vger.kernel.org>
Subject: Re: [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges
Date: Mon, 30 Sep 2019 11:05:34 +0300	[thread overview]
Message-ID: <20190930080534.GS2714@lahna.fi.intel.com> (raw)
In-Reply-To: <CACO55tuaY1jFXpJPeC9M4PoWEDyy547_tE8MpLaTDb+C+ffsbg@mail.gmail.com>

Hi Karol,

On Fri, Sep 27, 2019 at 11:53:48PM +0200, Karol Herbst wrote:
> > What exactly is the serious issue?  I guess it's that the rescan
> > doesn't detect the GPU, which means it's not responding to config
> > accesses?  Is there any timing component here, e.g., maybe we're
> > missing some delay like the ones Mika is adding to the reset paths?
> 
> When I was checking up on some of the PCI registers of the bridge
> controller, the slot detection told me that there is no device
> recognized anymore. I don't know which register it was anymore, though
> I guess one could read it up in the SoC spec document by Intel.
> 
> My guess is, that the bridge controller fails to detect the GPU being
> here or actively threw it of the bus or something. But a normal system
> suspend/resume cycle brings the GPU back online (doing a rescan via
> sysfs gets the device detected again)

Can you elaborate a bit what kind of scenario the issue happens (e.g
steps how it reproduces)? It was not 100% clear from the changelog. Also
what the result when the failure happens?

I see there is a script that does something but unfortunately I'm not
fluent in Python so can't extract the steps how the issue can be
reproduced ;-)

One thing that I'm working on is that Linux PCI subsystem misses certain
delays that are needed after D3cold -> D0 transition, otherwise the
device and/or link may not be ready before we access it. What you are
experiencing sounds similar. I wonder if you could try the following
patch and see if it makes any difference?

https://patchwork.kernel.org/patch/11106611/

WARNING: multiple messages have this Message-ID (diff)
From: Mika Westerberg <mika.westerberg-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: Karol Herbst <kherbst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Linux PM <linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	nouveau
	<nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"Rafael J. Wysocki" <rjw-LthD3rsA81gm4RdzfppkhA@public.gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	dri-devel
	<dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	Bjorn Helgaas <helgaas-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Linux PCI <linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges
Date: Mon, 30 Sep 2019 11:05:34 +0300	[thread overview]
Message-ID: <20190930080534.GS2714@lahna.fi.intel.com> (raw)
In-Reply-To: <CACO55tuaY1jFXpJPeC9M4PoWEDyy547_tE8MpLaTDb+C+ffsbg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi Karol,

On Fri, Sep 27, 2019 at 11:53:48PM +0200, Karol Herbst wrote:
> > What exactly is the serious issue?  I guess it's that the rescan
> > doesn't detect the GPU, which means it's not responding to config
> > accesses?  Is there any timing component here, e.g., maybe we're
> > missing some delay like the ones Mika is adding to the reset paths?
> 
> When I was checking up on some of the PCI registers of the bridge
> controller, the slot detection told me that there is no device
> recognized anymore. I don't know which register it was anymore, though
> I guess one could read it up in the SoC spec document by Intel.
> 
> My guess is, that the bridge controller fails to detect the GPU being
> here or actively threw it of the bus or something. But a normal system
> suspend/resume cycle brings the GPU back online (doing a rescan via
> sysfs gets the device detected again)

Can you elaborate a bit what kind of scenario the issue happens (e.g
steps how it reproduces)? It was not 100% clear from the changelog. Also
what the result when the failure happens?

I see there is a script that does something but unfortunately I'm not
fluent in Python so can't extract the steps how the issue can be
reproduced ;-)

One thing that I'm working on is that Linux PCI subsystem misses certain
delays that are needed after D3cold -> D0 transition, otherwise the
device and/or link may not be ready before we access it. What you are
experiencing sounds similar. I wonder if you could try the following
patch and see if it makes any difference?

https://patchwork.kernel.org/patch/11106611/
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

  reply	other threads:[~2019-09-30  8:05 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-27 14:44 [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges Karol Herbst
2019-09-27 21:42 ` Bjorn Helgaas
2019-09-27 21:42   ` Bjorn Helgaas
2019-09-27 21:53   ` Karol Herbst
2019-09-30  8:05     ` Mika Westerberg [this message]
2019-09-30  8:05       ` Mika Westerberg
2019-09-30  9:15       ` Karol Herbst
2019-09-30  9:15         ` Karol Herbst
2019-09-30  9:29         ` Mika Westerberg
2019-09-30 16:05           ` Karol Herbst
2019-09-30 16:30             ` Mika Westerberg
2019-09-30 16:30               ` Mika Westerberg
2019-09-30 16:36               ` Karol Herbst
2019-10-01  8:46                 ` Mika Westerberg
2019-10-01  8:56                   ` Karol Herbst
2019-10-01  9:11                     ` Mika Westerberg
2019-10-01 10:00                       ` Karol Herbst
2019-10-03  9:47                         ` Rafael J. Wysocki
2019-10-01 13:27                 ` Bjorn Helgaas
2019-10-01 16:21                   ` Karol Herbst
2019-10-01 19:34                     ` Bjorn Helgaas
2019-10-01 19:34                       ` Bjorn Helgaas
2019-10-02  7:51                       ` Rafael J. Wysocki
2019-10-02  7:51                         ` Rafael J. Wysocki
2019-09-30 17:07 Karol Herbst
2019-09-30 17:07 ` Karol Herbst

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190930080534.GS2714@lahna.fi.intel.com \
    --to=mika.westerberg@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=helgaas@kernel.org \
    --cc=kherbst@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lyude@redhat.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.