* Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
@ 2019-12-16 16:35 Marcin Zajączkowski
       [not found] ` <c34a6fe1-80dd-a4db-c605-0a13c69e803f-5tc4TXWwyLM@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Marcin Zajączkowski @ 2019-12-16 16:35 UTC (permalink / raw)
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi,

I've encountered a severe regression on TU116 (probably also TU117),
introduced in 5.3-rc4 and still present in the recent 5.4.2 [1]. The
system usually hangs on the next graphics-mode operation (calling
xrandr after login is enough) with the following errors:

> kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
...
> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> kernel: ------------[ cut here ]------------
> kernel: nouveau 0000:01:00.0: timeout
> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]

(detailed log in a corresponding issue - [1])

With earlier kernels there was no hardware acceleration for NVidia GTX
1660 Ti, but at least I could use nouveau to disable it (to save
battery, trees and lower temperature) or even have an external output
(with Wayland). Now, the system is unusable with nouveau :(.

I spent some time trying to narrow the scope using the existing Fedora
kernel builds. I was able to determine that the problem was
introduced between 5.3.0-0.rc3.git1.1 (commit 33920f1ec5bf - works fine)
and 5.3.0-0.rc4.git0.1 (tag v5.3-rc4 - fails with errors).

That is just a few days (7-11 Aug) and "only" around 250 commits. I went
through them but, judging by the commit titles, I haven't seen any
nouveau-related changes or any obviously suspicious drm-related ones.

> git log 33920f1ec5bf..v5.3-rc4 --stat


Maybe some of the more nouveau/drm-experienced developers could take a
look at that range and point out which commit might have broken it (to
make it easier to find out what needs fixing)?
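
One way to shrink the list before reading it is to restrict the log to the subsystems a GPU driver depends on. A minimal sketch, assuming a mainline kernel checkout at the hypothetical location held in KERNEL_TREE; the function name and the default path list are illustrative guesses, not something given in this thread:

```shell
# Assumption: KERNEL_TREE is a clone of the mainline kernel (hypothetical default path).
KERNEL_TREE=${KERNEL_TREE:-$HOME/linux}

# Print one line per commit in the range good..bad that touches the given paths.
# The default paths are guesses at subsystems nouveau depends on.
list_candidates() {
    good=$1
    bad=$2
    shift 2
    if [ "$#" -eq 0 ]; then
        set -- drivers/gpu/drm drivers/pci drivers/acpi
    fi
    git -C "$KERNEL_TREE" log --oneline "$good..$bad" -- "$@"
}

# Only runs when the checkout actually exists.
if [ -d "$KERNEL_TREE/.git" ]; then
    list_candidates 33920f1ec5bf v5.3-rc4
fi
```

A path filter only narrows the reading list; a commit outside these directories can still be the cause.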


[1] -
https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/516

Thanks in advance
Marcin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
       [not found] ` <c34a6fe1-80dd-a4db-c605-0a13c69e803f-5tc4TXWwyLM@public.gmane.org>
@ 2019-12-16 17:08   ` Ilia Mirkin
       [not found]     ` <CAKb7UviSYORoeDm1sbDFEzkGd68+DV=StCpzsiaGbA=1VQX3gw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Ilia Mirkin @ 2019-12-16 17:08 UTC (permalink / raw)
  To: Marcin Zajączkowski; +Cc: nouveau

Hi Marcin,

You should do a git bisect rather than guessing about commits. I
suspect that searching for "kernel git bisect fedora" should prove
instructive if you're not sure how to do this.
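
For readers unfamiliar with the workflow, here is a sketch of git bisect on a throwaway repository, since a kernel build cannot be shown inline. The repository, its ten commits and the BROKEN marker are all made up for illustration; for the kernel the endpoints from this thread would be 33920f1ec5bf (good) and v5.3-rc4 (bad), and the check would be a manual build, boot and xrandr run.

```shell
# Toy repository standing in for the kernel tree; everything in it is made up.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.email "demo@example.invalid"
git -C "$repo" config user.name "Demo"

# Ten commits; commit 7 introduces the "regression" (a file containing BROKEN).
i=1
while [ "$i" -le 10 ]; do
    echo "step $i" >> "$repo/history.txt"
    if [ "$i" -eq 7 ]; then
        echo BROKEN > "$repo/state"
    fi
    git -C "$repo" add -A
    git -C "$repo" commit -q -m "commit $i"
    i=$((i + 1))
done

# Mark the endpoints: HEAD is bad, the root commit is good.
# Kernel equivalent: git bisect start v5.3-rc4 33920f1ec5bf
good=$(git -C "$repo" rev-list --max-parents=0 HEAD)
git -C "$repo" bisect start HEAD "$good" >/dev/null

# The check exits 0 for a good commit and 1 for a bad one.
git -C "$repo" bisect run sh -c '! grep -q BROKEN state 2>/dev/null' >/dev/null

# refs/bisect/bad points at the first bad commit once the run finishes.
first_bad=$(git -C "$repo" log -1 --format=%s refs/bisect/bad)
git -C "$repo" bisect reset >/dev/null 2>&1
echo "first bad: $first_bad"
```

When each step needs a reboot, the automated "bisect run" is replaced by answering "git bisect good" or "git bisect bad" by hand after every test.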

Cheers,

  -ilia

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
       [not found]     ` <CAKb7UviSYORoeDm1sbDFEzkGd68+DV=StCpzsiaGbA=1VQX3gw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-12-16 17:42       ` Marcin Zajączkowski
       [not found]         ` <233aafa2-1474-39bf-8ea0-fe1a3ecef167-5tc4TXWwyLM@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Marcin Zajączkowski @ 2019-12-16 17:42 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau

On 2019-12-16 18:08, Ilia Mirkin wrote:
> Hi Marcin,
> 
> You should do a git bisect rather than guessing about commits. I
> suspect that searching for "kernel git bisect fedora" should prove
> instructive if you're not sure how to do this.

Thanks for your suggestion. I realize that I can do it at the Git level
and that it is the ultimate way to go. However, building a kernel from
source takes some time, on top of the usual time needed to
install/restart/verify each build (which I have already spent narrowing
the range down to "just" ~250 commits).

Therefore, I would be really thankful for a suggestion about which
commits would be good to check first - testing 2-4 builds is better than
8-10 (assuming someone guesses right :) ).
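
The 8-10 figure is consistent with what bisection predicts: each tested build roughly halves the candidate range, so about ceil(log2(n)) builds are needed for n commits. A small sketch of that estimate follows; the function name is illustrative, and git's own step count can differ slightly on a history with merges.

```shell
# Estimate the number of build/boot cycles a bisect needs for n candidate
# commits by halving (rounding up) until one commit is left: ceil(log2(n)).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 250   # prints 8
```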

Marcin




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
       [not found]         ` <233aafa2-1474-39bf-8ea0-fe1a3ecef167-5tc4TXWwyLM@public.gmane.org>
@ 2019-12-16 18:45           ` Ilia Mirkin
       [not found]             ` <CAKb7UvgOVrwC91ys19uTAG2p_MRVqcsV_MAHOSL4-m3f+j=dNg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Ilia Mirkin @ 2019-12-16 18:45 UTC (permalink / raw)
  To: Marcin Zajączkowski; +Cc: nouveau

The obvious candidate based on a quick scan is
0acf5676dc0ffe0683543a20d5ecbd112af5b8ee -- it merges a fix that
messes with PCI stuff, and there lie dragons. You could try building
that commit, and if things still work, then I have no idea (and you've
narrowed the range). Also I'd recommend ensuring that the good kernel
is really good and the bad kernel is really bad -- boot them a few
times.
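
Since the candidate is described as a merge, one way to see which individual changes it brought in is to list the commits reachable from the merge but not from its first parent. A sketch, assuming a kernel checkout at the hypothetical KERNEL_TREE path; the function name is illustrative:

```shell
# Assumption: KERNEL_TREE is a clone of the mainline kernel (hypothetical default path).
KERNEL_TREE=${KERNEL_TREE:-$HOME/linux}

# List the non-merge commits that a merge commit pulled in, i.e. everything
# reachable from the merge but not from its first parent.
merged_commits() {
    git -C "$KERNEL_TREE" log --oneline --no-merges "$1^1..$1"
}

# Only runs when the checkout actually exists.
if [ -d "$KERNEL_TREE/.git" ]; then
    merged_commits 0acf5676dc0ffe0683543a20d5ecbd112af5b8ee
fi
```

If the merge is bad and its first parent is good, the culprit is among the commits this prints.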

Cheers,

  -ilia


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
       [not found]             ` <CAKb7UvgOVrwC91ys19uTAG2p_MRVqcsV_MAHOSL4-m3f+j=dNg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-12-19 20:27               ` Marcin Zajączkowski
  2019-12-19 20:38                   ` Ilia Mirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Marcin Zajączkowski @ 2019-12-19 20:27 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau

On 2019-12-16 19:45, Ilia Mirkin wrote:
> The obvious candidate based on a quick scan is
> 0acf5676dc0ffe0683543a20d5ecbd112af5b8ee -- it merges a fix that
> messes with PCI stuff, and there lie dragons. You could try building
> that commit, and if things still work, then I have no idea (and you've

Nice shot Ilia!

I managed to build a kernel from the suspected bd112af5b8ee and it
fails miserably (as previously described). The build from the previous
commit, 86a04561920b, works fine.

> narrowed the range). Also I'd recommend ensuring that the good kernel
> is really good and the bad kernel is really bad -- boot them a few
> times.

Well, this problem is 100% reproducible on newer kernels. I see the
errors in the boot logs, and after logging in to GNOME Shell the first
execution of xrandr (or opening the lid) hangs the system (the graphics
card). On the other hand, I haven't seen that problem on any earlier
kernel. Therefore, the situation is rather clear in my case.
Nevertheless, I will stay on that self-built good kernel (5.3.0-0.rc3 +
git) to check it further.


How do you see it, Ilia? Does something in nouveau need to be adjusted
to those changes, or do the changes themselves break nouveau, so that
it would be best to fix or revert them (and to let the committer know
about the problem)?

Marcin




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Nouveau] Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
@ 2019-12-19 20:38                   ` Ilia Mirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Ilia Mirkin @ 2019-12-19 20:38 UTC (permalink / raw)
  To: Marcin Zajączkowski, Mika Westerberg, Rafael J. Wysocki
  Cc: nouveau, Linux PCI

Let's add Mika and Rafael, as they were responsible for that commit.
Mika/Rafael - any ideas? The commit in question is

0617bdede5114a0002298b12cd0ca2b0cfd0395d

Marcin -- would be nice if you could confirm that taking a recent
kernel + "git revert 0617bdede5114a0002298b12cd0ca2b0cfd0395d" works
well for you.
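
A sketch of how such a revert test could be tried without disturbing the working branch: do the revert on a scratch branch and back out if git reports conflicts. The function name, the branch name and the checkout path in the usage comment are hypothetical.

```shell
# Try to revert one commit on a scratch branch; back out cleanly on conflict.
# $1 = repository path, $2 = commit to revert, $3 = scratch branch name.
try_revert() {
    git -C "$1" checkout -q -b "$3" || return 2
    if git -C "$1" revert --no-edit "$2" >/dev/null 2>&1; then
        echo "clean"
    else
        # Restore the pre-revert state; the conflict then has to be resolved
        # by hand or replaced by a purpose-written patch.
        git -C "$1" revert --abort >/dev/null 2>&1
        echo "conflict"
    fi
}

# Hypothetical usage against a kernel checkout:
#   try_revert "$HOME/linux" 0617bdede5114a0002298b12cd0ca2b0cfd0395d revert-test
```

A "conflict" result means later commits touched the same lines, so the revert cannot be applied mechanically.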

On Thu, Dec 19, 2019 at 3:27 PM Marcin Zajączkowski <mszpak@wp.pl> wrote:
>
> On 2019-12-16 19:45, Ilia Mirkin wrote:
> > The obvious candidate based on a quick scan is
> > 0acf5676dc0ffe0683543a20d5ecbd112af5b8ee -- it merges a fix that
> > messes with PCI stuff, and there lie dragons. You could try building
> > that commit, and if things still work, then I have no idea (and you've
>
> Nice shot Ilia!
>
> I managed to build kernel from suspected bd112af5b8ee and it fails

Took me a while, but this is the end of the hash. Normally you list
the start of the hash (and that's what all the git tools accept). In
this case this is commit

0acf5676dc0ffe0683543a20d5ecbd112af5b8ee

> miserably (as previously described). The build from the previous commit
> 86a04561920b works fine.

e577dc152e232c78e5774e4c9b5486a04561920b
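
To illustrate the point about abbreviations: a short commit ID is the leading part of the full hash, never the trailing part. A minimal sketch using the two hashes above; the helper name and the 12-character length are choices made for this example. Inside a repository, "git rev-parse --short=12 <commit>" produces an abbreviation of at least that length.

```shell
# Abbreviate a full commit ID the way git tools expect: keep the leading characters.
abbrev() {
    printf '%.12s\n' "$1"
}

abbrev 0acf5676dc0ffe0683543a20d5ecbd112af5b8ee   # prints 0acf5676dc0f
abbrev e577dc152e232c78e5774e4c9b5486a04561920b   # prints e577dc152e23
```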


^ permalink raw reply	[flat|nested] 13+ messages in thread


* Re: [Nouveau] Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
@ 2019-12-19 21:58                     ` Marcin Zajączkowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Zajączkowski @ 2019-12-19 21:58 UTC (permalink / raw)
  To: Ilia Mirkin, Mika Westerberg, Rafael J. Wysocki; +Cc: nouveau, Linux PCI

On 2019-12-19 21:38, Ilia Mirkin wrote:
> Let's add Mika and Rafael, as they were responsible for that commit.
> Mika/Rafael - any ideas? The commit in question is
> 
> 0617bdede5114a0002298b12cd0ca2b0cfd0395d
> 
> Marcin -- would be nice if you could confirm that taking a recent
> kernel + "git revert 0617bdede5114a0002298b12cd0ca2b0cfd0395d" works
> well for you.

I gave it a try. However, there have been subsequent changes to the
surrounding code and I'm not sure how to resolve the conflicts (as of
today's master). Nevertheless, I should be able to test a provided patch
to verify whether the assumptions are right.

Marcin


> 
> On Thu, Dec 19, 2019 at 3:27 PM Marcin Zajączkowski <mszpak@wp.pl> wrote:
>>
>> On 2019-12-16 19:45, Ilia Mirkin wrote:
>>> The obvious candidate based on a quick scan is
>>> 0acf5676dc0ffe0683543a20d5ecbd112af5b8ee -- it merges a fix that
>>> messes with PCI stuff, and there lie dragons. You could try building
>>> that commit, and if things still work, then I have no idea (and you've
>>
>> Nice shot Ilia!
>>
>> I managed to build kernel from suspected bd112af5b8ee and it fails
> 
> Took me a while, but this is the end of the hash. Normally you list
> the start of the hash (and that's what all the git tools accept). In
> this case this is commit

What a bummer, I knew that...

> 
> 0acf5676dc0ffe0683543a20d5ecbd112af5b8ee
> 
>> miserably (as previously described). The build from the previous commit
>> 86a04561920b works fine.
> 
> e577dc152e232c78e5774e4c9b5486a04561920b
>>>>>> ...
>>>>>>> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
>>>>>>> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
>>>>>>> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
>>>>>>> kernel: ------------[ cut here ]------------
>>>>>>> kernel: nouveau 0000:01:00.0: timeout
>>>>>>> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/> 0xe0 [nouveau]
>>>>>>
>>>>>> (detailed log in a corresponding issue - [1])
>>>>>>
>>>>>> With earlier kernels there was no hardware acceleration for NVidia GTX
>>>>>> 1660 Ti, but at least I could use nouveau to disable it (to save
>>>>>> battery, trees and lower temperature) or even have an external output
>>>>>> (with Wayland). Now, the system is unusable with nouveau :(.
>>>>>>
>>>>>> I spent some time trying to narrow the scope using on the existing
>>>>>> kernel builds for Fedora. I was able to determine that the problem was
>>>>>> introduced between 5.3.0-0.rc3.git1.1 (commit 33920f1ec5bf - works fine)
>>>>>> and 5.3.0-0.rc4.git0.1 (tag v5.3-rc4 - fails with errors).
>>>>>>
>>>>>> It's just a few days (7-11 Aug) and "only" around 250 commits. I went
>>>>>> through them, but (based on the commits name) I haven't seen any nouveau
>>>>>> related changes and in general no very suspected drm related changes.
>>>>>>
>>>>>>> git log 33920f1ec5bf..v5.3-rc4 --stat
>>>>>>
>>>>>>
>>>>>> Maybe some of more nouveau/drm-experienced developers could take a look
>>>>>> at that to determine which commit could break it (to make it easier to
>>>>>> find out what should be fixed to prevent that regression)?
>>>>>>
>>>>>>
>>>>>> [1] -
>>>>>> https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/516
>>>>>>
>>>>>> Thanks in advance
>>>>>> Marcin
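
The quoted exchange weighs a full bisect against testing a few guessed
candidates in a range of about 250 commits. As a rough estimate, and
assuming an idealized binary search over a linear list of candidates
(real kernel history contains merges, so git bisect may need a step more
or less), the worst-case number of build-and-boot rounds can be sketched
as follows. The function name bisect_steps is hypothetical:

```python
def bisect_steps(n_commits):
    """Worst-case test rounds for a binary search over n_commits candidates.

    Equals ceil(log2(n_commits)) for n_commits >= 1, computed with integer
    arithmetic to avoid floating-point rounding.
    """
    if n_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (n_commits - 1).bit_length()


if __name__ == "__main__":
    # About 250 commits lie between 33920f1ec5bf and v5.3-rc4.
    print(bisect_steps(250))
```

For 250 candidates this gives 8 rounds, which is consistent with the
"8-10" builds estimated in the quoted message.
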

* Re: [Nouveau] Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
@ 2019-12-20  6:05                     ` Mika Westerberg
  0 siblings, 0 replies; 13+ messages in thread
From: Mika Westerberg @ 2019-12-20  6:05 UTC (permalink / raw)
  To: Ilia Mirkin
  Cc: Marcin Zajączkowski, Rafael J. Wysocki, nouveau, Linux PCI

On Thu, Dec 19, 2019 at 03:38:10PM -0500, Ilia Mirkin wrote:
> Let's add Mika and Rafael, as they were responsible for that commit.
> Mika/Rafael - any ideas? The commit in question is
> 
> 0617bdede5114a0002298b12cd0ca2b0cfd0395d

This seems to be

  Revert "PCI: Add missing link delays required by the PCIe spec"

Can you try v5.5-rcX without any additional changes? It should include
the same fix done a bit differently (to avoid breaking the systems that
caused us to revert the previous one):

  4827d63891b6 PCI/PM: Add pcie_wait_for_link_delay()
  ad9001f2f411 PCI/PM: Add missing link delays required by the PCIe spec

* Re: [Nouveau] Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
@ 2019-12-20 12:34                       ` Marcin Zajączkowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Zajączkowski @ 2019-12-20 12:34 UTC (permalink / raw)
  To: Mika Westerberg, Ilia Mirkin; +Cc: Rafael J. Wysocki, nouveau, Linux PCI

On 2019-12-20 07:05, Mika Westerberg wrote:
> On Thu, Dec 19, 2019 at 03:38:10PM -0500, Ilia Mirkin wrote:
>> Let's add Mika and Rafael, as they were responsible for that commit.
>> Mika/Rafael - any ideas? The commit in question is
>>
>> 0617bdede5114a0002298b12cd0ca2b0cfd0395d
> 
> This seems to be
> 
>   Revert "PCI: Add missing link delays required by the PCIe spec"
> 
> Can you try v5.5-rcX without any additional changes? It should include
> the same fix done a bit differently (to avoid breaking the systems that
> caused us to revert the previous one):
> 
>   4827d63891b6 PCI/PM: Add pcie_wait_for_link_delay()
>   ad9001f2f411 PCI/PM: Add missing link delays required by the PCIe spec

Thanks Mika, it looks very promising.
kernel-core-5.5.0-0.rc2.git0.1.fc32.x86_64 boots up without the
aforementioned errors and I can operate normally. I will play more with
5.5 before closing the issue, but at the moment it seems to be fixed.

Before I started digging into which commits introduced the regression, I
tested my system with the (then) latest stable kernel-5.4.2-300, but I
see your changes are only in the 5.5 line :).

Big thanks, Ilia, for your help in pinpointing the problematic commit.

Marcin

Thread overview: 13+ messages
2019-12-16 16:35 Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed Marcin Zajączkowski
     [not found] ` <c34a6fe1-80dd-a4db-c605-0a13c69e803f-5tc4TXWwyLM@public.gmane.org>
2019-12-16 17:08   ` Ilia Mirkin
     [not found]     ` <CAKb7UviSYORoeDm1sbDFEzkGd68+DV=StCpzsiaGbA=1VQX3gw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-12-16 17:42       ` Marcin Zajączkowski
     [not found]         ` <233aafa2-1474-39bf-8ea0-fe1a3ecef167-5tc4TXWwyLM@public.gmane.org>
2019-12-16 18:45           ` Ilia Mirkin
     [not found]             ` <CAKb7UvgOVrwC91ys19uTAG2p_MRVqcsV_MAHOSL4-m3f+j=dNg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-12-19 20:27               ` Marcin Zajączkowski
2019-12-19 20:38                 ` [Nouveau] " Ilia Mirkin
2019-12-19 20:38                   ` Ilia Mirkin
2019-12-19 21:58                   ` [Nouveau] " Marcin Zajączkowski
2019-12-19 21:58                     ` Marcin Zajączkowski
2019-12-20  6:05                   ` [Nouveau] " Mika Westerberg
2019-12-20  6:05                     ` Mika Westerberg
2019-12-20 12:34                     ` [Nouveau] " Marcin Zajączkowski
2019-12-20 12:34                       ` Marcin Zajączkowski
