From: Luben Tuikov <ltuikov89@gmail.com>
To: Alex Deucher <alexdeucher@gmail.com>
Cc: "Phillip Susi" <phill@thesusis.net>,
	"Linux regressions mailing list" <regressions@lists.linux.dev>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	linux-kernel@vger.kernel.org,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	dri-devel@lists.freedesktop.org,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Danilo Krummrich" <dakr@redhat.com>
Subject: Re: Radeon regression in 6.6 kernel
Date: Wed, 29 Nov 2023 11:41:40 -0500	[thread overview]
Message-ID: <9595b8bf-e64d-4926-9263-97e18bcd7d05@gmail.com> (raw)
In-Reply-To: <CADnq5_OC=JFpGcN0oGbTF5xYEt4X3r0=jEY6hJ12W8CzYq1+cA@mail.gmail.com>


On 2023-11-29 10:22, Alex Deucher wrote:
> On Wed, Nov 29, 2023 at 8:50 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>>
>> On Tue, Nov 28, 2023 at 11:45 PM Luben Tuikov <ltuikov89@gmail.com> wrote:
>>>
>>> On 2023-11-28 17:13, Alex Deucher wrote:
>>>> On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi <phill@thesusis.net> wrote:
>>>>>
>>>>> Alex Deucher <alexdeucher@gmail.com> writes:
>>>>>
>>>>>>> In that case those are the already known problems with the scheduler
>>>>>>> changes, aren't they?
>>>>>>
>>>>>> Yes.  Those changes went into 6.7 though, not 6.6 AFAIK.  Maybe I'm
>>>>>> misunderstanding what the original report was actually testing.  If it
>>>>>> was 6.7, then try reverting:
>>>>>> 56e449603f0ac580700621a356d35d5716a62ce5
>>>>>> b70438004a14f4d0f9890b3297cd66248728546c
>>>>>
>>>>> At some point it was suggested that I file a gitlab issue, but I took
>>>>> this to mean it was already known and being worked on.  -rc3 came out
>>>>> today and still has the problem.  Is there a known issue I could track?
>>>>>
>>>>
>>>> At this point, unless there are any objections, I think we should just
>>>> revert the two patches
>>> Uhm, no.
>>>
>>> Why "the two" patches?
>>>
>>> This email, part of this thread,
>>>
>>> https://lore.kernel.org/all/87r0kircdo.fsf@vps.thesusis.net/
>>>
>>> clearly states that reverting *only* this commit,
>>> 56e449603f0ac5 drm/sched: Convert the GPU scheduler to variable number of run-queues
>>> *does not* mitigate the failed suspend. (Furthermore, this commit doesn't really change
>>> anything operational, other than using an allocated array, instead of a static one, in DRM,
>>> while the 2nd patch is solely contained within the amdgpu driver code.)
>>>
>>> Leaving us with only this change,
>>> b70438004a14f4 drm/amdgpu: move buffer funcs setting up a level
>>> to be at fault, as the kernel log attached in the linked email above shows.
>>>
>>> The conclusion is that only b70438004a14f4 needs reverting.
>>
>> b70438004a14f4 was a fix for 56e449603f0ac5.  Without b70438004a14f4,
>> 56e449603f0ac5 breaks amdgpu.
> 
> We can try and re-enable it in the next kernel.  I'm just not sure
> we'll be able to fix this in time for 6.7 with the holidays and all
> and I don't want to cause a lot of scheduler churn at the end of the
> 6.7 cycle if we hold off and try and fix it.  Reverting seems like the
> best short term solution.

A lot of subsequent code has come in since commit 56e449603f0ac5, as it opened
the opportunity for a 1-to-1 relationship between an entity and a scheduler.
(Should've always been the case, from the outset. Not sure why it was coded as
a fixed-size array.)
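
As an illustration only, the difference between a fixed-size array of
run-queues and one sized at init time can be sketched in C roughly as
follows. This is not the DRM scheduler code: every name below (the
sketch_* types and functions, SKETCH_PRIORITY_COUNT) is hypothetical, and
userspace calloc()/free() stand in for kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the DRM scheduler's code. */

#define SKETCH_PRIORITY_COUNT 4  /* assumed compile-time run-queue count */

struct sketch_rq {
    int priority;
};

/* Before: every scheduler carries a fixed-size array of run-queues. */
struct sketch_sched_static {
    struct sketch_rq rqs[SKETCH_PRIORITY_COUNT];
};

/* After: the caller chooses the number of run-queues at init time. */
struct sketch_sched {
    struct sketch_rq *rqs;
    unsigned int num_rqs;
};

/* Returns 0 on success, -1 on a zero count or allocation failure. */
int sketch_sched_init(struct sketch_sched *s, unsigned int num_rqs)
{
    assert(s != NULL);

    s->rqs = NULL;
    s->num_rqs = 0;
    if (num_rqs == 0)
        return -1;

    s->rqs = calloc(num_rqs, sizeof(*s->rqs));
    if (!s->rqs)
        return -1;

    for (unsigned int i = 0; i < num_rqs; i++)
        s->rqs[i].priority = (int)i;

    s->num_rqs = num_rqs;
    return 0;
}

/* Releases the run-queue array and leaves the struct in an empty state. */
void sketch_sched_fini(struct sketch_sched *s)
{
    free(s->rqs);
    s->rqs = NULL;
    s->num_rqs = 0;
}
```

With the allocated form, a caller that wants exactly one run-queue per
scheduler can call sketch_sched_init(&s, 1), which the fixed-size layout
cannot express without carrying unused slots.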

Given that commit 56e449603f0ac5 has nothing to do with amdgpu, and the problem
is wholly contained in amdgpu, and no other driver has this problem, there is
no reason to have to "churn", i.e. go back and forth in DRM, only to cover up
an init bug in amdgpu. See the response I just sent in this thread:
https://lore.kernel.org/r/05007cb0-871e-4dc7-af58-1351f4ba43e2@gmail.com

And it's not like this issue is unknown. I first posted about it on 2023-10-16. 

Ideally, amdgpu would just fix their init code.
-- 
Regards,
Luben


