All of lore.kernel.org
 help / color / mirror / Atom feed
From: daniel@ffwll.ch
To: unlisted-recipients:; (no To-header on input)
Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>,
	"Mazin Rezk" <mnrzk@protonmail.com>,
	Duncan <1i5t5.duncan@cox.net>,
	anthony.ruhier@gmail.com, "Kees Cook" <keescook@chromium.org>,
	sunpeng.li@amd.com, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, regressions@leemhuis.info,
	amd-gfx@lists.freedesktop.org,
	"Alexander Deucher" <Alexander.Deucher@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	mphantomx@yahoo.com.br,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free
Date: Tue, 28 Jul 2020 23:58:40 +0200	[thread overview]
Message-ID: <20200728215840.GH6419@phenom.ffwll.local> (raw)
In-Reply-To: <0edb1498-6c43-27cc-b2fb-71ea5ca1a56c@amd.com>

On Tue, Jul 28, 2020 at 01:07:13PM -0400, Kazlauskas, Nicholas wrote:
> On 2020-07-28 5:22 a.m., Paul Menzel wrote:
> > Dear Linux folks,
> > 
> > 
> > Am 25.07.20 um 07:20 schrieb Mazin Rezk:
> > > On Saturday, July 25, 2020 12:59 AM, Duncan wrote:
> > > 
> > > > On Sat, 25 Jul 2020 03:03:52 +0000 Mazin Rezk wrote:
> > > > 
> > > > > > Am 24.07.20 um 19:33 schrieb Kees Cook:
> > > > > > 
> > > > > > > There was a fix to disable the async path for this driver that
> > > > > > > worked around the bug too, yes? That seems like a safer and more
> > > > > > > focused change that doesn't revert the SLUB defense for all
> > > > > > > users, and would actually provide a complete, I think, workaround
> > > > > 
> > > > > That said, I haven't seen the async disabling patch. If you could
> > > > > link to it, I'd be glad to test it out and perhaps we can use that
> > > > > instead.
> > > > 
> > > > I'm confused. Not to put words in Kees' mouth; /I/ am confused (which
> > > > admittedly could well be just because I make no claims to be a
> > > > coder and am simply reading the bug and thread, but I'd appreciate some
> > > > "unconfusing" anyway).
> > > > 
> > > > My interpretation of the "async disabling" reference was that it was to
> > > > comment #30 on the bug:
> > > > 
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=207383#c30
> > > > 
> > > > 
> > > > ... which (if I'm not confused on this point too) appears to be yours.
> > > > There it was stated...
> > > > 
> > > > I've also found that this bug exclusively occurs when commit_work is on
> > > > the workqueue. After forcing drm_atomic_helper_commit to run all of the
> > > > commits without adding to the workqueue and running the OS, the issue
> > > > seems to have disappeared.
> > > > <<<<
> > > > 
> > > > Would not forcing all commits to run directly, without placing them on
> > > > the workqueue, be "async disabling"? That's what I /thought/ he was
> > > > referencing.
> > > 
> > > Oh, I thought he was referring to a different patch. Kees, could I get
> > > your confirmation on this?
> > > 
> > > The change I made actually affected all of the DRM code, although
> > > this could
> > > easily be changed to be specific to amdgpu. (By forcing blocking on
> > > amdgpu_dm's non-blocking commit code)
> > > 
> > > That said, I'd still need to test further because I only did test it
> > > for a
> > > couple of hours then. Although it should work in theory.
> > > 
> > > > OTOH your base/context swap idea sounds like a possibly "less
> > > > disturbance" workaround, if it works, and given the point in the
> > > > commit cycle... (But if it's out Sunday it's likely too late to test
> > > > and get it in now anyway; if it's another week, tho...)
> > > 
> > > The base/context swap idea should make the use-after-free behave how it
> > > did in 5.6. Since the bug doesn't cause an issue in 5.6, it's less of a
> > > "less disturbance" workaround and more of a "no disturbance" workaround.
> > 
> > Sorry for bothering, but is there now a solution, besides reverting the
> > commits, to avoid freezes/crashes *without* performance regressions?
> > 
> > 
> > Kind regards,
> > 
> > Paul
> 
> Mazin's "drm/amd/display: Clear dm_state for fast updates" change
> accomplishes this, at least as a temporary hack.

Yeah I gets it's horrible, but better than nothing. Reverting the old
amdgpu change to a private state object is probably a lot more invasive.

> I've started work on a more large scale fix that we could get in in after.

Does that include a fix for the "stuff needed by irq handler"? Either way
pls cc dri-devel, I think this is something worth of a bit wider
discussion. Feels like unsolved homework from the entire "make DC
integrate into linux" saga ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

WARNING: multiple messages have this Message-ID (diff)
From: daniel@ffwll.ch
Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>,
	mphantomx@yahoo.com.br, Duncan <1i5t5.duncan@cox.net>,
	"Kees Cook" <keescook@chromium.org>,
	sunpeng.li@amd.com, "Mazin Rezk" <mnrzk@protonmail.com>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	regressions@leemhuis.info, amd-gfx@lists.freedesktop.org,
	"Alexander Deucher" <Alexander.Deucher@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	anthony.ruhier@gmail.com,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free
Date: Tue, 28 Jul 2020 23:58:40 +0200	[thread overview]
Message-ID: <20200728215840.GH6419@phenom.ffwll.local> (raw)
In-Reply-To: <0edb1498-6c43-27cc-b2fb-71ea5ca1a56c@amd.com>

On Tue, Jul 28, 2020 at 01:07:13PM -0400, Kazlauskas, Nicholas wrote:
> On 2020-07-28 5:22 a.m., Paul Menzel wrote:
> > Dear Linux folks,
> > 
> > 
> > Am 25.07.20 um 07:20 schrieb Mazin Rezk:
> > > On Saturday, July 25, 2020 12:59 AM, Duncan wrote:
> > > 
> > > > On Sat, 25 Jul 2020 03:03:52 +0000 Mazin Rezk wrote:
> > > > 
> > > > > > Am 24.07.20 um 19:33 schrieb Kees Cook:
> > > > > > 
> > > > > > > There was a fix to disable the async path for this driver that
> > > > > > > worked around the bug too, yes? That seems like a safer and more
> > > > > > > focused change that doesn't revert the SLUB defense for all
> > > > > > > users, and would actually provide a complete, I think, workaround
> > > > > 
> > > > > That said, I haven't seen the async disabling patch. If you could
> > > > > link to it, I'd be glad to test it out and perhaps we can use that
> > > > > instead.
> > > > 
> > > > I'm confused. Not to put words in Kees' mouth; /I/ am confused (which
> > > > admittedly could well be just because I make no claims to be a
> > > > coder and am simply reading the bug and thread, but I'd appreciate some
> > > > "unconfusing" anyway).
> > > > 
> > > > My interpretation of the "async disabling" reference was that it was to
> > > > comment #30 on the bug:
> > > > 
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=207383#c30
> > > > 
> > > > 
> > > > ... which (if I'm not confused on this point too) appears to be yours.
> > > > There it was stated...
> > > > 
> > > > I've also found that this bug exclusively occurs when commit_work is on
> > > > the workqueue. After forcing drm_atomic_helper_commit to run all of the
> > > > commits without adding to the workqueue and running the OS, the issue
> > > > seems to have disappeared.
> > > > <<<<
> > > > 
> > > > Would not forcing all commits to run directly, without placing them on
> > > > the workqueue, be "async disabling"? That's what I /thought/ he was
> > > > referencing.
> > > 
> > > Oh, I thought he was referring to a different patch. Kees, could I get
> > > your confirmation on this?
> > > 
> > > The change I made actually affected all of the DRM code, although
> > > this could
> > > easily be changed to be specific to amdgpu. (By forcing blocking on
> > > amdgpu_dm's non-blocking commit code)
> > > 
> > > That said, I'd still need to test further because I only did test it
> > > for a
> > > couple of hours then. Although it should work in theory.
> > > 
> > > > OTOH your base/context swap idea sounds like a possibly "less
> > > > disturbance" workaround, if it works, and given the point in the
> > > > commit cycle... (But if it's out Sunday it's likely too late to test
> > > > and get it in now anyway; if it's another week, tho...)
> > > 
> > > The base/context swap idea should make the use-after-free behave how it
> > > did in 5.6. Since the bug doesn't cause an issue in 5.6, it's less of a
> > > "less disturbance" workaround and more of a "no disturbance" workaround.
> > 
> > Sorry for bothering, but is there now a solution, besides reverting the
> > commits, to avoid freezes/crashes *without* performance regressions?
> > 
> > 
> > Kind regards,
> > 
> > Paul
> 
> Mazin's "drm/amd/display: Clear dm_state for fast updates" change
> accomplishes this, at least as a temporary hack.

Yeah I gets it's horrible, but better than nothing. Reverting the old
amdgpu change to a private state object is probably a lot more invasive.

> I've started work on a more large scale fix that we could get in in after.

Does that include a fix for the "stuff needed by irq handler"? Either way
pls cc dri-devel, I think this is something worth of a bit wider
discussion. Feels like unsolved homework from the entire "make DC
integrate into linux" saga ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

WARNING: multiple messages have this Message-ID (diff)
From: daniel@ffwll.ch
Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>,
	mphantomx@yahoo.com.br, Duncan <1i5t5.duncan@cox.net>,
	"Kees Cook" <keescook@chromium.org>,
	sunpeng.li@amd.com, "Mazin Rezk" <mnrzk@protonmail.com>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	regressions@leemhuis.info, amd-gfx@lists.freedesktop.org,
	"Alexander Deucher" <Alexander.Deucher@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	anthony.ruhier@gmail.com,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free
Date: Tue, 28 Jul 2020 23:58:40 +0200	[thread overview]
Message-ID: <20200728215840.GH6419@phenom.ffwll.local> (raw)
In-Reply-To: <0edb1498-6c43-27cc-b2fb-71ea5ca1a56c@amd.com>

On Tue, Jul 28, 2020 at 01:07:13PM -0400, Kazlauskas, Nicholas wrote:
> On 2020-07-28 5:22 a.m., Paul Menzel wrote:
> > Dear Linux folks,
> > 
> > 
> > Am 25.07.20 um 07:20 schrieb Mazin Rezk:
> > > On Saturday, July 25, 2020 12:59 AM, Duncan wrote:
> > > 
> > > > On Sat, 25 Jul 2020 03:03:52 +0000 Mazin Rezk wrote:
> > > > 
> > > > > > Am 24.07.20 um 19:33 schrieb Kees Cook:
> > > > > > 
> > > > > > > There was a fix to disable the async path for this driver that
> > > > > > > worked around the bug too, yes? That seems like a safer and more
> > > > > > > focused change that doesn't revert the SLUB defense for all
> > > > > > > users, and would actually provide a complete, I think, workaround
> > > > > 
> > > > > That said, I haven't seen the async disabling patch. If you could
> > > > > link to it, I'd be glad to test it out and perhaps we can use that
> > > > > instead.
> > > > 
> > > > I'm confused. Not to put words in Kees' mouth; /I/ am confused (which
> > > > admittedly could well be just because I make no claims to be a
> > > > coder and am simply reading the bug and thread, but I'd appreciate some
> > > > "unconfusing" anyway).
> > > > 
> > > > My interpretation of the "async disabling" reference was that it was to
> > > > comment #30 on the bug:
> > > > 
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=207383#c30
> > > > 
> > > > 
> > > > ... which (if I'm not confused on this point too) appears to be yours.
> > > > There it was stated...
> > > > 
> > > > I've also found that this bug exclusively occurs when commit_work is on
> > > > the workqueue. After forcing drm_atomic_helper_commit to run all of the
> > > > commits without adding to the workqueue and running the OS, the issue
> > > > seems to have disappeared.
> > > > <<<<
> > > > 
> > > > Would not forcing all commits to run directly, without placing them on
> > > > the workqueue, be "async disabling"? That's what I /thought/ he was
> > > > referencing.
> > > 
> > > Oh, I thought he was referring to a different patch. Kees, could I get
> > > your confirmation on this?
> > > 
> > > The change I made actually affected all of the DRM code, although
> > > this could
> > > easily be changed to be specific to amdgpu. (By forcing blocking on
> > > amdgpu_dm's non-blocking commit code)
> > > 
> > > That said, I'd still need to test further because I only did test it
> > > for a
> > > couple of hours then. Although it should work in theory.
> > > 
> > > > OTOH your base/context swap idea sounds like a possibly "less
> > > > disturbance" workaround, if it works, and given the point in the
> > > > commit cycle... (But if it's out Sunday it's likely too late to test
> > > > and get it in now anyway; if it's another week, tho...)
> > > 
> > > The base/context swap idea should make the use-after-free behave how it
> > > did in 5.6. Since the bug doesn't cause an issue in 5.6, it's less of a
> > > "less disturbance" workaround and more of a "no disturbance" workaround.
> > 
> > Sorry for bothering, but is there now a solution, besides reverting the
> > commits, to avoid freezes/crashes *without* performance regressions?
> > 
> > 
> > Kind regards,
> > 
> > Paul
> 
> Mazin's "drm/amd/display: Clear dm_state for fast updates" change
> accomplishes this, at least as a temporary hack.

Yeah I gets it's horrible, but better than nothing. Reverting the old
amdgpu change to a private state object is probably a lot more invasive.

> I've started work on a more large scale fix that we could get in in after.

Does that include a fix for the "stuff needed by irq handler"? Either way
pls cc dri-devel, I think this is something worth of a bit wider
discussion. Feels like unsolved homework from the entire "make DC
integrate into linux" saga ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  reply	other threads:[~2020-07-28 21:58 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-23 21:10 [PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free Mazin Rezk
2020-07-23 21:10 ` Mazin Rezk
2020-07-23 21:10 ` Mazin Rezk
2020-07-23 22:16 ` Kazlauskas, Nicholas
2020-07-23 22:16   ` Kazlauskas, Nicholas
2020-07-23 22:16   ` Kazlauskas, Nicholas
2020-07-23 22:57   ` Mazin Rezk
2020-07-23 22:57     ` Mazin Rezk
2020-07-23 22:57     ` Mazin Rezk
2020-07-24 21:09     ` Mazin Rezk
2020-07-24 21:09       ` Mazin Rezk
2020-07-24 21:09       ` Mazin Rezk
2020-07-23 22:32 ` Kees Cook
2020-07-23 22:32   ` Kees Cook
2020-07-23 22:32   ` Kees Cook
2020-07-23 22:58   ` Mazin Rezk
2020-07-23 22:58     ` Mazin Rezk
2020-07-23 22:58     ` Mazin Rezk
2020-07-24  7:26     ` Christian König
2020-07-24  7:26       ` Christian König
2020-07-24  7:26       ` Christian König
2020-07-24  7:45   ` Paul Menzel
2020-07-24  7:45     ` Paul Menzel
2020-07-24  7:45     ` Paul Menzel
2020-07-24 17:33     ` Kees Cook
2020-07-24 17:33       ` Kees Cook
2020-07-24 17:33       ` Kees Cook
2020-07-24 21:19       ` Paul Menzel
2020-07-24 21:19         ` Paul Menzel
2020-07-24 21:19         ` Paul Menzel
2020-07-25  3:03         ` Mazin Rezk
2020-07-25  3:03           ` Mazin Rezk
2020-07-25  3:03           ` Mazin Rezk
2020-07-25  4:59           ` Duncan
2020-07-25  4:59             ` Duncan
2020-07-25  4:59             ` Duncan
2020-07-25  5:20             ` Mazin Rezk
2020-07-25  5:20               ` Mazin Rezk
2020-07-25  5:20               ` Mazin Rezk
2020-07-28  9:22               ` Paul Menzel
2020-07-28  9:22                 ` Paul Menzel
2020-07-28  9:22                 ` Paul Menzel
2020-07-28 17:07                 ` Kazlauskas, Nicholas
2020-07-28 17:07                   ` Kazlauskas, Nicholas
2020-07-28 17:07                   ` Kazlauskas, Nicholas
2020-07-28 21:58                   ` daniel [this message]
2020-07-28 21:58                     ` daniel
2020-07-28 21:58                     ` daniel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200728215840.GH6419@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=1i5t5.duncan@cox.net \
    --cc=Alexander.Deucher@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=anthony.ruhier@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mnrzk@protonmail.com \
    --cc=mphantomx@yahoo.com.br \
    --cc=pmenzel@molgen.mpg.de \
    --cc=regressions@leemhuis.info \
    --cc=sunpeng.li@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.