From: "Lisovskiy, Stanislav" <stanislav.lisovskiy@intel.com>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix NULL ptr deref by checking new_crtc_state
Date: Fri, 5 May 2023 21:18:02 +0300	[thread overview]
Message-ID: <ZFVIWgT/KVIsJdR+@intel.com> (raw)
In-Reply-To: <ZFUyW1B6trFe29_i@intel.com>

On Fri, May 05, 2023 at 07:44:11PM +0300, Ville Syrjälä wrote:
> On Fri, May 05, 2023 at 06:55:18PM +0300, Lisovskiy, Stanislav wrote:
> > On Fri, May 05, 2023 at 05:17:06PM +0300, Ville Syrjälä wrote:
> > > On Fri, May 05, 2023 at 05:05:55PM +0300, Lisovskiy, Stanislav wrote:
> > > > On Fri, May 05, 2023 at 04:57:54PM +0300, Ville Syrjälä wrote:
> > > > > On Fri, May 05, 2023 at 04:42:33PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > On Fri, May 05, 2023 at 04:28:50PM +0300, Ville Syrjälä wrote:
> > > > > > > On Fri, May 05, 2023 at 04:21:16PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > On Fri, May 05, 2023 at 04:11:52PM +0300, Ville Syrjälä wrote:
> > > > > > > > > On Fri, May 05, 2023 at 03:54:58PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > On Fri, May 05, 2023 at 03:46:40PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > On Fri, May 05, 2023 at 03:27:51PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > On Fri, May 05, 2023 at 03:09:01PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > On Fri, May 05, 2023 at 02:41:24PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:25:46PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:20:17PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:06:34PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:05:27PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:02:43PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 01:58:03PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 01:54:14PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 11:22:12AM +0300, Stanislav Lisovskiy wrote:
> > > > > > > > > > > > > > > > > > > > > > intel_atomic_get_new_crtc_state can return NULL if the crtc state wasn't
> > > > > > > > > > > > > > > > > > > > > > obtained previously with intel_atomic_get_crtc_state, so we must check it
> > > > > > > > > > > > > > > > > > > > > > for NULLness here, just as in many other places, where we can't guarantee
> > > > > > > > > > > > > > > > > > > > > > that intel_atomic_get_crtc_state was called.
> > > > > > > > > > > > > > > > > > > > > > We are currently getting NULL ptr deref because of that, so this fix was
> > > > > > > > > > > > > > > > > > > > > > confirmed to help.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Fixes: 74a75dc90869 ("drm/i915/display: move plane prepare/cleanup to intel_atomic_plane.c")
> > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > > > > > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > > > > > > >  drivers/gpu/drm/i915/display/intel_atomic_plane.c | 4 ++--
> > > > > > > > > > > > > > > > > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.c b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > > > > > > > > > > > index 9f670dcfe76e..4125ee07a271 100644
> > > > > > > > > > > > > > > > > > > > > > --- a/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > > > > > > > > > > > +++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > > > > > > > > > > > @@ -1029,7 +1029,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
> > > > > > > > > > > > > > > > > > > > > >  	int ret;
> > > > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > > >  	if (old_obj) {
> > > > > > > > > > > > > > > > > > > > > > -		const struct intel_crtc_state *crtc_state =
> > > > > > > > > > > > > > > > > > > > > > +		const struct intel_crtc_state *new_crtc_state =
> > > > > > > > > > > > > > > > > > > > > >  			intel_atomic_get_new_crtc_state(state,
> > > > > > > > > > > > > > > > > > > > > >  							to_intel_crtc(old_plane_state->hw.crtc));
> > > > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > > > @@ -1044,7 +1044,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
> > > > > > > > > > > > > > > > > > > > > >  		 * This should only fail upon a hung GPU, in which case we
> > > > > > > > > > > > > > > > > > > > > >  		 * can safely continue.
> > > > > > > > > > > > > > > > > > > > > >  		 */
> > > > > > > > > > > > > > > > > > > > > > -		if (intel_crtc_needs_modeset(crtc_state)) {
> > > > > > > > > > > > > > > > > > > > > > +		if (new_crtc_state && intel_crtc_needs_modeset(new_crtc_state)) {
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > NAK. We need to fix the bug instead of papering over it.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > I had pushed this already.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > It didn't even finish CI. Please revert.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Swati did run CI and verified that fix helps. I'm _not_ going to revert.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Fine. I'll do it.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Problem is that you don't even care to explain, why this fix is wrong, but simply
> > > > > > > > > > > > > > > > act in authoritarian way, instead of having constructive discussion.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I've explained this one about a hundred times. The NULL pointer should
> > > > > > > > > > > > > > > not happen. Someone needs to actually analyze what is happening instead
> > > > > > > > > > > > > > > of just adding random NULL checks all over the place.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I do get this point. However why are we doing those check in other places then?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We do them when they are actually necessary.
> > > > > > > > > > > > 
> > > > > > > > > > > > Well but for example when we do check like if(new_bw_state) in intel_bw.c,
> > > > > > > > > > > > we also might potentially have some silent bugs.
> > > > > > > > > > > > Would you guarantee that if we remove all if(crtc_state) and if(new_bw_state) checks
> > > > > > > > > > > > in our code, that there won't be NULL pointer dereferences? I bet you don't.
> > > > > > > > > > > 
> > > > > > > > > > > We have the checks where they are needed. The check in
> > > > > > > > > > > intel_bw_atomic_check() (if that's the one you mean)
> > > > > > > > > > > looks entirely correct to me.
> > > > > > > > > > 
> > > > > > > > > > Typo in my prev message, I meant intel_atomic_get_bw_state..but common idea is the same.
> > > > > > > > > 
> > > > > > > > > get_state() vs. get_{new,old}_state() are entirely different
> > > > > > > > > things.
> > > > > > > > > 
> > > > > > > > > You use get_state() when you really want the state to be
> > > > > > > > > included, and either
> > > > > > > > > - know the state isn't included already, or
> > > > > > > > > - you don't know whether it might have already been included
> > > > > > > > > 
> > > > > > > > > And one must of course remember that get_state() can
> > > > > > > > > - fail so error handling is needed
> > > > > > > > > - only be used during the check phase, and is illegal during the
> > > > > > > > >   commit phase.
> > > > > > > > 
> > > > > > > > Sure I know this. I even remember we discussed this many times.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > The get_{new,old}_state() (or the various for loop variants)
> > > > > > > > > you can use when you either:
> > > > > > > > > - know that the state is included already
> > > > > > > > > - are fine with the state potentially not being included
> > > > > > > > 
> > > > > > > > Don't you see that it is a bit of a contradiction in those 2 above??
> > > > > > > > 
> > > > > > > > You can't be "know that the state is included already" and 
> > > > > > > > "are fine with the state potentially not being included" same time :)
> > > > > > > > 
> > > > > > > > Those 2 above actually mean that you CANNOT be sure, because you 
> > > > > > > > are "fine with the state potentially not being included"! 
> > > > > > > > Otherwise second one would have been redundant.
> > > > > > > 
> > > > > > > No. You are either fine with NULL, XOR you know that
> > > > > > > the state is there already. There is no contradiction.
> > > > > > 
> > > > > > I do get that. But that way of calling the function is veeery counterintuitive.
> > > > > > Means that you call it and check for NULLness..if you are fine with NULL and
> > > > > > don't check for NULL..if you aren't fine with it and expect the state to be there.
> > > > > > 
> > > > > > That is really probabilistic design.
> > > > > > I think we must enumerate all the cases where 
> > > > > 
> > > > > Not sure what you mean with enumerate. You can't just declare
> > > > > somewhere globally that in functions X and Y NULL is fine,
> > > > > and in Z it is not. It depends on how X,Y,Z are implemented
> > > > > and it may change any time the implementation is changed.
> > > > > 
> > > > > 
> > > > > > 1) we expect new_state to be there and
> > > > > >    then we don't need even any checks to be there, because we will then rely on get_state.
> > > > > > 2) we don't expect it to be there and then call get_state always.
> > > > > > 
> > > > > > Because if you are "fine" with new_state being NULL, why even calling it?
> > > > > 
> > > > > Because
> > > > > !NULL -> you have some work to do
> > > > >  NULL -> you don't have work to do
> > > > 
> > > > Pretty sure we could find a way not to call it at all in case if no work is needed,
> > > > and call it without any checks, if work is needed.
> > > > 
> > > > You typically get new bw state to recalculate and compare with old state, however
> > > > there has to be some place where you decide whether to call get_bw/crtc_state or not.
> > > > So from there, this could have been propagated to the moment where we decide where
> > > > to call get_new_bw/crtc_state or not. Then no checks would have been needed.
> > > > And NULL would always mean a bug.
> > > > Also that would be a lot more simple, following KISS principle.
> > > 
> > > You'd need to separately track each case in some boolean/etc.
> > > in the overall atomic state. Doable? Sure. Simpler? Don't see
> > > it. It's the exact same code with the NULL check just replaced
> > > with some other check. And you must additionally remember to
> > > sprinkle those bool assignments around.
> > 
> > No-no-no. This is how intel_atomic_get_bw_state is called:
> > 
> > for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
> > 	new_bw_state = intel_atomic_get_bw_state(state);
> 
> That's just because we don't need to do anything to the 
> bw state unless some crtc is doing stuff.
> 
> > 
> > 
> > Basically in any subsequent check, if it is called after that,
> > whenever it's called under for_each_new_intel_crtc_in_state, you
> > can be sure that intel_atomic_get_new_bw_state returns non-NULL.
> 
> intel_atomic_get_new_bw_state() is never called from a loop
> like that. At least I can't immediately see a single place
> where that would happen.

We used to do this before; here I only gave it as an example.

> 
> And there is no guarantee anyway that a crtc being part
> of the commit would imply that bw state is also included.
> The crtc could have been added to the commit after the
> code ran which adds the bw state.

Well, so a crtc can be added to the state after the code which adds
the bw state has run. Does that mean that intel_atomic_get_new_bw_state
then actually returns NULL, even though we have a crtc in the state?
Sounds like you just described one of the possible scenarios for why
we are having this bug.
I.e. we ran this code:

for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
     new_bw_state = intel_atomic_get_bw_state(state);

but as you mentioned, this doesn't mean that we got a bw state,
because there might have been no crtc in the state at that point.
Then one gets added later, we call intel_atomic_get_new_bw_state,
and boom.
But then checking for NULL is also wrong, because we should have called
intel_atomic_get_bw_state for the newly added crtc?
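
To make that ordering concrete, here is a minimal sketch in plain C. It is
a toy model only: all toy_* names and structures are made up for
illustration and are not the i915 API. toy_get_bw_state() stands in for
get_state() (it pulls the object into the commit; the real one can also
fail, which this toy ignores), and toy_get_new_bw_state() stands in for
get_new_state() (a pure lookup that returns NULL if the object was never
pulled in).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only, NOT the real i915 structures. */
#define TOY_MAX_CRTCS 4

struct toy_bw_state {
	int data_rate;
};

struct toy_atomic_state {
	bool crtc_in_state[TOY_MAX_CRTCS];
	struct toy_bw_state bw_storage;
	struct toy_bw_state *new_bw_state;	/* NULL until pulled in */
};

/* Models get_state(): adds the bw state to the commit if it is missing. */
static struct toy_bw_state *toy_get_bw_state(struct toy_atomic_state *state)
{
	if (!state->new_bw_state)
		state->new_bw_state = &state->bw_storage;
	return state->new_bw_state;
}

/* Models get_new_state(): lookup only, NULL if not part of the commit. */
static struct toy_bw_state *toy_get_new_bw_state(struct toy_atomic_state *state)
{
	return state->new_bw_state;
}

/* Models a crtc being added to the commit at some point. */
static void toy_add_crtc(struct toy_atomic_state *state, int pipe)
{
	assert(pipe >= 0 && pipe < TOY_MAX_CRTCS);
	state->crtc_in_state[pipe] = true;
}

/* Models the loop above: bw state is pulled in only if a crtc is present. */
static void toy_bw_check(struct toy_atomic_state *state)
{
	int pipe;

	for (pipe = 0; pipe < TOY_MAX_CRTCS; pipe++) {
		if (state->crtc_in_state[pipe])
			toy_get_bw_state(state);
	}
}

/* Returns true if a late crtc ends up in the state without a bw state. */
static bool toy_late_crtc_leaves_bw_null(void)
{
	struct toy_atomic_state state = { 0 };

	toy_bw_check(&state);		/* no crtc yet: nothing is pulled in */
	toy_add_crtc(&state, 0);	/* crtc joins the commit afterwards */

	return state.crtc_in_state[0] &&
	       toy_get_new_bw_state(&state) == NULL;
}
```

In this model toy_late_crtc_leaves_bw_null() returns true: the crtc is in
the state, yet the lookup gives NULL. If toy_add_crtc() runs before
toy_bw_check(), the same lookup returns a non-NULL pointer.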

Sometimes I think we should write some kind of doc with guidelines,
like we have for some other areas, describing what the code flow
should be in each of the typical scenarios and how to use these
functions.

Stan

> 
> -- 
> Ville Syrjälä
> Intel
