From: Daniel Vetter <daniel@ffwll.ch>
To: Michal Hocko <mhocko@kernel.org>
Cc: "Daniel Vetter" <daniel.vetter@ffwll.ch>,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Rientjes" <rientjes@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable
Date: Fri, 23 Nov 2018 13:38:38 +0100
Message-ID: <20181123123838.GL4266@phenom.ffwll.local>
In-Reply-To: <20181123111237.GE8625@dhcp22.suse.cz>

On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > We need to make sure implementations don't cheat and don't have a
> > possible schedule/blocking point deeply buried where review can't
> > catch it.
> > 
> > I'm not sure whether this is the best way to make sure all the
> > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > But it gets the job done.
> 
> Yeah, it is quite ugly. Especially because it makes DEBUG config
> behavior much different. So is this really worth it? Has this already
> discovered any existing bug?

Given that we need an OOM to trigger this, we're not hitting it in CI
(OOM is just way too unpredictable to even try). I'd kinda like to also
add some debug interface so I can provoke an OOM kill of a specially
prepared process, to make sure we can reliably exercise this path without
killing the kernel accidentally. We do similar tricks for our shrinker
already.

There have been patches floating around with this kind of bug, I think,
and the call chains we're dealing with are fairly deep. I don't trust
review to reliably catch this kind of failure; that's why I'm looking
into tools to better validate this stuff to augment review.

And yes it's ugly :-/
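
To make the mechanism in the quoted patch concrete: with
CONFIG_DEBUG_ATOMIC_SLEEP, might_sleep() complains when the preempt count
is raised, so wrapping a !blockable callback in
preempt_disable()/preempt_enable() turns any sleep point buried in its
call chain into a splat. Below is a rough userspace model of that idea,
not kernel code: every sim_* name and both example callbacks are
hypothetical stand-ins, and actual preemption is not modeled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for preempt_count(), preempt_disable(),
 * preempt_enable() and might_sleep() under CONFIG_DEBUG_ATOMIC_SLEEP. */
static int sim_preempt_count;	/* models the preempt count */
static int sim_sleep_warnings;	/* counts "sleeping in atomic" splats */

static void sim_preempt_disable(void)
{
	sim_preempt_count++;
}

static void sim_preempt_enable(void)
{
	assert(sim_preempt_count > 0);	/* catch unbalanced enable */
	sim_preempt_count--;
}

/* Models might_sleep(): warn if called while the preempt count is raised. */
static void sim_might_sleep(const char *where)
{
	if (sim_preempt_count > 0) {
		sim_sleep_warnings++;
		fprintf(stderr, "BUG: sleeping function called from invalid context at %s\n",
			where);
	}
}

/* Hypothetical buggy callback: hits a sleep point even when !blockable,
 * e.g. a mutex_lock() buried deep in its call chain. */
static int bad_invalidate(bool blockable)
{
	(void)blockable;
	sim_might_sleep("bad_invalidate");
	return 0;
}

/* Hypothetical well-behaved callback: bails out instead of blocking. */
static int good_invalidate(bool blockable)
{
	if (!blockable)
		return -EAGAIN;
	sim_might_sleep("good_invalidate");
	return 0;
}

/* Mirrors the patch: raise the preempt count around the callback when
 * !blockable, so any might_sleep() reached inside it fires. */
static int call_invalidate(int (*cb)(bool), bool blockable)
{
	int ret;

	if (!blockable)
		sim_preempt_disable();
	ret = cb(blockable);
	if (!blockable)
		sim_preempt_enable();
	return ret;
}
```

The side effect objected to above also shows up here: in the real kernel,
preempt_disable() additionally stops the task from being preempted, so
debug builds run the callbacks under different scheduling conditions than
production builds.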

Wrt the behavior difference: I guess we could put another counter into the
task struct, and change might_sleep() to check it, all under
CONFIG_DEBUG_ATOMIC_SLEEP only of course. That would avoid the
preempt-disable side effect. My worry with that is that people will spot
it and abuse it in creative ways that do affect semantics. See horrors
like drm_can_sleep() (and I'm sure gfx folks are not the only ones who
seriously lacked taste here).
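
A similarly rough userspace model of the per-task-counter alternative:
might_sleep() additionally checks a debug-only counter in the task
struct, so marking a region as non-blocking no longer touches the preempt
count. struct sim_task, non_block_count and the sim_* helpers are
hypothetical names for illustration only, not an existing kernel API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of a task struct with a debug-only counter. */
struct sim_task {
	int preempt_count;	/* left untouched by the non-block annotation */
	int non_block_count;	/* would only exist under CONFIG_DEBUG_ATOMIC_SLEEP */
	int sleep_warnings;	/* counts "sleeping in atomic" splats */
};

static struct sim_task sim_current_task;
#define sim_current (&sim_current_task)

/* Enter a region where sleeping is not allowed (debug check only). */
static void sim_non_block_start(void)
{
	sim_current->non_block_count++;
}

/* Leave the region; catch unbalanced calls. */
static void sim_non_block_end(void)
{
	assert(sim_current->non_block_count > 0);
	sim_current->non_block_count--;
}

/* Models might_sleep(): now also warns inside a non-block region. */
static void sim_might_sleep(const char *where)
{
	if (sim_current->preempt_count > 0 || sim_current->non_block_count > 0) {
		sim_current->sleep_warnings++;
		fprintf(stderr, "BUG: sleeping function called from invalid context at %s\n",
			where);
	}
}
```

Nothing besides the might_sleep() model reads non_block_count, so the
annotation itself changes no behavior; the concern above is that once such
a field exists, code will start querying it to decide whether to sleep,
drm_can_sleep()-style.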

Up to the experts really how to best paint this shed I think.

Thanks, Daniel

> 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  mm/mmu_notifier.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 59e102589a25..4d282cfb296e 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> >  	id = srcu_read_lock(&srcu);
> >  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> >  		if (mn->ops->invalidate_range_start) {
> > -			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > +			int _ret;
> > +
> > +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > +				preempt_disable();
> > +			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > +				preempt_enable();
> >  			if (_ret) {
> >  				pr_info("%pS callback failed with %d in %sblockable context.\n",
> >  						mn->ops->invalidate_range_start, _ret,
> > -- 
> > 2.19.1
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
