From: Daniel Vetter <daniel@ffwll.ch>
To: Michal Hocko <mhocko@kernel.org>
Cc: "Daniel Vetter" <daniel.vetter@ffwll.ch>,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Rientjes" <rientjes@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable
Date: Fri, 23 Nov 2018 13:38:38 +0100
Message-ID: <20181123123838.GL4266@phenom.ffwll.local>
In-Reply-To: <20181123111237.GE8625@dhcp22.suse.cz>

On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > We need to make sure implementations don't cheat and don't have a
> > possible schedule/blocking point deeply buried where review can't
> > catch it.
> >
> > I'm not sure whether this is the best way to make sure all the
> > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > But it gets the job done.
>
> Yeah, it is quite ugly. Especially because it makes DEBUG config
> behavior much different. So is this really worth it? Has this already
> discovered any existing bug?

Given that we need an oom trigger to hit this, we're not hitting it in
CI (oom is just way too unpredictable to even try). I'd kinda like to
also add some debug interface so I can provoke an oom kill of a
specially prepared process, to make sure we can reliably exercise this
path without killing the kernel accidentally. We do similar tricks for
our shrinker already.

There have been patches floating around with this kind of bug I think,
and the call chains we're dealing with are fairly deep. I don't trust
review to reliably catch this kind of fail, that's why I'm looking into
tools to better validate this stuff to augment review.

And yes it's ugly :-/

Wrt the behavior difference: I guess we could put another counter into
the task struct, and change might_sleep() to check it. All under
CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable
side effect. My worry with that is that people will spot it, and abuse
it in creative ways that do affect semantics. See horrors like
drm_can_sleep() (and I'm sure gfx folks are not the only ones who
seriously lacked taste here).

Up to the experts really how to best paint this shed I think.

Thanks, Daniel

> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  mm/mmu_notifier.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 59e102589a25..4d282cfb296e 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> >  	id = srcu_read_lock(&srcu);
> >  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> >  		if (mn->ops->invalidate_range_start) {
> > -			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > +			int _ret;
> > +
> > +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > +				preempt_disable();
> > +			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > +				preempt_enable();
> >  			if (_ret) {
> >  				pr_info("%pS callback failed with %d in %sblockable context.\n",
> >  						mn->ops->invalidate_range_start, _ret,
> > --
> > 2.19.1
> >
>
> --
> Michal Hocko
> SUSE Labs

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
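
The counter alternative mentioned in the reply (a debug-only per-task
counter that might_sleep() checks, instead of preempt_disable()) can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions, not kernel code and not part of the posted patch:
struct task_model, nonblock_begin(), nonblock_end(),
model_might_sleep(), sloppy_callback() and invoke_callback() are all
hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a debug-only field added to the task struct. */
struct task_model {
	unsigned int nonblock_count; /* > 0 while inside a must-not-block section */
	unsigned int violations;     /* might-sleep hits seen inside such a section */
};

/* Enter a section in which blocking is forbidden (sections may nest). */
static void nonblock_begin(struct task_model *t)
{
	t->nonblock_count++;
}

/* Leave the section; an unbalanced end is a bug in the caller. */
static void nonblock_end(struct task_model *t)
{
	assert(t->nonblock_count > 0);
	t->nonblock_count--;
}

/*
 * Model of a might_sleep()-style annotation: it only inspects the
 * counter and reports, it does not change scheduling behaviour.
 * Returns true if sleeping would be allowed here.
 */
static bool model_might_sleep(struct task_model *t, const char *where)
{
	if (t->nonblock_count == 0)
		return true;
	t->violations++;
	fprintf(stderr, "BUG: possible sleep in non-blocking section at %s\n",
		where);
	return false;
}

/* Hypothetical callback with a buried blocking point that ignores 'blockable'. */
static int sloppy_callback(struct task_model *t, bool blockable)
{
	(void)blockable; /* the bug being modelled: the flag is never consulted */
	model_might_sleep(t, "sloppy_callback");
	return 0;
}

/* Caller side: mark the section only when the callback must not block. */
static int invoke_callback(struct task_model *t,
			   int (*cb)(struct task_model *, bool),
			   bool blockable)
{
	int ret;

	if (!blockable)
		nonblock_begin(t);
	ret = cb(t, blockable);
	if (!blockable)
		nonblock_end(t);
	return ret;
}
```

In this model the check fires on every !blockable invocation that
reaches a might-sleep annotation, without needing an actual oom to
provoke contention, and the counter has no effect on preemption. The
abuse concern from the reply would correspond to code reading
nonblock_count to change its behaviour, which the sketch deliberately
does not offer a helper for.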