linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] mmu notifier debug annotations/checks
@ 2019-08-20  8:18 Daniel Vetter
  2019-08-20  8:18 ` [PATCH 1/4] mm, notifier: Add a lockdep map for invalidate_range_start/end Daniel Vetter
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Daniel Vetter @ 2019-08-20  8:18 UTC (permalink / raw)
  To: LKML; +Cc: Linux MM, DRI Development, Intel Graphics Development, Daniel Vetter

Hi all,

Here's the respin. Changes:

- 2 patches for checking return values of callbacks dropped, they landed

- move the lockdep annotations ahead, since I think that part is less
  contentious. The lockdep map now also annotates invalidate_range_end, as
  requested by Jason.

- add a patch to prime lockdep, idea from Jason, let's hear whether the
  implementation fits.

- I've stuck with the non_block_start/end for now and not switched back to
  preempt_disable/enable, but with comments as suggested by Andrew.
  Hopefully that fits the bill, otherwise I can go back to that if the
  consensus ends up leaning that way.

Review, comments and ideas very much welcome.

Cheers, Daniel

Daniel Vetter (4):
  mm, notifier: Add a lockdep map for invalidate_range_start/end
  mm, notifier: Prime lockdep
  kernel.h: Add non_block_start/end()
  mm, notifier: Catch sleeping/blocking for !blockable

 include/linux/kernel.h       | 25 ++++++++++++++++++++++++-
 include/linux/mmu_notifier.h |  8 ++++++++
 include/linux/sched.h        |  4 ++++
 kernel/sched/core.c          | 19 ++++++++++++++-----
 mm/mmu_notifier.c            | 24 +++++++++++++++++++++++-
 5 files changed, 73 insertions(+), 7 deletions(-)

-- 
2.23.0.rc1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] mm, notifier: Add a lockdep map for invalidate_range_start/end
  2019-08-20  8:18 [PATCH 0/4] mmu notifier debug annotations/checks Daniel Vetter
@ 2019-08-20  8:18 ` Daniel Vetter
  2019-08-20 13:31   ` Jason Gunthorpe
  2019-08-20  8:19 ` [PATCH 2/4] mm, notifier: Prime lockdep Daniel Vetter
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2019-08-20  8:18 UTC (permalink / raw)
  To: LKML
  Cc: Linux MM, DRI Development, Intel Graphics Development,
	Daniel Vetter, Jason Gunthorpe, Chris Wilson, Andrew Morton,
	David Rientjes, Jérôme Glisse, Michal Hocko,
	Christian König, Greg Kroah-Hartman, Mike Rapoport,
	Daniel Vetter

This is a similar idea to the fs_reclaim fake lockdep lock. It's
fairly easy to provoke a specific notifier to be run on a specific
range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for
all the various callchains that might lead to them. But both at the
same time is really hard to hit reliably, especially when you want to
exercise paths like direct reclaim or compaction, where it's not
easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep
to see a lot more dependencies, without having to actually hit them
in a single callchain while testing.

On Jason's suggestion this is rolled out for both
invalidate_range_start and invalidate_range_end. They both have the
same calling context, hence we can share the same lockdep map. Note
that the annotation for invalidate_range_start is outside of the
mm_has_notifiers(), to make sure lockdep is informed about all paths
leading to this context irrespective of whether mmu notifiers are
present for a given context. We don't do that on the
invalidate_range_end side to avoid paying the overhead twice, there
the lockdep annotation is pushed down behind the mm_has_notifiers()
check.
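
To illustrate what this buys us, here is a hedged sketch of the kind of
deadlock cycle the shared map lets lockdep connect, even if a test run
only ever exercises one half of it. The driver bits (drv_dev,
drv_unmap_range, dev->lock) are made up for illustration and not part of
this patch:

	static int drv_invalidate_range_start(struct mmu_notifier *mn,
				const struct mmu_notifier_range *range)
	{
		struct drv_dev *dev = container_of(mn, struct drv_dev, mn);

		/* lockdep records: notifier map -> dev->lock */
		mutex_lock(&dev->lock);
		drv_unmap_range(dev, range->start, range->end);
		mutex_unlock(&dev->lock);
		return 0;
	}

	static int drv_alloc_buffer(struct drv_dev *dev)
	{
		mutex_lock(&dev->lock);
		/* GFP_KERNEL can recurse into reclaim and hence into the mmu
		 * notifiers, i.e. dev->lock -> fs_reclaim -> notifier map,
		 * which closes the cycle and produces a lockdep splat.
		 */
		dev->buf = kmalloc(SZ_4K, GFP_KERNEL);
		mutex_unlock(&dev->lock);
		return dev->buf ? 0 : -ENOMEM;
	}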

v2: Use lock_map_acquire/release() like fs_reclaim, to avoid confusion
with this being a real mutex (Chris Wilson).

v3: Rebase on top of Glisse's arg rework.

v4: Also annotate invalidate_range_end (Jason Gunthorpe)
Also annotate invalidate_range_start_nonblock, I somehow missed that
one in the first version.

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/mmu_notifier.h | 8 ++++++++
 mm/mmu_notifier.c            | 9 +++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index b6c004bd9f6a..39a86b77a939 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -42,6 +42,10 @@ enum mmu_notifier_event {
 
 #ifdef CONFIG_MMU_NOTIFIER
 
+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -310,19 +314,23 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 static inline void
 mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 {
+	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
 	if (mm_has_notifiers(range->mm)) {
 		range->flags |= MMU_NOTIFIER_RANGE_BLOCKABLE;
 		__mmu_notifier_invalidate_range_start(range);
 	}
+	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
 }
 
 static inline int
 mmu_notifier_invalidate_range_start_nonblock(struct mmu_notifier_range *range)
 {
+	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
 	if (mm_has_notifiers(range->mm)) {
 		range->flags &= ~MMU_NOTIFIER_RANGE_BLOCKABLE;
 		return __mmu_notifier_invalidate_range_start(range);
 	}
+	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
 	return 0;
 }
 
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 16f1cbc775d0..d12e3079e7a4 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -21,6 +21,13 @@
 /* global SRCU for all MMs */
 DEFINE_STATIC_SRCU(srcu);
 
+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
+	.name = "mmu_notifier_invalidate_range_start"
+};
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map);
+#endif
+
 /*
  * This function allows mmu_notifier::release callback to delay a call to
  * a function that will free appropriate resources. The function must be
@@ -197,6 +204,7 @@ void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range,
 	struct mmu_notifier *mn;
 	int id;
 
+	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
 		/*
@@ -220,6 +228,7 @@ void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range,
 			mn->ops->invalidate_range_end(mn, range);
 	}
 	srcu_read_unlock(&srcu, id);
+	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
 }
 EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);
 
-- 
2.23.0.rc1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] mm, notifier: Prime lockdep
  2019-08-20  8:18 [PATCH 0/4] mmu notifier debug annotations/checks Daniel Vetter
  2019-08-20  8:18 ` [PATCH 1/4] mm, notifier: Add a lockdep map for invalidate_range_start/end Daniel Vetter
@ 2019-08-20  8:19 ` Daniel Vetter
  2019-08-20 13:31   ` Jason Gunthorpe
  2019-08-20  8:19 ` [PATCH 3/4] kernel.h: Add non_block_start/end() Daniel Vetter
  2019-08-20  8:19 ` [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter
  3 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2019-08-20  8:19 UTC (permalink / raw)
  To: LKML
  Cc: Linux MM, DRI Development, Intel Graphics Development,
	Daniel Vetter, Jason Gunthorpe, Chris Wilson, Andrew Morton,
	David Rientjes, Jérôme Glisse, Michal Hocko,
	Christian König, Greg Kroah-Hartman, Mike Rapoport,
	Daniel Vetter

We want to teach lockdep that mmu notifiers can be called from direct
reclaim paths, since on many CI systems load might never reach that
level (e.g. when just running a fuzzer or small functional tests).

Motivated by a discussion with Jason.

I've put the annotation into mmu_notifier_register since only when we
have mmu notifiers registered is there any point in teaching lockdep
about them. Also, we already have a kmalloc(, GFP_KERNEL), so this is
safe.
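
To spell out the effect with a hedged example (dev->lock and obj are made
up, and dev->lock is assumed to also be taken in that driver's notifier
callback): the priming records fs_reclaim -> notifier map once at
register time, so the following pattern is flagged even on a machine that
never enters direct reclaim:

	mutex_lock(&dev->lock);
	obj = kmalloc(sizeof(*obj), GFP_KERNEL); /* dev->lock -> fs_reclaim */
	mutex_unlock(&dev->lock);

	/* Together with the map from the previous patch lockdep now sees
	 *   dev->lock -> fs_reclaim -> notifier map -> dev->lock
	 * without reclaim ever having to call the notifier for real.
	 */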

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index d12e3079e7a4..538d3bb87f9b 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -256,6 +256,13 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
 
 	BUG_ON(atomic_read(&mm->mm_users) <= 0);
 
+	if (IS_ENABLED(CONFIG_LOCKDEP)) {
+		fs_reclaim_acquire(GFP_KERNEL);
+		lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
+		lock_map_release(&__mmu_notifier_invalidate_range_start_map);
+		fs_reclaim_release(GFP_KERNEL);
+	}
+
 	ret = -ENOMEM;
 	mmu_notifier_mm = kmalloc(sizeof(struct mmu_notifier_mm), GFP_KERNEL);
 	if (unlikely(!mmu_notifier_mm))
-- 
2.23.0.rc1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-20  8:18 [PATCH 0/4] mmu notifier debug annotations/checks Daniel Vetter
  2019-08-20  8:18 ` [PATCH 1/4] mm, notifier: Add a lockdep map for invalidate_range_start/end Daniel Vetter
  2019-08-20  8:19 ` [PATCH 2/4] mm, notifier: Prime lockdep Daniel Vetter
@ 2019-08-20  8:19 ` Daniel Vetter
  2019-08-20 20:24   ` Daniel Vetter
  2019-08-20  8:19 ` [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter
  3 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2019-08-20  8:19 UTC (permalink / raw)
  To: LKML
  Cc: Linux MM, DRI Development, Intel Graphics Development,
	Daniel Vetter, Jason Gunthorpe, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Masahiro Yamada,
	Wei Wang, Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang,
	Kees Cook, Randy Dunlap, Daniel Vetter

In some special cases we must not block, but there's no spinlock,
preempt-off, irqs-off or similar critical section already in place
that arms the might_sleep() debug checks. Add a non_block_start/end()
pair to annotate these.

This will be used in the oom paths of mmu-notifiers, where blocking is
not allowed to make sure there's forward progress. Quoting Michal:

"The notifier is called from quite a restricted context - oom_reaper -
which shouldn't depend on any locks or sleepable conditionals. The code
should be swift as well but we mostly do care about it to make a forward
progress. Checking for sleepable context is the best thing we could come
up with that would describe these demands at least partially."

Peter also asked whether we want to catch spinlocks on top, but Michal
said those are less of a problem because spinlocks can't have an
indirect dependency upon the page allocator and hence close the loop
with the oom reaper.
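
The intended use looks roughly like this (a sketch; the actual call site
in the mmu notifier core is added in the last patch of this series):

	if (!mmu_notifier_range_blockable(range))
		non_block_start();
	ret = mn->ops->invalidate_range_start(mn, range);
	/* a voluntary schedule() or a might_sleep() in the callback now
	 * prints a backtrace when the range is !blockable
	 */
	if (!mmu_notifier_range_blockable(range))
		non_block_end();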

Suggested by Michal Hocko.

v2:
- Improve commit message (Michal)
- Also check in schedule, not just might_sleep (Peter)

v3: It works better when I actually squash in the fixup I had lying
around :-/

v4: Pick the suggestion from Andrew Morton to give non_block_start/end
some good kerneldoc comments. I added that other blocking calls like
wait_event pose similar issues, since that's the other example we
discussed.

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Wei Wang <wvw@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jannh@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-kernel@vger.kernel.org
Acked-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/kernel.h | 25 ++++++++++++++++++++++++-
 include/linux/sched.h  |  4 ++++
 kernel/sched/core.c    | 19 ++++++++++++++-----
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 4fa360a13c1e..82f84cfe372f 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -217,7 +217,9 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
  * might_sleep - annotation for functions that can sleep
  *
  * this macro will print a stack trace if it is executed in an atomic
- * context (spinlock, irq-handler, ...).
+ * context (spinlock, irq-handler, ...). Additional sections where blocking is
+ * not allowed can be annotated with non_block_start() and non_block_end()
+ * pairs.
  *
  * This is a useful debugging help to be able to catch problems early and not
  * be bitten later when the calling function happens to sleep when it is not
@@ -233,6 +235,25 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
 # define cant_sleep() \
 	do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
 # define sched_annotate_sleep()	(current->task_state_change = 0)
+/**
+ * non_block_start - annotate the start of section where sleeping is prohibited
+ *
+ * This is on behalf of the oom reaper, specifically when it is calling the mmu
+ * notifiers. The problem is that if the notifier were to block on, for example,
+ * mutex_lock() and if the process which holds that mutex were to perform a
+ * sleeping memory allocation, the oom reaper is now blocked on completion of
+ * that memory allocation. Other blocking calls like wait_event() pose similar
+ * issues.
+ */
+# define non_block_start() \
+	do { current->non_block_count++; } while (0)
+/**
+ * non_block_end - annotate the end of section where sleeping is prohibited
+ *
+ * Closes a section opened by non_block_start().
+ */
+# define non_block_end() \
+	do { WARN_ON(current->non_block_count-- == 0); } while (0)
 #else
   static inline void ___might_sleep(const char *file, int line,
 				   int preempt_offset) { }
@@ -241,6 +262,8 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
 # define might_sleep() do { might_resched(); } while (0)
 # define cant_sleep() do { } while (0)
 # define sched_annotate_sleep() do { } while (0)
+# define non_block_start() do { } while (0)
+# define non_block_end() do { } while (0)
 #endif
 
 #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9f51932bd543..c5630f3dca1f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -974,6 +974,10 @@ struct task_struct {
 	struct mutex_waiter		*blocked_on;
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	int				non_block_count;
+#endif
+
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
 	unsigned long			hardirq_enable_ip;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2b037f195473..57245770d6cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3700,13 +3700,22 @@ static noinline void __schedule_bug(struct task_struct *prev)
 /*
  * Various schedule()-time debugging checks and statistics:
  */
-static inline void schedule_debug(struct task_struct *prev)
+static inline void schedule_debug(struct task_struct *prev, bool preempt)
 {
 #ifdef CONFIG_SCHED_STACK_END_CHECK
 	if (task_stack_end_corrupted(prev))
 		panic("corrupted stack end detected inside scheduler\n");
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	if (!preempt && prev->state && prev->non_block_count) {
+		printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
+			prev->comm, prev->pid, prev->non_block_count);
+		dump_stack();
+		add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+	}
+#endif
+
 	if (unlikely(in_atomic_preempt_off())) {
 		__schedule_bug(prev);
 		preempt_count_set(PREEMPT_DISABLED);
@@ -3813,7 +3822,7 @@ static void __sched notrace __schedule(bool preempt)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	schedule_debug(prev);
+	schedule_debug(prev, preempt);
 
 	if (sched_feat(HRTICK))
 		hrtick_clear(rq);
@@ -6570,7 +6579,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 	rcu_sleep_check();
 
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
-	     !is_idle_task(current)) ||
+	     !is_idle_task(current) && !current->non_block_count) ||
 	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
 	    oops_in_progress)
 		return;
@@ -6586,8 +6595,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 		"BUG: sleeping function called from invalid context at %s:%d\n",
 			file, line);
 	printk(KERN_ERR
-		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
-			in_atomic(), irqs_disabled(),
+		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
+			in_atomic(), irqs_disabled(), current->non_block_count,
 			current->pid, current->comm);
 
 	if (task_stack_end_corrupted(current))
-- 
2.23.0.rc1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-20  8:18 [PATCH 0/4] mmu notifier debug annotations/checks Daniel Vetter
                   ` (2 preceding siblings ...)
  2019-08-20  8:19 ` [PATCH 3/4] kernel.h: Add non_block_start/end() Daniel Vetter
@ 2019-08-20  8:19 ` Daniel Vetter
  2019-08-20 13:34   ` Jason Gunthorpe
  3 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2019-08-20  8:19 UTC (permalink / raw)
  To: LKML
  Cc: Linux MM, DRI Development, Intel Graphics Development,
	Daniel Vetter, Jason Gunthorpe, Andrew Morton, Michal Hocko,
	David Rientjes, Christian König, Jérôme Glisse,
	Daniel Vetter

We need to make sure implementations don't cheat and don't have a
possible schedule/blocking point deeply buried where review can't
catch it.

I'm not sure whether this is the best way to make sure all the
might_sleep() callsites trigger, and it's a bit ugly in the code flow.
But it gets the job done.

Inspired by an i915 patch series which did exactly that, because the
rules haven't been entirely clear to us.
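
For illustration, a notifier implementation that wants to stay within
this rule has to avoid sleeping locks in the !blockable case, e.g. by
trylocking and backing off. This is a hedged sketch with a made-up driver
(drv_dev, drv_unmap_range, dev->lock), not code from this series:

	static int drv_invalidate_range_start(struct mmu_notifier *mn,
				const struct mmu_notifier_range *range)
	{
		struct drv_dev *dev = container_of(mn, struct drv_dev, mn);

		if (!mmu_notifier_range_blockable(range)) {
			/* oom reaper context: must not sleep, back off */
			if (!mutex_trylock(&dev->lock))
				return -EAGAIN;
		} else {
			mutex_lock(&dev->lock);
		}

		drv_unmap_range(dev, range->start, range->end);
		mutex_unlock(&dev->lock);
		return 0;
	}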

v2: Use the shiny new non_block_start/end annotations instead of
abusing preempt_disable/enable.

v3: Rebase on top of Glisse's arg rework.

v4: Rebase on top of more Glisse rework.

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 538d3bb87f9b..856636d06ee0 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, range);
+			int _ret;
+
+			if (!mmu_notifier_range_blockable(range))
+				non_block_start();
+			_ret = mn->ops->invalidate_range_start(mn, range);
+			if (!mmu_notifier_range_blockable(range))
+				non_block_end();
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
-- 
2.23.0.rc1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] mm, notifier: Add a lockdep map for invalidate_range_start/end
  2019-08-20  8:18 ` [PATCH 1/4] mm, notifier: Add a lockdep map for invalidate_range_start/end Daniel Vetter
@ 2019-08-20 13:31   ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2019-08-20 13:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Chris Wilson, Andrew Morton, David Rientjes,
	Jérôme Glisse, Michal Hocko, Christian König,
	Greg Kroah-Hartman, Mike Rapoport, Daniel Vetter

On Tue, Aug 20, 2019 at 10:18:59AM +0200, Daniel Vetter wrote:
> This is a similar idea to the fs_reclaim fake lockdep lock. It's
> fairly easy to provoke a specific notifier to be run on a specific
> range: Just prep it, and then munmap() it.
> 
> A bit harder, but still doable, is to provoke the mmu notifiers for
> all the various callchains that might lead to them. But both at the
> same time is really hard to hit reliably, especially when you want to
> exercise paths like direct reclaim or compaction, where it's not
> easy to control what exactly will be unmapped.
> 
> By introducing a lockdep map to tie them all together we allow lockdep
> to see a lot more dependencies, without having to actually hit them
> in a single callchain while testing.
> 
> On Jason's suggestion this is rolled out for both
> invalidate_range_start and invalidate_range_end. They both have the
> same calling context, hence we can share the same lockdep map. Note
> that the annotation for invalidate_range_start is outside of the
> mm_has_notifiers(), to make sure lockdep is informed about all paths
> leading to this context irrespective of whether mmu notifiers are
> present for a given context. We don't do that on the
> invalidate_range_end side to avoid paying the overhead twice, there
> the lockdep annotation is pushed down behind the mm_has_notifiers()
> check.
> 
> v2: Use lock_map_acquire/release() like fs_reclaim, to avoid confusion
> with this being a real mutex (Chris Wilson).
> 
> v3: Rebase on top of Glisse's arg rework.
> 
> v4: Also annotate invalidate_range_end (Jason Gunthorpe)
> Also annotate invalidate_range_start_nonblock, I somehow missed that
> one in the first version.
> 
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  include/linux/mmu_notifier.h | 8 ++++++++
>  mm/mmu_notifier.c            | 9 +++++++++
>  2 files changed, 17 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] mm, notifier: Prime lockdep
  2019-08-20  8:19 ` [PATCH 2/4] mm, notifier: Prime lockdep Daniel Vetter
@ 2019-08-20 13:31   ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2019-08-20 13:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Chris Wilson, Andrew Morton, David Rientjes,
	Jérôme Glisse, Michal Hocko, Christian König,
	Greg Kroah-Hartman, Mike Rapoport, Daniel Vetter

On Tue, Aug 20, 2019 at 10:19:00AM +0200, Daniel Vetter wrote:
> We want to teach lockdep that mmu notifiers can be called from direct
> reclaim paths, since on many CI systems load might never reach that
> level (e.g. when just running a fuzzer or small functional tests).
> 
> Motivated by a discussion with Jason.
> 
> I've put the annotation into mmu_notifier_register since only when we
> have mmu notifiers registered is there any point in teaching lockdep
> about them. Also, we already have a kmalloc(, GFP_KERNEL), so this is
> safe.
> 
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>  mm/mmu_notifier.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index d12e3079e7a4..538d3bb87f9b 100644
> +++ b/mm/mmu_notifier.c
> @@ -256,6 +256,13 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
>  
>  	BUG_ON(atomic_read(&mm->mm_users) <= 0);
>  
> +	if (IS_ENABLED(CONFIG_LOCKDEP)) {
> +		fs_reclaim_acquire(GFP_KERNEL);
> +		lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
> +		lock_map_release(&__mmu_notifier_invalidate_range_start_map);
> +		fs_reclaim_release(GFP_KERNEL);
> +	}

Let's try it out at least

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-20  8:19 ` [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter
@ 2019-08-20 13:34   ` Jason Gunthorpe
  2019-08-20 15:18     ` Daniel Vetter
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-08-20 13:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Daniel Vetter

On Tue, Aug 20, 2019 at 10:19:02AM +0200, Daniel Vetter wrote:
> We need to make sure implementations don't cheat and don't have a
> possible schedule/blocking point deeply buried where review can't
> catch it.
> 
> I'm not sure whether this is the best way to make sure all the
> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> But it gets the job done.
> 
> Inspired by an i915 patch series which did exactly that, because the
> rules haven't been entirely clear to us.
> 
> v2: Use the shiny new non_block_start/end annotations instead of
> abusing preempt_disable/enable.
> 
> v3: Rebase on top of Glisse's arg rework.
> 
> v4: Rebase on top of more Glisse rework.
> 
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>  mm/mmu_notifier.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 538d3bb87f9b..856636d06ee0 100644
> +++ b/mm/mmu_notifier.c
> @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
>  	id = srcu_read_lock(&srcu);
>  	hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
>  		if (mn->ops->invalidate_range_start) {
> -			int _ret = mn->ops->invalidate_range_start(mn, range);
> +			int _ret;
> +
> +			if (!mmu_notifier_range_blockable(range))
> +				non_block_start();
> +			_ret = mn->ops->invalidate_range_start(mn, range);
> +			if (!mmu_notifier_range_blockable(range))
> +				non_block_end();

If someone Acks all the sched changes then I can pick this for
hmm.git, but I still think the existing pre-emption debugging is fine
for this use case.

Also, same comment as for the lockdep map, this needs to apply to the
non-blocking range_end also.

Anyhow, since this series has conflicts with hmm.git it would be best
to flow through the whole thing through that tree. If there are no
remarks on the first two patches I'll grab them in a few days.

Regards,
Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-20 13:34   ` Jason Gunthorpe
@ 2019-08-20 15:18     ` Daniel Vetter
  2019-08-20 15:27       ` Jason Gunthorpe
  2019-08-21 15:41       ` Daniel Vetter
  0 siblings, 2 replies; 25+ messages in thread
From: Daniel Vetter @ 2019-08-20 15:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, LKML, Linux MM, DRI Development,
	Intel Graphics Development, Andrew Morton, Michal Hocko,
	David Rientjes, Christian König, Jérôme Glisse,
	Daniel Vetter

On Tue, Aug 20, 2019 at 10:34:18AM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 20, 2019 at 10:19:02AM +0200, Daniel Vetter wrote:
> > We need to make sure implementations don't cheat and don't have a
> > possible schedule/blocking point deeply buried where review can't
> > catch it.
> > 
> > I'm not sure whether this is the best way to make sure all the
> > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > But it gets the job done.
> > 
> > Inspired by an i915 patch series which did exactly that, because the
> > rules haven't been entirely clear to us.
> > 
> > v2: Use the shiny new non_block_start/end annotations instead of
> > abusing preempt_disable/enable.
> > 
> > v3: Rebase on top of Glisse's arg rework.
> > 
> > v4: Rebase on top of more Glisse rework.
> > 
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >  mm/mmu_notifier.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 538d3bb87f9b..856636d06ee0 100644
> > +++ b/mm/mmu_notifier.c
> > @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> >  	id = srcu_read_lock(&srcu);
> >  	hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> >  		if (mn->ops->invalidate_range_start) {
> > -			int _ret = mn->ops->invalidate_range_start(mn, range);
> > +			int _ret;
> > +
> > +			if (!mmu_notifier_range_blockable(range))
> > +				non_block_start();
> > +			_ret = mn->ops->invalidate_range_start(mn, range);
> > +			if (!mmu_notifier_range_blockable(range))
> > +				non_block_end();
> 
> If someone Acks all the sched changes then I can pick this for
> hmm.git, but I still think the existing pre-emption debugging is fine
> for this use case.

Ok, I'll ping Peter Z. for an ack, iirc he was involved.

> Also, same comment as for the lockdep map, this needs to apply to the
> non-blocking range_end also.

Hm, I thought the page table locks we're holding there already prevent any
sleeping, so would be redundant? But reading through code I think that's
not guaranteed, so yeah makes sense to add it for invalidate_range_end
too. I'll respin once I have the ack/nack from scheduler people.

> Anyhow, since this series has conflicts with hmm.git it would be best
> to flow through the whole thing through that tree. If there are no
> remarks on the first two patches I'll grab them in a few days.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-20 15:18     ` Daniel Vetter
@ 2019-08-20 15:27       ` Jason Gunthorpe
  2019-08-21  9:34         ` Daniel Vetter
  2019-08-21 15:41       ` Daniel Vetter
  1 sibling, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-08-20 15:27 UTC (permalink / raw)
  To: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Daniel Vetter

On Tue, Aug 20, 2019 at 05:18:10PM +0200, Daniel Vetter wrote:
> > > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > > index 538d3bb87f9b..856636d06ee0 100644
> > > +++ b/mm/mmu_notifier.c
> > > @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> > >  	id = srcu_read_lock(&srcu);
> > >  	hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> > >  		if (mn->ops->invalidate_range_start) {
> > > -			int _ret = mn->ops->invalidate_range_start(mn, range);
> > > +			int _ret;
> > > +
> > > +			if (!mmu_notifier_range_blockable(range))
> > > +				non_block_start();
> > > +			_ret = mn->ops->invalidate_range_start(mn, range);
> > > +			if (!mmu_notifier_range_blockable(range))
> > > +				non_block_end();
> > 
> > If someone Acks all the sched changes then I can pick this for
> > hmm.git, but I still think the existing pre-emption debugging is fine
> > for this use case.
> 
> Ok, I'll ping Peter Z. for an ack, iirc he was involved.
> 
> > Also, same comment as for the lockdep map, this needs to apply to the
> > non-blocking range_end also.
> 
> Hm, I thought the page table locks we're holding there already prevent any
> sleeping, so would be redundant?

AFAIK no. All callers of invalidate_range_start/end pairs do so a few
lines apart and don't change their locking in between - thus since
start can block so can end.

Would love to know if that is not true??

Similarly I've also been idly wondering if we should add a
'might_sleep()' to invalidate_range_start/end() to make this constraint
clear & tested to the mm side?
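
i.e. something like this on top of patch 1 (just a sketch):

	static inline void
	mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
	{
		might_sleep();
		lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
		if (mm_has_notifiers(range->mm)) {
			range->flags |= MMU_NOTIFIER_RANGE_BLOCKABLE;
			__mmu_notifier_invalidate_range_start(range);
		}
		lock_map_release(&__mmu_notifier_invalidate_range_start_map);
	}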

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-20  8:19 ` [PATCH 3/4] kernel.h: Add non_block_start/end() Daniel Vetter
@ 2019-08-20 20:24   ` Daniel Vetter
  2019-08-22 23:14     ` Andrew Morton
  2019-08-23  8:48     ` Peter Zijlstra
  0 siblings, 2 replies; 25+ messages in thread
From: Daniel Vetter @ 2019-08-20 20:24 UTC (permalink / raw)
  To: LKML
  Cc: Linux MM, DRI Development, Intel Graphics Development,
	Daniel Vetter, Jason Gunthorpe, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Masahiro Yamada,
	Wei Wang, Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang,
	Kees Cook, Randy Dunlap, Daniel Vetter

On Tue, Aug 20, 2019 at 10:19:01AM +0200, Daniel Vetter wrote:
> In some special cases we must not block, but there's no spinlock,
> preempt-off, irqs-off or similar critical section already in place
> that arms the might_sleep() debug checks. Add a non_block_start/end()
> pair to annotate these.
> 
> This will be used in the oom paths of mmu-notifiers, where blocking is
> not allowed to make sure there's forward progress. Quoting Michal:
> 
> "The notifier is called from quite a restricted context - oom_reaper -
> which shouldn't depend on any locks or sleepable conditionals. The code
> should be swift as well but we mostly do care about it to make a forward
> progress. Checking for sleepable context is the best thing we could come
> up with that would describe these demands at least partially."
> 
> Peter also asked whether we want to catch spinlocks on top, but Michal
> said those are less of a problem because spinlocks can't have an
> indirect dependency upon the page allocator and hence close the loop
> with the oom reaper.
> 
> Suggested by Michal Hocko.
> 
> v2:
> - Improve commit message (Michal)
> - Also check in schedule, not just might_sleep (Peter)
> 
> v3: It works better when I actually squash in the fixup I had lying
> around :-/
> 
> v4: Pick the suggestion from Andrew Morton to give non_block_start/end
> some good kerneldoc comments. I added that other blocking calls like
> wait_event pose similar issues, since that's the other example we
> discussed.
> 
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Wei Wang <wvw@google.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Jann Horn <jannh@google.com>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: linux-kernel@vger.kernel.org
> Acked-by: Christian König <christian.koenig@amd.com> (v1)
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Hi Peter,

Iirc you've been involved at least somewhat in discussing this. -mm folks
are a bit undecided whether these new non_block semantics are a good idea.
Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
are less enthusiastic. Jason said he's ok with merging the hmm side of
this if scheduler folks ack. If not, then I'll respin with the
preempt_disable/enable instead like in v1.

So ack/nack for this from the scheduler side?

Thanks, Daniel

> ---
>  include/linux/kernel.h | 25 ++++++++++++++++++++++++-
>  include/linux/sched.h  |  4 ++++
>  kernel/sched/core.c    | 19 ++++++++++++++-----
>  3 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 4fa360a13c1e..82f84cfe372f 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -217,7 +217,9 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
>   * might_sleep - annotation for functions that can sleep
>   *
>   * this macro will print a stack trace if it is executed in an atomic
> - * context (spinlock, irq-handler, ...).
> + * context (spinlock, irq-handler, ...). Additional sections where blocking is
> + * not allowed can be annotated with non_block_start() and non_block_end()
> + * pairs.
>   *
>   * This is a useful debugging help to be able to catch problems early and not
>   * be bitten later when the calling function happens to sleep when it is not
> @@ -233,6 +235,25 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
>  # define cant_sleep() \
>  	do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
>  # define sched_annotate_sleep()	(current->task_state_change = 0)
> +/**
> + * non_block_start - annotate the start of section where sleeping is prohibited
> + *
> + * This is on behalf of the oom reaper, specifically when it is calling the mmu
> + * notifiers. The problem is that if the notifier were to block on, for example,
> + * mutex_lock() and if the process which holds that mutex were to perform a
> + * sleeping memory allocation, the oom reaper is now blocked on completion of
> + * that memory allocation. Other blocking calls like wait_event() pose similar
> + * issues.
> + */
> +# define non_block_start() \
> +	do { current->non_block_count++; } while (0)
> +/**
> + * non_block_end - annotate the end of section where sleeping is prohibited
> + *
> + * Closes a section opened by non_block_start().
> + */
> +# define non_block_end() \
> +	do { WARN_ON(current->non_block_count-- == 0); } while (0)
>  #else
>    static inline void ___might_sleep(const char *file, int line,
>  				   int preempt_offset) { }
> @@ -241,6 +262,8 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
>  # define might_sleep() do { might_resched(); } while (0)
>  # define cant_sleep() do { } while (0)
>  # define sched_annotate_sleep() do { } while (0)
> +# define non_block_start() do { } while (0)
> +# define non_block_end() do { } while (0)
>  #endif
>  
>  #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 9f51932bd543..c5630f3dca1f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -974,6 +974,10 @@ struct task_struct {
>  	struct mutex_waiter		*blocked_on;
>  #endif
>  
> +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> +	int				non_block_count;
> +#endif
> +
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	unsigned int			irq_events;
>  	unsigned long			hardirq_enable_ip;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2b037f195473..57245770d6cc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3700,13 +3700,22 @@ static noinline void __schedule_bug(struct task_struct *prev)
>  /*
>   * Various schedule()-time debugging checks and statistics:
>   */
> -static inline void schedule_debug(struct task_struct *prev)
> +static inline void schedule_debug(struct task_struct *prev, bool preempt)
>  {
>  #ifdef CONFIG_SCHED_STACK_END_CHECK
>  	if (task_stack_end_corrupted(prev))
>  		panic("corrupted stack end detected inside scheduler\n");
>  #endif
>  
> +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> +	if (!preempt && prev->state && prev->non_block_count) {
> +		printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
> +			prev->comm, prev->pid, prev->non_block_count);
> +		dump_stack();
> +		add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
> +	}
> +#endif
> +
>  	if (unlikely(in_atomic_preempt_off())) {
>  		__schedule_bug(prev);
>  		preempt_count_set(PREEMPT_DISABLED);
> @@ -3813,7 +3822,7 @@ static void __sched notrace __schedule(bool preempt)
>  	rq = cpu_rq(cpu);
>  	prev = rq->curr;
>  
> -	schedule_debug(prev);
> +	schedule_debug(prev, preempt);
>  
>  	if (sched_feat(HRTICK))
>  		hrtick_clear(rq);
> @@ -6570,7 +6579,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  	rcu_sleep_check();
>  
>  	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
> -	     !is_idle_task(current)) ||
> +	     !is_idle_task(current) && !current->non_block_count) ||
>  	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
>  	    oops_in_progress)
>  		return;
> @@ -6586,8 +6595,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  		"BUG: sleeping function called from invalid context at %s:%d\n",
>  			file, line);
>  	printk(KERN_ERR
> -		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
> -			in_atomic(), irqs_disabled(),
> +		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
> +			in_atomic(), irqs_disabled(), current->non_block_count,
>  			current->pid, current->comm);
>  
>  	if (task_stack_end_corrupted(current))
> -- 
> 2.23.0.rc1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-20 15:27       ` Jason Gunthorpe
@ 2019-08-21  9:34         ` Daniel Vetter
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel Vetter @ 2019-08-21  9:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Daniel Vetter

On Wed, Aug 21, 2019 at 9:33 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Aug 20, 2019 at 05:18:10PM +0200, Daniel Vetter wrote:
> > > > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > > > index 538d3bb87f9b..856636d06ee0 100644
> > > > +++ b/mm/mmu_notifier.c
> > > > @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> > > >   id = srcu_read_lock(&srcu);
> > > >   hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> > > >           if (mn->ops->invalidate_range_start) {
> > > > -                 int _ret = mn->ops->invalidate_range_start(mn, range);
> > > > +                 int _ret;
> > > > +
> > > > +                 if (!mmu_notifier_range_blockable(range))
> > > > +                         non_block_start();
> > > > +                 _ret = mn->ops->invalidate_range_start(mn, range);
> > > > +                 if (!mmu_notifier_range_blockable(range))
> > > > +                         non_block_end();
> > >
> > > If someone Acks all the sched changes then I can pick this for
> > > hmm.git, but I still think the existing pre-emption debugging is fine
> > > for this use case.
> >
> > Ok, I'll ping Peter Z. for an ack, iirc he was involved.
> >
> > > Also, same comment as for the lockdep map, this needs to apply to the
> > > non-blocking range_end also.
> >
> > Hm, I thought the page table locks we're holding there already prevent any
> > sleeping, so would be redundant?
>
> AFAIK no. All callers of invalidate_range_start/end pairs do so a few
> lines apart and don't change their locking in between - thus since
> start can block so can end.
>
> Would love to know if that is not true??

Yeah I reviewed them, I think I mixed up a discussion I had a while
ago with Jerome. It's a bit tricky to follow in the code since in some
places ->invalidate_range and ->invalidate_range_end seem to be called
from the same place, in others not at all.

> Similarly I've also been idly wondering if we should add a
> 'might_sleep()' to invalidate_range_start/end() to make this constraint
> clear & tested to the mm side?

Hm, sounds like a useful idea. Since in general you won't test with mmu
notifiers, but they could happen, and then they will block for at
least some mutex usually. I'll throw that as an idea on top for the
next round.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-20 15:18     ` Daniel Vetter
  2019-08-20 15:27       ` Jason Gunthorpe
@ 2019-08-21 15:41       ` Daniel Vetter
  2019-08-21 16:16         ` Jason Gunthorpe
  1 sibling, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2019-08-21 15:41 UTC (permalink / raw)
  To: Jason Gunthorpe, LKML, Linux MM, DRI Development,
	Intel Graphics Development, Andrew Morton, Michal Hocko,
	David Rientjes, Christian König, Jérôme Glisse,
	Daniel Vetter

On Tue, Aug 20, 2019 at 05:18:10PM +0200, Daniel Vetter wrote:
> On Tue, Aug 20, 2019 at 10:34:18AM -0300, Jason Gunthorpe wrote:
> > On Tue, Aug 20, 2019 at 10:19:02AM +0200, Daniel Vetter wrote:
> > > We need to make sure implementations don't cheat and don't have a
> > > possible schedule/blocking point deeply buried where review can't
> > > catch it.
> > > 
> > > I'm not sure whether this is the best way to make sure all the
> > > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > > But it gets the job done.
> > > 
> > > Inspired by an i915 patch series which did exactly that, because the
> > > rules haven't been entirely clear to us.
> > > 
> > > v2: Use the shiny new non_block_start/end annotations instead of
> > > abusing preempt_disable/enable.
> > > 
> > > v3: Rebase on top of Glisse's arg rework.
> > > 
> > > v4: Rebase on top of more Glisse rework.
> > > 
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: David Rientjes <rientjes@google.com>
> > > Cc: "Christian König" <christian.koenig@amd.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > > Cc: linux-mm@kvack.org
> > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > >  mm/mmu_notifier.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > > index 538d3bb87f9b..856636d06ee0 100644
> > > +++ b/mm/mmu_notifier.c
> > > @@ -181,7 +181,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> > >  	id = srcu_read_lock(&srcu);
> > >  	hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> > >  		if (mn->ops->invalidate_range_start) {
> > > -			int _ret = mn->ops->invalidate_range_start(mn, range);
> > > +			int _ret;
> > > +
> > > +			if (!mmu_notifier_range_blockable(range))
> > > +				non_block_start();
> > > +			_ret = mn->ops->invalidate_range_start(mn, range);
> > > +			if (!mmu_notifier_range_blockable(range))
> > > +				non_block_end();
> > 
> > If someone Acks all the sched changes then I can pick this for
> > hmm.git, but I still think the existing pre-emption debugging is fine
> > for this use case.
> 
> Ok, I'll ping Peter Z. for an ack, iirc he was involved.
> 
> > Also, same comment as for the lockdep map, this needs to apply to the
> > non-blocking range_end also.
> 
> Hm, I thought the page table locks we're holding there already prevent any
> sleeping, so would be redundant? But reading through code I think that's
> not guaranteed, so yeah makes sense to add it for invalidate_range_end
> too. I'll respin once I have the ack/nack from scheduler people.

So I started to look into this, and I'm a bit confused. There's no
_nonblock version of this, so does this mean blocking is never allowed,
or always allowed?

From a quick look through implementations I've only seen spinlocks, and
one up_read. So I guess I should wrap this callback in some unconditional
non_block_start/end, but I'm not sure.

Thanks, Daniel


> > Anyhow, since this series has conflicts with hmm.git it would be best
> > to flow through the whole thing through that tree. If there are no
> > remarks on the first two patches I'll grab them in a few days.
> 
> Thanks, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-21 15:41       ` Daniel Vetter
@ 2019-08-21 16:16         ` Jason Gunthorpe
  2019-08-22  8:42           ` Daniel Vetter
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-08-21 16:16 UTC (permalink / raw)
  To: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Daniel Vetter

On Wed, Aug 21, 2019 at 05:41:51PM +0200, Daniel Vetter wrote:

> > Hm, I thought the page table locks we're holding there already prevent any
> > sleeping, so would be redundant? But reading through code I think that's
> > not guaranteed, so yeah makes sense to add it for invalidate_range_end
> > too. I'll respin once I have the ack/nack from scheduler people.
> 
> So I started to look into this, and I'm a bit confused. There's no
> _nonblock version of this, so does this mean blocking is never allowed,
> or always allowed?

RDMA has a mutex:

ib_umem_notifier_invalidate_range_end
  rbt_ib_umem_for_each_in_range
   invalidate_range_start_trampoline
    ib_umem_notifier_end_account
      mutex_lock(&umem_odp->umem_mutex);

I'm working to delete this path though!

nonblocking or not follows the start, the same flag gets placed into
the mmu_notifier_range struct passed to end.

> From a quick look through implementations I've only seen spinlocks, and
> one up_read. So I guess I should wrap this callback in some unconditional
> non_block_start/end, but I'm not sure.

For now, we should keep it the same as start, conditionally blocking.

Hopefully before LPC I can send an RFC series that eliminates most
invalidate_range_end users in favor of common locking.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-21 16:16         ` Jason Gunthorpe
@ 2019-08-22  8:42           ` Daniel Vetter
  2019-08-22 14:24             ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2019-08-22  8:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Daniel Vetter

On Thu, Aug 22, 2019 at 10:16 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Aug 21, 2019 at 05:41:51PM +0200, Daniel Vetter wrote:
>
> > > Hm, I thought the page table locks we're holding there already prevent any
> > > sleeping, so would be redundant? But reading through code I think that's
> > > not guaranteed, so yeah makes sense to add it for invalidate_range_end
> > > too. I'll respin once I have the ack/nack from scheduler people.
> >
> > So I started to look into this, and I'm a bit confused. There's no
> > _nonblock version of this, so does this mean blocking is never allowed,
> > or always allowed?
>
> RDMA has a mutex:
>
> ib_umem_notifier_invalidate_range_end
>   rbt_ib_umem_for_each_in_range
>    invalidate_range_start_trampoline
>     ib_umem_notifier_end_account
>       mutex_lock(&umem_odp->umem_mutex);
>
> I'm working to delete this path though!
>
> nonblocking or not follows the start, the same flag gets placed into
> the mmu_notifier_range struct passed to end.

Ok, makes sense.

I guess that also means the might_sleep (I started on that) in
invalidate_range_end also needs to be conditional? Or not bother with
a might_sleep in invalidate_range_end since you're working on removing
the last sleep in there?

> > From a quick look through implementations I've only seen spinlocks, and
> > one up_read. So I guess I should wrap this callback in some unconditional
> > non_block_start/end, but I'm not sure.
>
> For now, we should keep it the same as start, conditionally blocking.
>
> Hopefully before LPC I can send an RFC series that eliminates most
> invalidate_range_end users in favor of common locking.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-22  8:42           ` Daniel Vetter
@ 2019-08-22 14:24             ` Jason Gunthorpe
  2019-08-22 14:27               ` Daniel Vetter
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-08-22 14:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Daniel Vetter

On Thu, Aug 22, 2019 at 10:42:39AM +0200, Daniel Vetter wrote:

> > RDMA has a mutex:
> >
> > ib_umem_notifier_invalidate_range_end
> >   rbt_ib_umem_for_each_in_range
> >    invalidate_range_start_trampoline
> >     ib_umem_notifier_end_account
> >       mutex_lock(&umem_odp->umem_mutex);
> >
> > I'm working to delete this path though!
> >
> > nonblocking or not follows the start, the same flag gets placed into
> > the mmu_notifier_range struct passed to end.
> 
> Ok, makes sense.
> 
> I guess that also means the might_sleep (I started on that) in
> invalidate_range_end also needs to be conditional? Or not bother with
> a might_sleep in invalidate_range_end since you're working on removing
> the last sleep in there?

I might suggest the same pattern as used for the lockdep map: the
might_sleep unconditionally on the start, and a 2nd might_sleep() after
the IF in __mmu_notifier_invalidate_range_end()

Observing that by audit all the callers already have the same locking
context for start/end

Jason
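
For illustration, that placement might look roughly like the sketch below,
written against the inline wrappers in include/linux/mmu_notifier.h of the
then-current tree (helper signatures assumed; not the code that eventually
landed):

  static inline void
  mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
  {
          might_sleep();  /* unconditional, mirrors the lockdep map */
          if (mm_has_notifiers(range->mm)) {
                  range->flags |= MMU_NOTIFIER_RANGE_BLOCKABLE;
                  __mmu_notifier_invalidate_range_start(range);
          }
  }

  static inline void
  mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range)
  {
          if (mm_has_notifiers(range->mm)) {
                  might_sleep();  /* the 2nd check, after the if */
                  __mmu_notifier_invalidate_range_end(range, false);
          }
  }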

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable
  2019-08-22 14:24             ` Jason Gunthorpe
@ 2019-08-22 14:27               ` Daniel Vetter
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel Vetter @ 2019-08-22 14:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Andrew Morton, Michal Hocko, David Rientjes,
	Christian König, Jérôme Glisse, Daniel Vetter

On Thu, Aug 22, 2019 at 4:24 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Aug 22, 2019 at 10:42:39AM +0200, Daniel Vetter wrote:
>
> > > RDMA has a mutex:
> > >
> > > ib_umem_notifier_invalidate_range_end
> > >   rbt_ib_umem_for_each_in_range
> > >    invalidate_range_start_trampoline
> > >     ib_umem_notifier_end_account
> > >       mutex_lock(&umem_odp->umem_mutex);
> > >
> > > I'm working to delete this path though!
> > >
> > > nonblocking or not follows the start, the same flag gets placed into
> > > the mmu_notifier_range struct passed to end.
> >
> > Ok, makes sense.
> >
> > I guess that also means the might_sleep (I started on that) in
> > invalidate_range_end also needs to be conditional? Or not bother with
> > a might_sleep in invalidate_range_end since you're working on removing
> > the last sleep in there?
>
> I might suggest the same pattern as used for lockdep: the might_sleep()
> unconditionally on the start, and a 2nd might_sleep() after the IF in
> __mmu_notifier_invalidate_range_end()
>
> Observing that, by audit, all the callers already have the same locking
> context for start/end

My question was more about enforcing that going forward, since you're
working to remove all the sleeps from invalidate_range_end. I don't
want to add debug annotations which are stricter than what the other
side actually expects. But since there are currently still sleeping
locks in invalidate_range_end I think I'll just stick them in both
places. You can then (re)move them when the cleanup lands.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-20 20:24   ` Daniel Vetter
@ 2019-08-22 23:14     ` Andrew Morton
  2019-08-23  8:34       ` Daniel Vetter
  2019-08-23  8:48     ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2019-08-22 23:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Daniel Vetter, Jason Gunthorpe, Peter Zijlstra, Ingo Molnar,
	Michal Hocko, David Rientjes, Christian König,
	Jérôme Glisse, Masahiro Yamada, Wei Wang,
	Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang,
	Kees Cook, Randy Dunlap, Daniel Vetter

On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter <daniel@ffwll.ch> wrote:

> Hi Peter,
> 
> Iirc you've been involved at least somewhat in discussing this. -mm folks
> are a bit undecided whether these new non_block semantics are a good idea.
> Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> are less enthusiastic. Jason said he's ok with merging the hmm side of
> this if scheduler folks ack. If not, then I'll respin with the
> preempt_disable/enable instead like in v1.

I became mollified once Michal explained the rationale.  I think it's
OK.  It's very specific to the oom reaper and hopefully won't be used
more widely(?).


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-22 23:14     ` Andrew Morton
@ 2019-08-23  8:34       ` Daniel Vetter
  2019-08-23 12:12         ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2019-08-23  8:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Jason Gunthorpe, Peter Zijlstra, Ingo Molnar, Michal Hocko,
	David Rientjes, Christian König, Jérôme Glisse,
	Masahiro Yamada, Wei Wang, Andy Shevchenko, Thomas Gleixner,
	Jann Horn, Feng Tang, Kees Cook, Randy Dunlap, Daniel Vetter

On Fri, Aug 23, 2019 at 1:14 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > Hi Peter,
> >
> > Iirc you've been involved at least somewhat in discussing this. -mm folks
> > are a bit undecided whether these new non_block semantics are a good idea.
> > Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> > are less enthusiastic. Jason said he's ok with merging the hmm side of
> > this if scheduler folks ack. If not, then I'll respin with the
> > preempt_disable/enable instead like in v1.
>
> I became mollified once Michal explained the rationale.  I think it's
> OK.  It's very specific to the oom reaper and hopefully won't be used
> more widely(?).

Yeah, no plans for that from me. And I hope the comment above them now
explains why they exist, so people think twice before using it in
random places.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-20 20:24   ` Daniel Vetter
  2019-08-22 23:14     ` Andrew Morton
@ 2019-08-23  8:48     ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2019-08-23  8:48 UTC (permalink / raw)
  To: LKML, Linux MM, DRI Development, Intel Graphics Development,
	Jason Gunthorpe, Ingo Molnar, Andrew Morton, Michal Hocko,
	David Rientjes, Christian König, Jérôme Glisse,
	Masahiro Yamada, Wei Wang, Andy Shevchenko, Thomas Gleixner,
	Jann Horn, Feng Tang, Kees Cook, Randy Dunlap, Daniel Vetter

On Tue, Aug 20, 2019 at 10:24:40PM +0200, Daniel Vetter wrote:
> On Tue, Aug 20, 2019 at 10:19:01AM +0200, Daniel Vetter wrote:
> > In some special cases we must not block, but there's not a
> > spinlock, preempt-off, irqs-off or similar critical section already
> > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > pair to annotate these.
> > 
> > This will be used in the oom paths of mmu-notifiers, where blocking is
> > not allowed to make sure there's forward progress. Quoting Michal:
> > 
> > "The notifier is called from quite a restricted context - oom_reaper -
> > which shouldn't depend on any locks or sleepable conditionals. The code
> > should be swift as well but we mostly do care about it to make a forward
> > progress. Checking for sleepable context is the best thing we could come
> > up with that would describe these demands at least partially."
> > 
> > Peter also asked whether we want to catch spinlocks on top, but Michal
> > said those are less of a problem because spinlocks can't have an
> > indirect dependency upon the page allocator and hence close the loop
> > with the oom reaper.
> > 
> > Suggested by Michal Hocko.
> > 
> > v2:
> > - Improve commit message (Michal)
> > - Also check in schedule, not just might_sleep (Peter)
> > 
> > v3: It works better when I actually squash in the fixup I had lying
> > around :-/
> > 
> > v4: Pick the suggestion from Andrew Morton to give non_block_start/end
> > some good kerneldoc comments. I added a note that other blocking calls like
> > wait_event pose similar issues, since that's the other example we
> > discussed.
> > 
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > Cc: Wei Wang <wvw@google.com>
> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Feng Tang <feng.tang@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: linux-kernel@vger.kernel.org
> > Acked-by: Christian König <christian.koenig@amd.com> (v1)
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> Hi Peter,
> 
> Iirc you've been involved at least somewhat in discussing this. -mm folks
> are a bit undecided whether these new non_block semantics are a good idea.
> Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> are less enthusiastic. Jason said he's ok with merging the hmm side of
> this if scheduler folks ack. If not, then I'll respin with the
> preempt_disable/enable instead like in v1.
> 
> So ack/nack for this from the scheduler side?

Right, I had memories of seeing this before, and I just found a fairly
long discussion on this elsewhere in the vacation inbox (*groan*).

Yeah, this is something I can live with,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
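
For context, the annotation described in the quoted commit message would
wrap the notifier callbacks in the !blockable case roughly as follows,
inside the notifier walk in __mmu_notifier_invalidate_range_start() (a
sketch of the intended use, with mn as the notifier being iterated; not
the exact diff from the series):

        /* annotate only when the caller may not block */
        if (!mmu_notifier_range_blockable(range))
                non_block_start();
        ret = mn->ops->invalidate_range_start(mn, range);
        if (!mmu_notifier_range_blockable(range))
                non_block_end();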

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-23  8:34       ` Daniel Vetter
@ 2019-08-23 12:12         ` Jason Gunthorpe
  2019-08-23 12:22           ` Peter Zijlstra
  2019-08-23 13:42           ` Daniel Vetter
  0 siblings, 2 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2019-08-23 12:12 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Andrew Morton, LKML, Linux MM, DRI Development,
	Intel Graphics Development, Peter Zijlstra, Ingo Molnar,
	Michal Hocko, David Rientjes, Christian König,
	Jérôme Glisse, Masahiro Yamada, Wei Wang,
	Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang,
	Kees Cook, Randy Dunlap, Daniel Vetter

On Fri, Aug 23, 2019 at 10:34:01AM +0200, Daniel Vetter wrote:
> On Fri, Aug 23, 2019 at 1:14 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > Hi Peter,
> > >
> > > Iirc you've been involved at least somewhat in discussing this. -mm folks
> > > are a bit undecided whether these new non_block semantics are a good idea.
> > > Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> > > are less enthusiastic. Jason said he's ok with merging the hmm side of
> > > this if scheduler folks ack. If not, then I'll respin with the
> > > preempt_disable/enable instead like in v1.
> >
> > I became mollified once Michal explained the rationale.  I think it's
> > OK.  It's very specific to the oom reaper and hopefully won't be used
> > more widely(?).
> 
> Yeah, no plans for that from me. And I hope the comment above them now
> explains why they exist, so people think twice before using it in
> random places.

I still haven't heard a satisfactory answer why a whole new scheme is
needed and a simple:

   if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
        preempt_disable()

isn't sufficient to catch the problematic cases during debugging??
IMHO the fact preempt is changed by the above when debugging is not
material here. I think that information should be included in the
commit message at least.

But if sched people are happy then let's go ahead. Can you send a v2
with the check encompassing the invalidate_range_end?

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-23 12:12         ` Jason Gunthorpe
@ 2019-08-23 12:22           ` Peter Zijlstra
  2019-08-23 13:42           ` Daniel Vetter
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2019-08-23 12:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, Andrew Morton, LKML, Linux MM, DRI Development,
	Intel Graphics Development, Ingo Molnar, Michal Hocko,
	David Rientjes, Christian König, Jérôme Glisse,
	Masahiro Yamada, Wei Wang, Andy Shevchenko, Thomas Gleixner,
	Jann Horn, Feng Tang, Kees Cook, Randy Dunlap, Daniel Vetter

On Fri, Aug 23, 2019 at 09:12:34AM -0300, Jason Gunthorpe wrote:

> I still haven't heard a satisfactory answer why a whole new scheme is
> needed and a simple:
> 
>    if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
>         preempt_disable()
> 
> isn't sufficient to catch the problematic cases during debugging??
> IMHO the fact preempt is changed by the above when debugging is not
> material here. I think that information should be included in the
> commit message at least.

That has a much larger impact and actually changes behaviour, while the
relatively simple patch Daniel proposed only adds a warning but doesn't
affect behaviour.
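
For comparison, the annotation being proposed is (roughly) just a per-task
counter that the might_sleep() machinery checks (assuming a non_block_count
field added to task_struct under CONFIG_DEBUG_ATOMIC_SLEEP), so unlike
preempt_disable() it cannot change scheduling behaviour, it can only warn:

  /* sketch: debug-only annotation, no effect on scheduling decisions */
  # define non_block_start() (current->non_block_count++)
  # define non_block_end() (WARN_ON(current->non_block_count-- == 0))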

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-23 12:12         ` Jason Gunthorpe
  2019-08-23 12:22           ` Peter Zijlstra
@ 2019-08-23 13:42           ` Daniel Vetter
  2019-08-23 14:06             ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2019-08-23 13:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Morton, LKML, Linux MM, DRI Development,
	Intel Graphics Development, Peter Zijlstra, Ingo Molnar,
	Michal Hocko, David Rientjes, Christian König,
	Jérôme Glisse, Masahiro Yamada, Wei Wang,
	Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang,
	Kees Cook, Randy Dunlap, Daniel Vetter

On Fri, Aug 23, 2019 at 2:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Aug 23, 2019 at 10:34:01AM +0200, Daniel Vetter wrote:
> > On Fri, Aug 23, 2019 at 1:14 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > > Hi Peter,
> > > >
> > > > Iirc you've been involved at least somewhat in discussing this. -mm folks
> > > > are a bit undecided whether these new non_block semantics are a good idea.
> > > > Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> > > > are less enthusiastic. Jason said he's ok with merging the hmm side of
> > > > this if scheduler folks ack. If not, then I'll respin with the
> > > > preempt_disable/enable instead like in v1.
> > >
> > > I became mollified once Michal explained the rationale.  I think it's
> > > OK.  It's very specific to the oom reaper and hopefully won't be used
> > > more widely(?).
> >
> > Yeah, no plans for that from me. And I hope the comment above them now
> > explains why they exist, so people think twice before using it in
> > random places.
>
> I still haven't heard a satisfactory answer why a whole new scheme is
> needed and a simple:
>
>    if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
>         preempt_disable()
>
> isn't sufficient to catch the problematic cases during debugging??
> IMHO the fact preempt is changed by the above when debugging is not
> material here. I think that information should be included in the
> commit message at least.
>
> But if sched people are happy then let's go ahead. Can you send a v2
> with the check encompassing the invalidate_range_end?

Yes I will resend with this patch plus the next, amended as we
discussed, plus the might_sleep annotations. I'm assuming the lockdep
one will land, so not going to resend that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-23 13:42           ` Daniel Vetter
@ 2019-08-23 14:06             ` Peter Zijlstra
  2019-08-23 15:15               ` Daniel Vetter
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2019-08-23 14:06 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, Andrew Morton, LKML, Linux MM, DRI Development,
	Intel Graphics Development, Ingo Molnar, Michal Hocko,
	David Rientjes, Christian König, Jérôme Glisse,
	Masahiro Yamada, Wei Wang, Andy Shevchenko, Thomas Gleixner,
	Jann Horn, Feng Tang, Kees Cook, Randy Dunlap, Daniel Vetter

On Fri, Aug 23, 2019 at 03:42:47PM +0200, Daniel Vetter wrote:
> I'm assuming the lockdep one will land, so not going to resend that.

I was assuming you'd take the might_lock_nested() along with the i915
user through the i915/drm tree. If you want me to take some or all of that,
lemme know.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
  2019-08-23 14:06             ` Peter Zijlstra
@ 2019-08-23 15:15               ` Daniel Vetter
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel Vetter @ 2019-08-23 15:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Gunthorpe, Andrew Morton, LKML, Linux MM, DRI Development,
	Intel Graphics Development, Ingo Molnar, Michal Hocko,
	David Rientjes, Christian König, Jérôme Glisse,
	Masahiro Yamada, Wei Wang, Andy Shevchenko, Thomas Gleixner,
	Jann Horn, Feng Tang, Kees Cook, Randy Dunlap, Daniel Vetter

On Fri, Aug 23, 2019 at 4:06 PM Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Aug 23, 2019 at 03:42:47PM +0200, Daniel Vetter wrote:
> > I'm assuming the lockdep one will land, so not going to resend that.
>
> I was assuming you'd take the might_lock_nested() along with the i915
> user through the i915/drm tree. If you want me to take some or all of that,
> lemme know.

might_lock_nested() is a different patch series; that one will indeed
go in through the drm/i915 tree, thx for the ack there. What I meant
here is the mmu notifier lockdep map in this series, which Jason said
he's going to pick up into hmm.git. I'm doing about 3 or 4 different
lockdep annotation series in parallel right now :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-08-23 15:15 UTC | newest]

Thread overview: 25+ messages
2019-08-20  8:18 [PATCH 0/4] mmu notifier debug annotations/checks Daniel Vetter
2019-08-20  8:18 ` [PATCH 1/4] mm, notifier: Add a lockdep map for invalidate_range_start/end Daniel Vetter
2019-08-20 13:31   ` Jason Gunthorpe
2019-08-20  8:19 ` [PATCH 2/4] mm, notifier: Prime lockdep Daniel Vetter
2019-08-20 13:31   ` Jason Gunthorpe
2019-08-20  8:19 ` [PATCH 3/4] kernel.h: Add non_block_start/end() Daniel Vetter
2019-08-20 20:24   ` Daniel Vetter
2019-08-22 23:14     ` Andrew Morton
2019-08-23  8:34       ` Daniel Vetter
2019-08-23 12:12         ` Jason Gunthorpe
2019-08-23 12:22           ` Peter Zijlstra
2019-08-23 13:42           ` Daniel Vetter
2019-08-23 14:06             ` Peter Zijlstra
2019-08-23 15:15               ` Daniel Vetter
2019-08-23  8:48     ` Peter Zijlstra
2019-08-20  8:19 ` [PATCH 4/4] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter
2019-08-20 13:34   ` Jason Gunthorpe
2019-08-20 15:18     ` Daniel Vetter
2019-08-20 15:27       ` Jason Gunthorpe
2019-08-21  9:34         ` Daniel Vetter
2019-08-21 15:41       ` Daniel Vetter
2019-08-21 16:16         ` Jason Gunthorpe
2019-08-22  8:42           ` Daniel Vetter
2019-08-22 14:24             ` Jason Gunthorpe
2019-08-22 14:27               ` Daniel Vetter
