All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] mmu notifier debug checks v2
@ 2018-12-10 10:36 Daniel Vetter
  2018-12-10 10:36   ` Daniel Vetter
                   ` (7 more replies)
  0 siblings, 8 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-10 10:36 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: DRI Development, LKML, linux-mm, Daniel Vetter

Hi all,

Here's v2 of my mmu notifier debug checks.

I think the last two patches could probably be extended to all callbacks,
but I'm not really clear on the exact rules. But happy to extend them if
there's interest.

This stuff helps us catch issues in the i915 mmu notifier implementation.

Thanks, Daniel

Daniel Vetter (4):
  mm: Check if mmu notifier callbacks are allowed to fail
  kernel.h: Add non_block_start/end()
  mm, notifier: Catch sleeping/blocking for !blockable
  mm, notifier: Add a lockdep map for invalidate_range_start

 include/linux/kernel.h       | 10 +++++++++-
 include/linux/mmu_notifier.h |  6 ++++++
 include/linux/sched.h        |  4 ++++
 kernel/sched/core.c          |  6 +++---
 mm/mmu_notifier.c            | 18 +++++++++++++++++-
 5 files changed, 39 insertions(+), 5 deletions(-)

-- 
2.20.0.rc1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
@ 2018-12-10 10:36   ` Daniel Vetter
  2018-12-10 10:36   ` Daniel Vetter
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-10 10:36 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: DRI Development, LKML, linux-mm, Daniel Vetter, Andrew Morton,
	Michal Hocko, Christian König, David Rientjes,
	Jérôme Glisse, Paolo Bonzini, Daniel Vetter

Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.

Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.

An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.

Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.

v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
the problematic case (Michal Hocko).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..ccc22f21b735 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 						mn->ops->invalidate_range_start, _ret,
 						!blockable ? "non-" : "");
+				if (blockable)
+					pr_warn("%pS callback failure not allowed\n",
+						mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}
-- 
2.20.0.rc1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
@ 2018-12-10 10:36   ` Daniel Vetter
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-10 10:36 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: Michal Hocko, Daniel Vetter, Daniel Vetter, LKML,
	DRI Development, linux-mm, Jérôme Glisse,
	David Rientjes, Paolo Bonzini, Andrew Morton,
	Christian König

Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.

Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.

An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.

Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.

v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
the problematic case (Michal Hocko).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..ccc22f21b735 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 						mn->ops->invalidate_range_start, _ret,
 						!blockable ? "non-" : "");
+				if (blockable)
+					pr_warn("%pS callback failure not allowed\n",
+						mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}
-- 
2.20.0.rc1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/4] kernel.h: Add non_block_start/end()
  2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
@ 2018-12-10 10:36   ` Daniel Vetter
  2018-12-10 10:36   ` Daniel Vetter
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-10 10:36 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: DRI Development, LKML, linux-mm, Daniel Vetter, Andrew Morton,
	Michal Hocko, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter

In some special cases we must not block, but there's not a
spinlock, preempt-off, irqs-off or similar critical section already
that arms the might_sleep() debug checks. Add a non_block_start/end()
pair to annotate these.

This will be used in the oom paths of mmu-notifiers, where blocking is
not allowed to make sure there's forward progress.

Suggested by Michal Hocko.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/kernel.h | 10 +++++++++-
 include/linux/sched.h  |  4 ++++
 kernel/sched/core.c    |  6 +++---
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d6aac75b51ba..c2cf31515b3d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -251,7 +251,9 @@ extern int _cond_resched(void);
  * might_sleep - annotation for functions that can sleep
  *
  * this macro will print a stack trace if it is executed in an atomic
- * context (spinlock, irq-handler, ...).
+ * context (spinlock, irq-handler, ...). Additional sections where blocking is
+ * not allowed can be annotated with non_block_start() and non_block_end()
+ * pairs.
  *
  * This is a useful debugging help to be able to catch problems early and not
  * be bitten later when the calling function happens to sleep when it is not
@@ -260,6 +262,10 @@ extern int _cond_resched(void);
 # define might_sleep() \
 	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
 # define sched_annotate_sleep()	(current->task_state_change = 0)
+# define non_block_start() \
+	do { current->non_block_count++; } while (0)
+# define non_block_end() \
+	do { WARN_ON(current->non_block_count-- == 0); } while (0)
 #else
   static inline void ___might_sleep(const char *file, int line,
 				   int preempt_offset) { }
@@ -267,6 +273,8 @@ extern int _cond_resched(void);
 				   int preempt_offset) { }
 # define might_sleep() do { might_resched(); } while (0)
 # define sched_annotate_sleep() do { } while (0)
+# define non_block_start() do { } while (0)
+# define non_block_end() do { } while (0)
 #endif
 
 #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ecffd4e37453..41249dbf8f27 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -916,6 +916,10 @@ struct task_struct {
 	struct mutex_waiter		*blocked_on;
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	int				non_block_count;
+#endif
+
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
 	unsigned long			hardirq_enable_ip;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6fedf3a98581..969d7a71f30c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6113,7 +6113,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 	rcu_sleep_check();
 
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
-	     !is_idle_task(current)) ||
+	     !is_idle_task(current) && !current->non_block_count) ||
 	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
 	    oops_in_progress)
 		return;
@@ -6129,8 +6129,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 		"BUG: sleeping function called from invalid context at %s:%d\n",
 			file, line);
 	printk(KERN_ERR
-		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
-			in_atomic(), irqs_disabled(),
+		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
+			in_atomic(), irqs_disabled(), current->non_block_count,
 			current->pid, current->comm);
 
 	if (task_stack_end_corrupted(current))
-- 
2.20.0.rc1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/4] kernel.h: Add non_block_start/end()
@ 2018-12-10 10:36   ` Daniel Vetter
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-10 10:36 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: Michal Hocko, Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, David Rientjes, Daniel Vetter,
	Andrew Morton, Christian König

In some special cases we must not block, but there's not a
spinlock, preempt-off, irqs-off or similar critical section already
that arms the might_sleep() debug checks. Add a non_block_start/end()
pair to annotate these.

This will be used in the oom paths of mmu-notifiers, where blocking is
not allowed to make sure there's forward progress.

Suggested by Michal Hocko.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/kernel.h | 10 +++++++++-
 include/linux/sched.h  |  4 ++++
 kernel/sched/core.c    |  6 +++---
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d6aac75b51ba..c2cf31515b3d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -251,7 +251,9 @@ extern int _cond_resched(void);
  * might_sleep - annotation for functions that can sleep
  *
  * this macro will print a stack trace if it is executed in an atomic
- * context (spinlock, irq-handler, ...).
+ * context (spinlock, irq-handler, ...). Additional sections where blocking is
+ * not allowed can be annotated with non_block_start() and non_block_end()
+ * pairs.
  *
  * This is a useful debugging help to be able to catch problems early and not
  * be bitten later when the calling function happens to sleep when it is not
@@ -260,6 +262,10 @@ extern int _cond_resched(void);
 # define might_sleep() \
 	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
 # define sched_annotate_sleep()	(current->task_state_change = 0)
+# define non_block_start() \
+	do { current->non_block_count++; } while (0)
+# define non_block_end() \
+	do { WARN_ON(current->non_block_count-- == 0); } while (0)
 #else
   static inline void ___might_sleep(const char *file, int line,
 				   int preempt_offset) { }
@@ -267,6 +273,8 @@ extern int _cond_resched(void);
 				   int preempt_offset) { }
 # define might_sleep() do { might_resched(); } while (0)
 # define sched_annotate_sleep() do { } while (0)
+# define non_block_start() do { } while (0)
+# define non_block_end() do { } while (0)
 #endif
 
 #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ecffd4e37453..41249dbf8f27 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -916,6 +916,10 @@ struct task_struct {
 	struct mutex_waiter		*blocked_on;
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	int				non_block_count;
+#endif
+
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
 	unsigned long			hardirq_enable_ip;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6fedf3a98581..969d7a71f30c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6113,7 +6113,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 	rcu_sleep_check();
 
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
-	     !is_idle_task(current)) ||
+	     !is_idle_task(current) && !current->non_block_count) ||
 	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
 	    oops_in_progress)
 		return;
@@ -6129,8 +6129,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 		"BUG: sleeping function called from invalid context at %s:%d\n",
 			file, line);
 	printk(KERN_ERR
-		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
-			in_atomic(), irqs_disabled(),
+		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
+			in_atomic(), irqs_disabled(), current->non_block_count,
 			current->pid, current->comm);
 
 	if (task_stack_end_corrupted(current))
-- 
2.20.0.rc1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/4] mm, notifier: Catch sleeping/blocking for !blockable
  2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
  2018-12-10 10:36   ` Daniel Vetter
  2018-12-10 10:36   ` Daniel Vetter
@ 2018-12-10 10:36 ` Daniel Vetter
  2018-12-10 10:36 ` [PATCH 4/4] mm, notifier: Add a lockdep map for invalidate_range_start Daniel Vetter
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-10 10:36 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: DRI Development, LKML, linux-mm, Daniel Vetter, Andrew Morton,
	Michal Hocko, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter

We need to make sure implementations don't cheat and don't have a
possible schedule/blocking point deeply burried where review can't
catch it.

I'm not sure whether this is the best way to make sure all the
might_sleep() callsites trigger, and it's a bit ugly in the code flow.
But it gets the job done.

Inspired by an i915 patch series which did exactly that, because the
rules haven't been entirely clear to us.

v2: Use the shiny new non_block_start/end annotations instead of
abusing preempt_disable/enable.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ccc22f21b735..a50ed7d1ecef 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			int _ret;
+
+			if (!blockable)
+				non_block_start();
+			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			if (!blockable)
+				non_block_end();
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 						mn->ops->invalidate_range_start, _ret,
-- 
2.20.0.rc1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/4] mm, notifier: Add a lockdep map for invalidate_range_start
  2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
                   ` (2 preceding siblings ...)
  2018-12-10 10:36 ` [PATCH 3/4] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter
@ 2018-12-10 10:36 ` Daniel Vetter
  2018-12-10 12:07 ` ✗ Fi.CI.CHECKPATCH: warning for mmu notifier debug checks v2 Patchwork
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-10 10:36 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: DRI Development, LKML, linux-mm, Daniel Vetter, Chris Wilson,
	Andrew Morton, David Rientjes, Jérôme Glisse,
	Michal Hocko, Christian König, Greg Kroah-Hartman,
	Mike Rapoport, Daniel Vetter

This is a similar idea to the fs_reclaim fake lockdep lock. It's
fairly easy to provoke a specific notifier to be run on a specific
range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for
all the various callchains that might lead to them. But both at the
same time is really hard to reliable hit, especially when you want to
exercise paths like direct reclaim or compaction, where it's not
easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep
to see a lot more dependencies, without having to actually hit them
in a single challchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled
this out for the invaliate_range_start callback. If there's
interest, we should probably roll this out to all of them. But my
undestanding of core mm is seriously lacking, and I'm not clear on
whether we need a lockdep map for each callback, or whether some can
be shared.

v2: Use lock_map_acquire/release() like fs_reclaim, to avoid confusion
with this being a real mutex (Chris Wilson).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/mmu_notifier.h | 6 ++++++
 mm/mmu_notifier.c            | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..19be442606c6 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;
 
 #ifdef CONFIG_MMU_NOTIFIER
 
+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,10 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
+	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
 	if (mm_has_notifiers(mm))
 		__mmu_notifier_invalidate_range_start(mm, start, end, true);
+	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
 }
 
 static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm,
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index a50ed7d1ecef..c91d58fe388b 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -23,6 +23,13 @@
 /* global SRCU for all MMs */
 DEFINE_STATIC_SRCU(srcu);
 
+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
+	.name = "mmu_notifier_invalidate_range_start"
+};
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map);
+#endif
+
 /*
  * This function allows mmu_notifier::release callback to delay a call to
  * a function that will free appropriate resources. The function must be
-- 
2.20.0.rc1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2018-12-10 10:36   ` Daniel Vetter
@ 2018-12-10 10:44     ` Koenig, Christian
  -1 siblings, 0 replies; 36+ messages in thread
From: Koenig, Christian @ 2018-12-10 10:44 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development
  Cc: DRI Development, LKML, linux-mm, Andrew Morton, Michal Hocko,
	David Rientjes, Jérôme Glisse, Paolo Bonzini,
	Daniel Vetter

Patches #1 and #3 are Reviewed-by: Christian König 
<christian.koenig@amd.com>

Patch #2 is Acked-by: Christian König <christian.koenig@amd.com> because 
I can't judge if adding the counter in the thread structure is actually 
a good idea.

In patch #4 I honestly don't understand at all how this stuff works, so 
no-comment from my side on this.

Christian.

Am 10.12.18 um 11:36 schrieb Daniel Vetter:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
>
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
>
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
>
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.
>
> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>   mm/mmu_notifier.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..ccc22f21b735 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>   				pr_info("%pS callback failed with %d in %sblockable context.\n",
>   						mn->ops->invalidate_range_start, _ret,
>   						!blockable ? "non-" : "");
> +				if (blockable)
> +					pr_warn("%pS callback failure not allowed\n",
> +						mn->ops->invalidate_range_start);
>   				ret = _ret;
>   			}
>   		}


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
@ 2018-12-10 10:44     ` Koenig, Christian
  0 siblings, 0 replies; 36+ messages in thread
From: Koenig, Christian @ 2018-12-10 10:44 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development
  Cc: Michal Hocko, Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, David Rientjes, Paolo Bonzini,
	Andrew Morton

Patches #1 and #3 are Reviewed-by: Christian König 
<christian.koenig@amd.com>

Patch #2 is Acked-by: Christian König <christian.koenig@amd.com> because 
I can't judge if adding the counter in the thread structure is actually 
a good idea.

In patch #4 I honestly don't understand at all how this stuff works, so 
no-comment from my side on this.

Christian.

Am 10.12.18 um 11:36 schrieb Daniel Vetter:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
>
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
>
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
>
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.
>
> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>   mm/mmu_notifier.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..ccc22f21b735 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>   				pr_info("%pS callback failed with %d in %sblockable context.\n",
>   						mn->ops->invalidate_range_start, _ret,
>   						!blockable ? "non-" : "");
> +				if (blockable)
> +					pr_warn("%pS callback failure not allowed\n",
> +						mn->ops->invalidate_range_start);
>   				ret = _ret;
>   			}
>   		}

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for mmu notifier debug checks v2
  2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
                   ` (3 preceding siblings ...)
  2018-12-10 10:36 ` [PATCH 4/4] mm, notifier: Add a lockdep map for invalidate_range_start Daniel Vetter
@ 2018-12-10 12:07 ` Patchwork
  2018-12-10 12:28 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2018-12-10 12:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: mmu notifier debug checks v2
URL   : https://patchwork.freedesktop.org/series/53828/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
834029401554 mm: Check if mmu notifier callbacks are allowed to fail
-:56: WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>'

total: 0 errors, 1 warnings, 0 checks, 9 lines checked
735885c1e985 kernel.h: Add non_block_start/end()
-:47: WARNING:SINGLE_STATEMENT_DO_WHILE_MACRO: Single statement macros should not use a do {} while (0) loop
#47: FILE: include/linux/kernel.h:265:
+# define non_block_start() \
+	do { current->non_block_count++; } while (0)

-:49: WARNING:SINGLE_STATEMENT_DO_WHILE_MACRO: Single statement macros should not use a do {} while (0) loop
#49: FILE: include/linux/kernel.h:267:
+# define non_block_end() \
+	do { WARN_ON(current->non_block_count-- == 0); } while (0)

-:101: WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>'

total: 0 errors, 3 warnings, 0 checks, 56 lines checked
138e4dcc716f mm, notifier: Catch sleeping/blocking for !blockable
-:50: WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>'

total: 0 errors, 1 warnings, 0 checks, 14 lines checked
a1c311d5605e mm, notifier: Add a lockdep map for invalidate_range_start
-:88: WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>'

total: 0 errors, 1 warnings, 0 checks, 33 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* ✓ Fi.CI.BAT: success for mmu notifier debug checks v2
  2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
                   ` (4 preceding siblings ...)
  2018-12-10 12:07 ` ✗ Fi.CI.CHECKPATCH: warning for mmu notifier debug checks v2 Patchwork
@ 2018-12-10 12:28 ` Patchwork
  2018-12-10 15:54 ` ✗ Fi.CI.IGT: failure " Patchwork
  2018-12-10 16:47 ` ✗ Fi.CI.BAT: failure for mmu notifier debug checks v2 (rev2) Patchwork
  7 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2018-12-10 12:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: mmu notifier debug checks v2
URL   : https://patchwork.freedesktop.org/series/53828/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5292 -> Patchwork_11056
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/53828/revisions/1/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_11056:

### IGT changes ###

#### Warnings ####

  * igt@kms_busy@basic-flip-a:
    - {fi-kbl-7567u}:     PASS -> SKIP +2

  
Known issues
------------

  Here are the changes found in Patchwork_11056 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@cs-compute:
    - fi-kbl-8809g:       NOTRUN -> FAIL [fdo#108094]

  * igt@amdgpu/amd_prime@amd-to-i915:
    - fi-kbl-8809g:       NOTRUN -> FAIL [fdo#107341]

  * {igt@runner@aborted}:
    - {fi-icl-y}:         NOTRUN -> FAIL [fdo#108070]

  
#### Possible fixes ####

  * igt@amdgpu/amd_basic@userptr:
    - fi-kbl-8809g:       DMESG-WARN [fdo#108965] -> PASS

  * igt@gem_ctx_create@basic-files:
    - fi-bsw-kefka:       FAIL [fdo#108656] -> PASS

  * igt@i915_selftest@live_gem:
    - fi-bsw-n3050:       DMESG-WARN -> PASS

  * igt@i915_selftest@live_hangcheck:
    - fi-cfl-8109u:       INCOMPLETE [fdo#106070] -> PASS
    - fi-kbl-7560u:       INCOMPLETE [fdo#108044] -> PASS

  * igt@kms_frontbuffer_tracking@basic:
    - fi-byt-clapper:     FAIL [fdo#103167] -> PASS

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-b:
    - fi-byt-clapper:     FAIL [fdo#107362] -> PASS

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-cfl-8109u:       DMESG-WARN -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#106070]: https://bugs.freedesktop.org/show_bug.cgi?id=106070
  [fdo#107341]: https://bugs.freedesktop.org/show_bug.cgi?id=107341
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#108044]: https://bugs.freedesktop.org/show_bug.cgi?id=108044
  [fdo#108070]: https://bugs.freedesktop.org/show_bug.cgi?id=108070
  [fdo#108094]: https://bugs.freedesktop.org/show_bug.cgi?id=108094
  [fdo#108656]: https://bugs.freedesktop.org/show_bug.cgi?id=108656
  [fdo#108965]: https://bugs.freedesktop.org/show_bug.cgi?id=108965


Participating hosts (50 -> 45)
------------------------------

  Additional (1): fi-icl-y 
  Missing    (6): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-icl-u3 


Build changes
-------------

    * Linux: CI_DRM_5292 -> Patchwork_11056

  CI_DRM_5292: ec6b8cacbc8777a77119fa7af7e2930fe186091b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4744: 4579ac1d445cf39f6de474071b20db790db575bd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_11056: a1c311d5605eac8919aa0c8fa62137b168ce0d18 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

a1c311d5605e mm, notifier: Add a lockdep map for invalidate_range_start
138e4dcc716f mm, notifier: Catch sleeping/blocking for !blockable
735885c1e985 kernel.h: Add non_block_start/end()
834029401554 mm: Check if mmu notifier callbacks are allowed to fail

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_11056/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2018-12-10 10:36   ` Daniel Vetter
@ 2018-12-10 13:27     ` Michal Hocko
  -1 siblings, 0 replies; 36+ messages in thread
From: Michal Hocko @ 2018-12-10 13:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Intel Graphics Development, DRI Development, LKML, linux-mm,
	Andrew Morton, Christian König, David Rientjes,
	Jérôme Glisse, Paolo Bonzini, Daniel Vetter

On Mon 10-12-18 11:36:38, Daniel Vetter wrote:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
> 
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
> 
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
> 
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.

OK, fair enough. If this is going to help with testing then I do not
have any objections of course.

> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).

Thanks!

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  mm/mmu_notifier.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..ccc22f21b735 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>  				pr_info("%pS callback failed with %d in %sblockable context.\n",
>  						mn->ops->invalidate_range_start, _ret,
>  						!blockable ? "non-" : "");
> +				if (blockable)
> +					pr_warn("%pS callback failure not allowed\n",
> +						mn->ops->invalidate_range_start);
>  				ret = _ret;
>  			}
>  		}
> -- 
> 2.20.0.rc1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
@ 2018-12-10 13:27     ` Michal Hocko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Hocko @ 2018-12-10 13:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Intel Graphics Development, DRI Development, LKML, linux-mm,
	Andrew Morton, Christian König, David Rientjes,
	Jérôme Glisse, Paolo Bonzini, Daniel Vetter

On Mon 10-12-18 11:36:38, Daniel Vetter wrote:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
> 
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
> 
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
> 
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.

OK, fair enough. If this is going to help with testing then I do not
have any objections of course.

> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).

Thanks!

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian K�nig" <christian.koenig@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "J�r�me Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  mm/mmu_notifier.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..ccc22f21b735 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>  				pr_info("%pS callback failed with %d in %sblockable context.\n",
>  						mn->ops->invalidate_range_start, _ret,
>  						!blockable ? "non-" : "");
> +				if (blockable)
> +					pr_warn("%pS callback failure not allowed\n",
> +						mn->ops->invalidate_range_start);
>  				ret = _ret;
>  			}
>  		}
> -- 
> 2.20.0.rc1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
  2018-12-10 10:36   ` Daniel Vetter
@ 2018-12-10 14:13     ` Michal Hocko
  -1 siblings, 0 replies; 36+ messages in thread
From: Michal Hocko @ 2018-12-10 14:13 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Intel Graphics Development, DRI Development, LKML, linux-mm,
	Andrew Morton, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter, Peter Zijlstra

I do not see any scheduler guys Cced and it would be really great to get
their opinion here.

On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> In some special cases we must not block, but there's not a
> spinlock, preempt-off, irqs-off or similar critical section already
> that arms the might_sleep() debug checks. Add a non_block_start/end()
> pair to annotate these.
> 
> This will be used in the oom paths of mmu-notifiers, where blocking is
> not allowed to make sure there's forward progress.

Considering the only alternative would be to abuse
preempt_{disable,enable}, and that really has a different semantic, I
think this makes some sense. The cotext is preemptible but we do not
want notifier to sleep on any locks, WQ etc.

> Suggested by Michal Hocko.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  include/linux/kernel.h | 10 +++++++++-
>  include/linux/sched.h  |  4 ++++
>  kernel/sched/core.c    |  6 +++---
>  3 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index d6aac75b51ba..c2cf31515b3d 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -251,7 +251,9 @@ extern int _cond_resched(void);
>   * might_sleep - annotation for functions that can sleep
>   *
>   * this macro will print a stack trace if it is executed in an atomic
> - * context (spinlock, irq-handler, ...).
> + * context (spinlock, irq-handler, ...). Additional sections where blocking is
> + * not allowed can be annotated with non_block_start() and non_block_end()
> + * pairs.
>   *
>   * This is a useful debugging help to be able to catch problems early and not
>   * be bitten later when the calling function happens to sleep when it is not
> @@ -260,6 +262,10 @@ extern int _cond_resched(void);
>  # define might_sleep() \
>  	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
>  # define sched_annotate_sleep()	(current->task_state_change = 0)
> +# define non_block_start() \
> +	do { current->non_block_count++; } while (0)
> +# define non_block_end() \
> +	do { WARN_ON(current->non_block_count-- == 0); } while (0)
>  #else
>    static inline void ___might_sleep(const char *file, int line,
>  				   int preempt_offset) { }
> @@ -267,6 +273,8 @@ extern int _cond_resched(void);
>  				   int preempt_offset) { }
>  # define might_sleep() do { might_resched(); } while (0)
>  # define sched_annotate_sleep() do { } while (0)
> +# define non_block_start() do { } while (0)
> +# define non_block_end() do { } while (0)
>  #endif
>  
>  #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ecffd4e37453..41249dbf8f27 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -916,6 +916,10 @@ struct task_struct {
>  	struct mutex_waiter		*blocked_on;
>  #endif
>  
> +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> +	int				non_block_count;
> +#endif
> +
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	unsigned int			irq_events;
>  	unsigned long			hardirq_enable_ip;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 6fedf3a98581..969d7a71f30c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6113,7 +6113,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  	rcu_sleep_check();
>  
>  	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
> -	     !is_idle_task(current)) ||
> +	     !is_idle_task(current) && !current->non_block_count) ||
>  	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
>  	    oops_in_progress)
>  		return;
> @@ -6129,8 +6129,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  		"BUG: sleeping function called from invalid context at %s:%d\n",
>  			file, line);
>  	printk(KERN_ERR
> -		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
> -			in_atomic(), irqs_disabled(),
> +		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
> +			in_atomic(), irqs_disabled(), current->non_block_count,
>  			current->pid, current->comm);
>  
>  	if (task_stack_end_corrupted(current))
> -- 
> 2.20.0.rc1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
@ 2018-12-10 14:13     ` Michal Hocko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Hocko @ 2018-12-10 14:13 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Intel Graphics Development, DRI Development, LKML, linux-mm,
	Andrew Morton, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter, Peter Zijlstra

I do not see any scheduler guys Cced and it would be really great to get
their opinion here.

On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> In some special cases we must not block, but there's not a
> spinlock, preempt-off, irqs-off or similar critical section already
> that arms the might_sleep() debug checks. Add a non_block_start/end()
> pair to annotate these.
> 
> This will be used in the oom paths of mmu-notifiers, where blocking is
> not allowed to make sure there's forward progress.

Considering the only alternative would be to abuse
preempt_{disable,enable}, and that really has a different semantic, I
think this makes some sense. The cotext is preemptible but we do not
want notifier to sleep on any locks, WQ etc.

> Suggested by Michal Hocko.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Christian K�nig" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "J�r�me Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  include/linux/kernel.h | 10 +++++++++-
>  include/linux/sched.h  |  4 ++++
>  kernel/sched/core.c    |  6 +++---
>  3 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index d6aac75b51ba..c2cf31515b3d 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -251,7 +251,9 @@ extern int _cond_resched(void);
>   * might_sleep - annotation for functions that can sleep
>   *
>   * this macro will print a stack trace if it is executed in an atomic
> - * context (spinlock, irq-handler, ...).
> + * context (spinlock, irq-handler, ...). Additional sections where blocking is
> + * not allowed can be annotated with non_block_start() and non_block_end()
> + * pairs.
>   *
>   * This is a useful debugging help to be able to catch problems early and not
>   * be bitten later when the calling function happens to sleep when it is not
> @@ -260,6 +262,10 @@ extern int _cond_resched(void);
>  # define might_sleep() \
>  	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
>  # define sched_annotate_sleep()	(current->task_state_change = 0)
> +# define non_block_start() \
> +	do { current->non_block_count++; } while (0)
> +# define non_block_end() \
> +	do { WARN_ON(current->non_block_count-- == 0); } while (0)
>  #else
>    static inline void ___might_sleep(const char *file, int line,
>  				   int preempt_offset) { }
> @@ -267,6 +273,8 @@ extern int _cond_resched(void);
>  				   int preempt_offset) { }
>  # define might_sleep() do { might_resched(); } while (0)
>  # define sched_annotate_sleep() do { } while (0)
> +# define non_block_start() do { } while (0)
> +# define non_block_end() do { } while (0)
>  #endif
>  
>  #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ecffd4e37453..41249dbf8f27 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -916,6 +916,10 @@ struct task_struct {
>  	struct mutex_waiter		*blocked_on;
>  #endif
>  
> +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> +	int				non_block_count;
> +#endif
> +
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	unsigned int			irq_events;
>  	unsigned long			hardirq_enable_ip;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 6fedf3a98581..969d7a71f30c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6113,7 +6113,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  	rcu_sleep_check();
>  
>  	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
> -	     !is_idle_task(current)) ||
> +	     !is_idle_task(current) && !current->non_block_count) ||
>  	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
>  	    oops_in_progress)
>  		return;
> @@ -6129,8 +6129,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  		"BUG: sleeping function called from invalid context at %s:%d\n",
>  			file, line);
>  	printk(KERN_ERR
> -		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
> -			in_atomic(), irqs_disabled(),
> +		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
> +			in_atomic(), irqs_disabled(), current->non_block_count,
>  			current->pid, current->comm);
>  
>  	if (task_stack_end_corrupted(current))
> -- 
> 2.20.0.rc1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
  2018-12-10 14:13     ` Michal Hocko
@ 2018-12-10 14:47       ` Peter Zijlstra
  -1 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2018-12-10 14:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Daniel Vetter, Intel Graphics Development, DRI Development, LKML,
	linux-mm, Andrew Morton, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter

On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote:
> I do not see any scheduler guys Cced and it would be really great to get
> their opinion here.
> 
> On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> > In some special cases we must not block, but there's not a
> > spinlock, preempt-off, irqs-off or similar critical section already
> > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > pair to annotate these.
> > 
> > This will be used in the oom paths of mmu-notifiers, where blocking is
> > not allowed to make sure there's forward progress.
> 
> Considering the only alternative would be to abuse
> preempt_{disable,enable}, and that really has a different semantic, I
> think this makes some sense. The cotext is preemptible but we do not
> want notifier to sleep on any locks, WQ etc.

I'm confused... what is this supposed to do?

And what does 'block' mean here? Without preempt_disable/IRQ-off we're
subject to regular preemption and execution can stall for arbitrary
amounts of time.

The Changelog doesn't yield any clues.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
@ 2018-12-10 14:47       ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2018-12-10 14:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Daniel Vetter, Intel Graphics Development, LKML, DRI Development,
	linux-mm, Jérôme Glisse, David Rientjes, Daniel Vetter,
	Andrew Morton, Christian König

On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote:
> I do not see any scheduler guys Cced and it would be really great to get
> their opinion here.
> 
> On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> > In some special cases we must not block, but there's not a
> > spinlock, preempt-off, irqs-off or similar critical section already
> > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > pair to annotate these.
> > 
> > This will be used in the oom paths of mmu-notifiers, where blocking is
> > not allowed to make sure there's forward progress.
> 
> Considering the only alternative would be to abuse
> preempt_{disable,enable}, and that really has a different semantic, I
> think this makes some sense. The cotext is preemptible but we do not
> want notifier to sleep on any locks, WQ etc.

I'm confused... what is this supposed to do?

And what does 'block' mean here? Without preempt_disable/IRQ-off we're
subject to regular preemption and execution can stall for arbitrary
amounts of time.

The Changelog doesn't yield any clues.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
  2018-12-10 14:47       ` Peter Zijlstra
  (?)
@ 2018-12-10 15:01       ` Michal Hocko
  2018-12-10 15:22         ` Peter Zijlstra
  -1 siblings, 1 reply; 36+ messages in thread
From: Michal Hocko @ 2018-12-10 15:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Vetter, Intel Graphics Development, DRI Development, LKML,
	linux-mm, Andrew Morton, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter

On Mon 10-12-18 15:47:11, Peter Zijlstra wrote:
> On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote:
> > I do not see any scheduler guys Cced and it would be really great to get
> > their opinion here.
> > 
> > On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> > > In some special cases we must not block, but there's not a
> > > spinlock, preempt-off, irqs-off or similar critical section already
> > > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > > pair to annotate these.
> > > 
> > > This will be used in the oom paths of mmu-notifiers, where blocking is
> > > not allowed to make sure there's forward progress.
> > 
> > Considering the only alternative would be to abuse
> > preempt_{disable,enable}, and that really has a different semantic, I
> > think this makes some sense. The cotext is preemptible but we do not
> > want notifier to sleep on any locks, WQ etc.
> 
> I'm confused... what is this supposed to do?
> 
> And what does 'block' mean here? Without preempt_disable/IRQ-off we're
> subject to regular preemption and execution can stall for arbitrary
> amounts of time.

The notifier is called from quite a restricted context - oom_reaper - 
which shouldn't depend on any locks or sleepable conditionals. The code
should be swift as well but we mostly do care about it to make a forward
progress. Checking for sleepable context is the best thing we could come
up with that would describe these demands at least partially.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
  2018-12-10 15:01       ` Michal Hocko
@ 2018-12-10 15:22         ` Peter Zijlstra
  2018-12-10 16:20           ` Michal Hocko
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2018-12-10 15:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Daniel Vetter, Intel Graphics Development, DRI Development, LKML,
	linux-mm, Andrew Morton, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter

On Mon, Dec 10, 2018 at 04:01:59PM +0100, Michal Hocko wrote:
> On Mon 10-12-18 15:47:11, Peter Zijlstra wrote:
> > On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote:
> > > I do not see any scheduler guys Cced and it would be really great to get
> > > their opinion here.
> > > 
> > > On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> > > > In some special cases we must not block, but there's not a
> > > > spinlock, preempt-off, irqs-off or similar critical section already
> > > > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > > > pair to annotate these.
> > > > 
> > > > This will be used in the oom paths of mmu-notifiers, where blocking is
> > > > not allowed to make sure there's forward progress.
> > > 
> > > Considering the only alternative would be to abuse
> > > preempt_{disable,enable}, and that really has a different semantic, I
> > > think this makes some sense. The cotext is preemptible but we do not
> > > want notifier to sleep on any locks, WQ etc.
> > 
> > I'm confused... what is this supposed to do?
> > 
> > And what does 'block' mean here? Without preempt_disable/IRQ-off we're
> > subject to regular preemption and execution can stall for arbitrary
> > amounts of time.
> 
> The notifier is called from quite a restricted context - oom_reaper - 
> which shouldn't depend on any locks or sleepable conditionals. 

You want to exclude spinlocks too? We could maybe frob something with
lockdep if you need that?

> The code
> should be swift as well but we mostly do care about it to make a forward
> progress. Checking for sleepable context is the best thing we could come
> up with that would describe these demands at least partially.

OK, no real objections to the thing.  Just so long we're all on the same
page as to what it does and doesn't do ;-)

I suppose you could extend the check to include schedule_debug() as
well, maybe something like:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f66920173370..b1aaa278f1af 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
 /*
  * Various schedule()-time debugging checks and statistics:
  */
-static inline void schedule_debug(struct task_struct *prev)
+static inline void schedule_debug(struct task_struct *prev, bool preempt)
 {
 #ifdef CONFIG_SCHED_STACK_END_CHECK
 	if (task_stack_end_corrupted(prev))
 		panic("corrupted stack end detected inside scheduler\n");
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	if (!preempt && prev->state && prev->non_block_count)
+		// splat
+#endif
+
 	if (unlikely(in_atomic_preempt_off())) {
 		__schedule_bug(prev);
 		preempt_count_set(PREEMPT_DISABLED);
@@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	schedule_debug(prev);
+	schedule_debug(prev, preempt);
 
 	if (sched_feat(HRTICK))
 		hrtick_clear(rq);


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* ✗ Fi.CI.IGT: failure for mmu notifier debug checks v2
  2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
                   ` (5 preceding siblings ...)
  2018-12-10 12:28 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-12-10 15:54 ` Patchwork
  2018-12-10 16:47 ` ✗ Fi.CI.BAT: failure for mmu notifier debug checks v2 (rev2) Patchwork
  7 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2018-12-10 15:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: mmu notifier debug checks v2
URL   : https://patchwork.freedesktop.org/series/53828/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5292_full -> Patchwork_11056_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_11056_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_11056_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_11056_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-kbl:          PASS -> DMESG-WARN +8

  * igt@gem_userptr_blits@map-fixed-invalidate-busy:
    - shard-glk:          PASS -> DMESG-WARN +6

  * igt@gem_userptr_blits@map-fixed-invalidate-busy-gup:
    - shard-skl:          NOTRUN -> DMESG-WARN

  * igt@gem_userptr_blits@map-fixed-invalidate-gup:
    - shard-apl:          PASS -> DMESG-WARN +6

  * igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy:
    - {shard-iclb}:       PASS -> DMESG-WARN +6

  * igt@gem_userptr_blits@map-fixed-invalidate-overlap-gup:
    - shard-skl:          PASS -> DMESG-WARN +7

  * igt@gem_userptr_blits@sync-unmap:
    - shard-hsw:          PASS -> DMESG-WARN +6

  * igt@gem_userptr_blits@sync-unmap-cycles:
    - shard-snb:          PASS -> DMESG-WARN +8

  * {igt@runner@aborted}:
    - shard-glk:          NOTRUN -> FAIL
    - shard-hsw:          NOTRUN -> FAIL
    - shard-snb:          NOTRUN -> ( 2 FAIL )
    - shard-kbl:          NOTRUN -> ( 2 FAIL )
    - shard-skl:          NOTRUN -> ( 2 FAIL )
    - {shard-iclb}:       NOTRUN -> ( 2 FAIL ) [fdo#108654]
    - shard-apl:          NOTRUN -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_11056_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_workarounds@suspend-resume:
    - {shard-iclb}:       PASS -> INCOMPLETE [fdo#107713]

  * igt@kms_busy@extended-modeset-hang-newfb-render-a:
    - {shard-iclb}:       PASS -> DMESG-WARN [fdo#107956]

  * igt@kms_busy@extended-modeset-hang-newfb-render-b:
    - {shard-iclb}:       NOTRUN -> DMESG-WARN [fdo#107956]

  * igt@kms_chv_cursor_fail@pipe-a-64x64-bottom-edge:
    - shard-skl:          NOTRUN -> FAIL [fdo#104671]

  * igt@kms_color@pipe-a-degamma:
    - shard-apl:          PASS -> FAIL [fdo#104782] / [fdo#108145]

  * igt@kms_cursor_crc@cursor-128x128-offscreen:
    - shard-skl:          NOTRUN -> FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-256x256-sliding:
    - {shard-iclb}:       NOTRUN -> FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-size-change:
    - shard-apl:          PASS -> FAIL [fdo#103232]

  * igt@kms_draw_crc@draw-method-xrgb2101010-blt-xtiled:
    - shard-skl:          NOTRUN -> FAIL [fdo#103184]

  * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-gtt-xtiled:
    - {shard-iclb}:       PASS -> WARN [fdo#108336] +2

  * igt@kms_draw_crc@draw-method-xrgb8888-mmap-wc-ytiled:
    - shard-skl:          PASS -> FAIL [fdo#103184]

  * igt@kms_draw_crc@draw-method-xrgb8888-pwrite-ytiled:
    - shard-skl:          NOTRUN -> FAIL [fdo#108222]

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          PASS -> FAIL [fdo#105363]

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-cpu:
    - shard-apl:          PASS -> FAIL [fdo#103167] +3

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
    - shard-glk:          PASS -> FAIL [fdo#103167]

  * igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack:
    - shard-skl:          NOTRUN -> FAIL [fdo#105682]

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-draw-pwrite:
    - {shard-iclb}:       PASS -> DMESG-FAIL [fdo#107724] +3

  * igt@kms_frontbuffer_tracking@fbcpsr-rgb101010-draw-mmap-gtt:
    - shard-skl:          NOTRUN -> FAIL [fdo#103167] / [fdo#105682]

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-indfb-draw-mmap-cpu:
    - shard-skl:          NOTRUN -> FAIL [fdo#103167] +1

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-blt:
    - {shard-iclb}:       PASS -> FAIL [fdo#103167] +4

  * {igt@kms_plane@pixel-format-pipe-b-planes-source-clamping}:
    - {shard-iclb}:       NOTRUN -> FAIL [fdo#108948]

  * igt@kms_plane@plane-position-hole-dpms-pipe-a-planes:
    - {shard-iclb}:       PASS -> DMESG-WARN [fdo#107724] / [fdo#108336] +14

  * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max:
    - shard-glk:          PASS -> FAIL [fdo#108145]

  * igt@kms_plane_alpha_blend@pipe-c-alpha-7efc:
    - shard-skl:          NOTRUN -> FAIL [fdo#107815] / [fdo#108145] +2

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
    - {shard-iclb}:       PASS -> FAIL [fdo#103166]

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-x:
    - shard-apl:          PASS -> FAIL [fdo#103166]

  * igt@kms_setmode@basic:
    - shard-apl:          PASS -> FAIL [fdo#99912]
    - {shard-iclb}:       PASS -> FAIL [fdo#99912]
    - shard-hsw:          PASS -> FAIL [fdo#99912]
    - shard-kbl:          PASS -> FAIL [fdo#99912]

  * igt@pm_backlight@basic-brightness:
    - {shard-iclb}:       PASS -> DMESG-WARN [fdo#107724] +30

  * igt@pm_rpm@modeset-lpsp-stress:
    - {shard-iclb}:       PASS -> DMESG-WARN [fdo#108654]

  
#### Possible fixes ####

  * igt@gem_ppgtt@blt-vs-render-ctxn:
    - shard-skl:          TIMEOUT [fdo#108039] -> PASS

  * igt@gem_userptr_blits@readonly-unsync:
    - shard-skl:          TIMEOUT [fdo#108887] -> PASS

  * igt@kms_busy@extended-modeset-hang-oldfb-with-reset-render-a:
    - {shard-iclb}:       DMESG-WARN [fdo#107724] -> PASS +16

  * igt@kms_busy@extended-pageflip-hang-newfb-render-c:
    - shard-apl:          DMESG-WARN [fdo#107956] -> PASS

  * igt@kms_cursor_crc@cursor-64x64-suspend:
    - shard-glk:          FAIL [fdo#103232] -> PASS +1

  * igt@kms_draw_crc@draw-method-xrgb2101010-blt-xtiled:
    - {shard-iclb}:       WARN [fdo#108336] -> PASS +2

  * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled:
    - shard-skl:          FAIL [fdo#103184] -> PASS

  * igt@kms_flip_tiling@flip-to-x-tiled:
    - shard-skl:          FAIL [fdo#108134] -> PASS

  * igt@kms_flip_tiling@flip-y-tiled:
    - shard-skl:          FAIL [fdo#108303] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-cpu:
    - {shard-iclb}:       DMESG-FAIL [fdo#107724] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt:
    - shard-apl:          FAIL [fdo#103167] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-move:
    - {shard-iclb}:       FAIL [fdo#103167] -> PASS +4

  * igt@kms_frontbuffer_tracking@fbc-rgb565-draw-render:
    - shard-glk:          FAIL [fdo#103167] -> PASS +1

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-indfb-draw-mmap-cpu:
    - {shard-iclb}:       DMESG-WARN [fdo#107724] / [fdo#108336] -> PASS +5

  * {igt@kms_plane@pixel-format-pipe-c-planes-source-clamping}:
    - shard-apl:          FAIL [fdo#108948] -> PASS

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
    - {shard-iclb}:       INCOMPLETE [fdo#107713] -> PASS

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          FAIL [fdo#107815] -> PASS

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
    - shard-apl:          FAIL [fdo#103166] -> PASS +2

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-x:
    - {shard-iclb}:       FAIL [fdo#103166] -> PASS +1

  
#### Warnings ####

  * igt@i915_selftest@live_contexts:
    - {shard-iclb}:       DMESG-FAIL [fdo#108569] -> INCOMPLETE [fdo#108315]

  * igt@i915_suspend@shrink:
    - shard-skl:          DMESG-WARN [fdo#108784] -> INCOMPLETE [fdo#106886]
    - shard-glk:          DMESG-WARN [fdo#108784] -> INCOMPLETE [fdo#103359] / [fdo#106886] / [k.org#198133]

  * igt@kms_cursor_crc@cursor-256x256-suspend:
    - {shard-iclb}:       INCOMPLETE [fdo#107713] -> FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-256x85-random:
    - {shard-iclb}:       FAIL [fdo#103232] -> DMESG-WARN [fdo#107724] / [fdo#108336]

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-mmap-gtt:
    - {shard-iclb}:       FAIL [fdo#103167] -> DMESG-FAIL [fdo#107724]

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-mmap-wc:
    - {shard-iclb}:       FAIL [fdo#103167] -> DMESG-WARN [fdo#107724] / [fdo#108336] +1

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103184]: https://bugs.freedesktop.org/show_bug.cgi?id=103184
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#104671]: https://bugs.freedesktop.org/show_bug.cgi?id=104671
  [fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105682]: https://bugs.freedesktop.org/show_bug.cgi?id=105682
  [fdo#106886]: https://bugs.freedesktop.org/show_bug.cgi?id=106886
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#107815]: https://bugs.freedesktop.org/show_bug.cgi?id=107815
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108039]: https://bugs.freedesktop.org/show_bug.cgi?id=108039
  [fdo#108134]: https://bugs.freedesktop.org/show_bug.cgi?id=108134
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108222]: https://bugs.freedesktop.org/show_bug.cgi?id=108222
  [fdo#108303]: https://bugs.freedesktop.org/show_bug.cgi?id=108303
  [fdo#108315]: https://bugs.freedesktop.org/show_bug.cgi?id=108315
  [fdo#108336]: https://bugs.freedesktop.org/show_bug.cgi?id=108336
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#108654]: https://bugs.freedesktop.org/show_bug.cgi?id=108654
  [fdo#108784]: https://bugs.freedesktop.org/show_bug.cgi?id=108784
  [fdo#108887]: https://bugs.freedesktop.org/show_bug.cgi?id=108887
  [fdo#108948]: https://bugs.freedesktop.org/show_bug.cgi?id=108948
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (7 -> 7)
------------------------------

  No changes in participating hosts


Build changes
-------------

    * Linux: CI_DRM_5292 -> Patchwork_11056

  CI_DRM_5292: ec6b8cacbc8777a77119fa7af7e2930fe186091b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4744: 4579ac1d445cf39f6de474071b20db790db575bd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_11056: a1c311d5605eac8919aa0c8fa62137b168ce0d18 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_11056/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
  2018-12-10 15:22         ` Peter Zijlstra
@ 2018-12-10 16:20           ` Michal Hocko
  2018-12-10 16:30               ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Hocko @ 2018-12-10 16:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Vetter, Intel Graphics Development, DRI Development, LKML,
	linux-mm, Andrew Morton, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter

On Mon 10-12-18 16:22:53, Peter Zijlstra wrote:
> On Mon, Dec 10, 2018 at 04:01:59PM +0100, Michal Hocko wrote:
> > On Mon 10-12-18 15:47:11, Peter Zijlstra wrote:
> > > On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote:
> > > > I do not see any scheduler guys Cced and it would be really great to get
> > > > their opinion here.
> > > > 
> > > > On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> > > > > In some special cases we must not block, but there's not a
> > > > > spinlock, preempt-off, irqs-off or similar critical section already
> > > > > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > > > > pair to annotate these.
> > > > > 
> > > > > This will be used in the oom paths of mmu-notifiers, where blocking is
> > > > > not allowed to make sure there's forward progress.
> > > > 
> > > > Considering the only alternative would be to abuse
> > > > preempt_{disable,enable}, and that really has a different semantic, I
> > > > think this makes some sense. The cotext is preemptible but we do not
> > > > want notifier to sleep on any locks, WQ etc.
> > > 
> > > I'm confused... what is this supposed to do?
> > > 
> > > And what does 'block' mean here? Without preempt_disable/IRQ-off we're
> > > subject to regular preemption and execution can stall for arbitrary
> > > amounts of time.
> > 
> > The notifier is called from quite a restricted context - oom_reaper - 
> > which shouldn't depend on any locks or sleepable conditionals. 
> 
> You want to exclude spinlocks too? We could maybe frob something with
> lockdep if you need that?

Spinlocks are less of a problem because you cannot have a (in)direct
dependency on the page allocator that would deadlock. Spinlocks, or
preemption disabled in general should be short enough to guarantee a
forward progress.

> > The code
> > should be swift as well but we mostly do care about it to make a forward
> > progress. Checking for sleepable context is the best thing we could come
> > up with that would describe these demands at least partially.
> 
> OK, no real objections to the thing.  Just so long we're all on the same
> page as to what it does and doesn't do ;-)

I am not really sure whether there are other potential users besides
this one and whether the check as such is justified.

> I suppose you could extend the check to include schedule_debug() as
> well, maybe something like:

Do you mean to make the check cheaper?

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f66920173370..b1aaa278f1af 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
>  /*
>   * Various schedule()-time debugging checks and statistics:
>   */
> -static inline void schedule_debug(struct task_struct *prev)
> +static inline void schedule_debug(struct task_struct *prev, bool preempt)
>  {
>  #ifdef CONFIG_SCHED_STACK_END_CHECK
>  	if (task_stack_end_corrupted(prev))
>  		panic("corrupted stack end detected inside scheduler\n");
>  #endif
>  
> +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> +	if (!preempt && prev->state && prev->non_block_count)
> +		// splat
> +#endif
> +
>  	if (unlikely(in_atomic_preempt_off())) {
>  		__schedule_bug(prev);
>  		preempt_count_set(PREEMPT_DISABLED);
> @@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
>  	rq = cpu_rq(cpu);
>  	prev = rq->curr;
>  
> -	schedule_debug(prev);
> +	schedule_debug(prev, preempt);
>  
>  	if (sched_feat(HRTICK))
>  		hrtick_clear(rq);

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
  2018-12-10 16:20           ` Michal Hocko
@ 2018-12-10 16:30               ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2018-12-10 16:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Daniel Vetter, Intel Graphics Development, DRI Development, LKML,
	linux-mm, Andrew Morton, David Rientjes, Christian König,
	Jérôme Glisse, Daniel Vetter

On Mon, Dec 10, 2018 at 05:20:10PM +0100, Michal Hocko wrote:
> > OK, no real objections to the thing.  Just so long we're all on the same
> > page as to what it does and doesn't do ;-)
> 
> I am not really sure whether there are other potential users besides
> this one and whether the check as such is justified.

It's a debug option...

> > I suppose you could extend the check to include schedule_debug() as
> > well, maybe something like:
> 
> Do you mean to make the check cheaper?

Nah, so the patch only touched might_sleep(), the below touches
schedule().

If there were a patch that hits schedule() without going through a
might_sleep() (rare in practise I think, but entirely possible) then you
won't get a splat without something like the below on top.

> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f66920173370..b1aaa278f1af 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
> >  /*
> >   * Various schedule()-time debugging checks and statistics:
> >   */
> > -static inline void schedule_debug(struct task_struct *prev)
> > +static inline void schedule_debug(struct task_struct *prev, bool preempt)
> >  {
> >  #ifdef CONFIG_SCHED_STACK_END_CHECK
> >  	if (task_stack_end_corrupted(prev))
> >  		panic("corrupted stack end detected inside scheduler\n");
> >  #endif
> >  
> > +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> > +	if (!preempt && prev->state && prev->non_block_count)
> > +		// splat
> > +#endif
> > +
> >  	if (unlikely(in_atomic_preempt_off())) {
> >  		__schedule_bug(prev);
> >  		preempt_count_set(PREEMPT_DISABLED);
> > @@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
> >  	rq = cpu_rq(cpu);
> >  	prev = rq->curr;
> >  
> > -	schedule_debug(prev);
> > +	schedule_debug(prev, preempt);
> >  
> >  	if (sched_feat(HRTICK))
> >  		hrtick_clear(rq);
> 
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
@ 2018-12-10 16:30               ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2018-12-10 16:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Daniel Vetter, Intel Graphics Development, LKML, DRI Development,
	linux-mm, Jérôme Glisse, David Rientjes, Daniel Vetter,
	Andrew Morton, Christian König

On Mon, Dec 10, 2018 at 05:20:10PM +0100, Michal Hocko wrote:
> > OK, no real objections to the thing.  Just so long we're all on the same
> > page as to what it does and doesn't do ;-)
> 
> I am not really sure whether there are other potential users besides
> this one and whether the check as such is justified.

It's a debug option...

> > I suppose you could extend the check to include schedule_debug() as
> > well, maybe something like:
> 
> Do you mean to make the check cheaper?

Nah, so the patch only touched might_sleep(), the below touches
schedule().

If there were a patch that hits schedule() without going through a
might_sleep() (rare in practise I think, but entirely possible) then you
won't get a splat without something like the below on top.

> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f66920173370..b1aaa278f1af 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
> >  /*
> >   * Various schedule()-time debugging checks and statistics:
> >   */
> > -static inline void schedule_debug(struct task_struct *prev)
> > +static inline void schedule_debug(struct task_struct *prev, bool preempt)
> >  {
> >  #ifdef CONFIG_SCHED_STACK_END_CHECK
> >  	if (task_stack_end_corrupted(prev))
> >  		panic("corrupted stack end detected inside scheduler\n");
> >  #endif
> >  
> > +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> > +	if (!preempt && prev->state && prev->non_block_count)
> > +		// splat
> > +#endif
> > +
> >  	if (unlikely(in_atomic_preempt_off())) {
> >  		__schedule_bug(prev);
> >  		preempt_count_set(PREEMPT_DISABLED);
> > @@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
> >  	rq = cpu_rq(cpu);
> >  	prev = rq->curr;
> >  
> > -	schedule_debug(prev);
> > +	schedule_debug(prev, preempt);
> >  
> >  	if (sched_feat(HRTICK))
> >  		hrtick_clear(rq);
> 
> -- 
> Michal Hocko
> SUSE Labs
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* ✗ Fi.CI.BAT: failure for mmu notifier debug checks v2 (rev2)
  2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
                   ` (6 preceding siblings ...)
  2018-12-10 15:54 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-12-10 16:47 ` Patchwork
  7 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2018-12-10 16:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: intel-gfx

== Series Details ==

Series: mmu notifier debug checks v2 (rev2)
URL   : https://patchwork.freedesktop.org/series/53828/
State : failure

== Summary ==

CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
  CC      kernel/sched/core.o
kernel/sched/core.c: In function ‘schedule_debug’:
kernel/sched/core.c:3289:37: error: ‘struct task_struct’ has no member named ‘non_block_count’
  if (!preempt && prev->state && prev->non_block_count)
                                     ^~
scripts/Makefile.build:291: recipe for target 'kernel/sched/core.o' failed
make[2]: *** [kernel/sched/core.o] Error 1
scripts/Makefile.build:516: recipe for target 'kernel/sched' failed
make[1]: *** [kernel/sched] Error 2
Makefile:1060: recipe for target 'kernel' failed
make: *** [kernel] Error 2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
  2018-12-10 16:30               ` Peter Zijlstra
@ 2018-12-12 10:26                 ` Daniel Vetter
  -1 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-12 10:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michal Hocko, Daniel Vetter, Intel Graphics Development,
	DRI Development, LKML, linux-mm, Andrew Morton, David Rientjes,
	Christian König, Jérôme Glisse, Daniel Vetter

On Mon, Dec 10, 2018 at 05:30:09PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 10, 2018 at 05:20:10PM +0100, Michal Hocko wrote:
> > > OK, no real objections to the thing.  Just so long we're all on the same
> > > page as to what it does and doesn't do ;-)
> > 
> > I am not really sure whether there are other potential users besides
> > this one and whether the check as such is justified.
> 
> It's a debug option...
> 
> > > I suppose you could extend the check to include schedule_debug() as
> > > well, maybe something like:
> > 
> > Do you mean to make the check cheaper?
> 
> Nah, so the patch only touched might_sleep(), the below touches
> schedule().
> 
> If there were a patch that hits schedule() without going through a
> might_sleep() (rare in practise I think, but entirely possible) then you
> won't get a splat without something like the below on top.

We have a bunch of schedule() calls in i915, for e.g. waiting for multiple
events at the same time (when we want to unblock if any of them fire). And
there's no might_sleep in these cases afaict. Adding the check in
schedule() sounds useful, I'll include your snippet in v2. Plus try a bit
better to explain in the commit message why Michal suggested these.

Thanks, Daniel

> 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index f66920173370..b1aaa278f1af 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
> > >  /*
> > >   * Various schedule()-time debugging checks and statistics:
> > >   */
> > > -static inline void schedule_debug(struct task_struct *prev)
> > > +static inline void schedule_debug(struct task_struct *prev, bool preempt)
> > >  {
> > >  #ifdef CONFIG_SCHED_STACK_END_CHECK
> > >  	if (task_stack_end_corrupted(prev))
> > >  		panic("corrupted stack end detected inside scheduler\n");
> > >  #endif
> > >  
> > > +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> > > +	if (!preempt && prev->state && prev->non_block_count)
> > > +		// splat
> > > +#endif
> > > +
> > >  	if (unlikely(in_atomic_preempt_off())) {
> > >  		__schedule_bug(prev);
> > >  		preempt_count_set(PREEMPT_DISABLED);
> > > @@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
> > >  	rq = cpu_rq(cpu);
> > >  	prev = rq->curr;
> > >  
> > > -	schedule_debug(prev);
> > > +	schedule_debug(prev, preempt);
> > >  
> > >  	if (sched_feat(HRTICK))
> > >  		hrtick_clear(rq);
> > 
> > -- 
> > Michal Hocko
> > SUSE Labs

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
@ 2018-12-12 10:26                 ` Daniel Vetter
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2018-12-12 10:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Vetter, Intel Graphics Development, LKML, DRI Development,
	Michal Hocko, linux-mm, Jérôme Glisse, David Rientjes,
	Daniel Vetter, Andrew Morton, Christian König

On Mon, Dec 10, 2018 at 05:30:09PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 10, 2018 at 05:20:10PM +0100, Michal Hocko wrote:
> > > OK, no real objections to the thing.  Just so long we're all on the same
> > > page as to what it does and doesn't do ;-)
> > 
> > I am not really sure whether there are other potential users besides
> > this one and whether the check as such is justified.
> 
> It's a debug option...
> 
> > > I suppose you could extend the check to include schedule_debug() as
> > > well, maybe something like:
> > 
> > Do you mean to make the check cheaper?
> 
> Nah, so the patch only touched might_sleep(), the below touches
> schedule().
> 
> If there were a patch that hits schedule() without going through a
> might_sleep() (rare in practise I think, but entirely possible) then you
> won't get a splat without something like the below on top.

We have a bunch of schedule() calls in i915, for e.g. waiting for multiple
events at the same time (when we want to unblock if any of them fire). And
there's no might_sleep in these cases afaict. Adding the check in
schedule() sounds useful, I'll include your snippet in v2. Plus try a bit
better to explain in the commit message why Michal suggested these.

Thanks, Daniel

> 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index f66920173370..b1aaa278f1af 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
> > >  /*
> > >   * Various schedule()-time debugging checks and statistics:
> > >   */
> > > -static inline void schedule_debug(struct task_struct *prev)
> > > +static inline void schedule_debug(struct task_struct *prev, bool preempt)
> > >  {
> > >  #ifdef CONFIG_SCHED_STACK_END_CHECK
> > >  	if (task_stack_end_corrupted(prev))
> > >  		panic("corrupted stack end detected inside scheduler\n");
> > >  #endif
> > >  
> > > +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> > > +	if (!preempt && prev->state && prev->non_block_count)
> > > +		// splat
> > > +#endif
> > > +
> > >  	if (unlikely(in_atomic_preempt_off())) {
> > >  		__schedule_bug(prev);
> > >  		preempt_count_set(PREEMPT_DISABLED);
> > > @@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
> > >  	rq = cpu_rq(cpu);
> > >  	prev = rq->curr;
> > >  
> > > -	schedule_debug(prev);
> > > +	schedule_debug(prev, preempt);
> > >  
> > >  	if (sched_feat(HRTICK))
> > >  		hrtick_clear(rq);
> > 
> > -- 
> > Michal Hocko
> > SUSE Labs

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2019-06-19 20:42             ` Jason Gunthorpe
@ 2019-06-19 21:20               ` Daniel Vetter
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2019-06-19 21:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Michal Hocko, Daniel Vetter,
	Intel Graphics Development, LKML, DRI Development, Linux MM,
	David Rientjes, Paolo Bonzini, Andrew Morton,
	Christian König

On Wed, Jun 19, 2019 at 10:42 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jun 19, 2019 at 10:18:43PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 19, 2019 at 10:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> > > > On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > > On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > > > > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > > > > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > > > > implementation might fail when it's not allowed to.
> > > > > > > >
> > > > > > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > > > > > whether we could use the newly-introduced return value to handle some
> > > > > > > > corner cases. Until we realized that these are only for when a task
> > > > > > > > has been killed by the oom reaper.
> > > > > > > >
> > > > > > > > An alternative approach would be to split the callback into two
> > > > > > > > versions, one with the int return value, and the other with void
> > > > > > > > return value like in older kernels. But that's a lot more churn for
> > > > > > > > fairly little gain I think.
> > > > > > > >
> > > > > > > > Summary from the m-l discussion on why we want something at warning
> > > > > > > > level: This allows automated tooling in CI to catch bugs without
> > > > > > > > humans having to look at everything. If we just upgrade the existing
> > > > > > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > > > > > one will ever spot the problem since it's lost in the massive amounts
> > > > > > > > of overall dmesg noise.
> > > > > > > >
> > > > > > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > > > > > the problematic case (Michal Hocko).
> > > > >
> > > > > I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
> > > > > like syzkaller to report a bug, while a random pr_warn probably will
> > > > > not.
> > > > >
> > > > > I do agree the backtrace is not useful here, but we don't have a
> > > > > warn-no-backtrace version..
> > > > >
> > > > > IMHO, kernel/driver bugs should always be reported by WARN &
> > > > > friends. We never expect to see the print, so why do we care how big
> > > > > it is?
> > > > >
> > > > > Also note that WARN integrates an unlikely() into it so the codegen is
> > > > > automatically a bit more optimal that the if & pr_warn combination.
> > > >
> > > > Where do you make a difference between a WARN without backtrace and a
> > > > pr_warn? They're both dumped at the same log-level ...
> > >
> > > WARN panics the kernel when you set
> > >
> > > /proc/sys/kernel/panic_on_warn
> > >
> > > So auto testing tools can set that and get a clean detection that the
> > > kernel has failed the test in some way.
> > >
> > > Otherwise you are left with frail/ugly grepping of dmesg.
> >
> > Hm right.
> >
> > Anyway, I'm happy to repaint the bikeshed in any color that's desired,
> > if that helps with landing it. WARN_WITHOUT_BACKTRACE might take a bit
> > longer (need to find a bit of time, plus it'll definitely attract more
> > comments).
>
> I was actually just writing something very similar when looking at the
> hmm things..
>
> Also, is the test backwards?

Yes, in the last rebase I screwed things up :-/
-Daniel

> mmu_notifier_range_blockable() == true means the callback must return
> zero
>
> mmu_notififer_range_blockable() == false means the callback can return
> 0 or -EAGAIN.
>
> Suggest this:
>
>                                 pr_info("%pS callback failed with %d in %sblockable context.\n",
>                                         mn->ops->invalidate_range_start, _ret,
>                                         !mmu_notifier_range_blockable(range) ? "non-" : "");
> +                               WARN_ON(mmu_notifier_range_blockable(range) ||
> +                                       _ret != -EAGAIN);
>                                 ret = _ret;
>                         }
>                 }
>
> To express the API invariant.
>
> Jason



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2019-06-19 20:18           ` Daniel Vetter
@ 2019-06-19 20:42             ` Jason Gunthorpe
  2019-06-19 21:20               ` Daniel Vetter
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Gunthorpe @ 2019-06-19 20:42 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jerome Glisse, Michal Hocko, Daniel Vetter,
	Intel Graphics Development, LKML, DRI Development, Linux MM,
	David Rientjes, Paolo Bonzini, Andrew Morton,
	Christian König

On Wed, Jun 19, 2019 at 10:18:43PM +0200, Daniel Vetter wrote:
> On Wed, Jun 19, 2019 at 10:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> > > On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > > > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > > > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > > > implementation might fail when it's not allowed to.
> > > > > > >
> > > > > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > > > > whether we could use the newly-introduced return value to handle some
> > > > > > > corner cases. Until we realized that these are only for when a task
> > > > > > > has been killed by the oom reaper.
> > > > > > >
> > > > > > > An alternative approach would be to split the callback into two
> > > > > > > versions, one with the int return value, and the other with void
> > > > > > > return value like in older kernels. But that's a lot more churn for
> > > > > > > fairly little gain I think.
> > > > > > >
> > > > > > > Summary from the m-l discussion on why we want something at warning
> > > > > > > level: This allows automated tooling in CI to catch bugs without
> > > > > > > humans having to look at everything. If we just upgrade the existing
> > > > > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > > > > one will ever spot the problem since it's lost in the massive amounts
> > > > > > > of overall dmesg noise.
> > > > > > >
> > > > > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > > > > the problematic case (Michal Hocko).
> > > >
> > > > I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
> > > > like syzkaller to report a bug, while a random pr_warn probably will
> > > > not.
> > > >
> > > > I do agree the backtrace is not useful here, but we don't have a
> > > > warn-no-backtrace version..
> > > >
> > > > IMHO, kernel/driver bugs should always be reported by WARN &
> > > > friends. We never expect to see the print, so why do we care how big
> > > > it is?
> > > >
> > > > Also note that WARN integrates an unlikely() into it so the codegen is
> > > > automatically a bit more optimal that the if & pr_warn combination.
> > >
> > > Where do you make a difference between a WARN without backtrace and a
> > > pr_warn? They're both dumped at the same log-level ...
> >
> > WARN panics the kernel when you set
> >
> > /proc/sys/kernel/panic_on_warn
> >
> > So auto testing tools can set that and get a clean detection that the
> > kernel has failed the test in some way.
> >
> > Otherwise you are left with frail/ugly grepping of dmesg.
> 
> Hm right.
> 
> Anyway, I'm happy to repaint the bikeshed in any color that's desired,
> if that helps with landing it. WARN_WITHOUT_BACKTRACE might take a bit
> longer (need to find a bit of time, plus it'll definitely attract more
> comments).

I was actually just writing something very similar when looking at the
hmm things..

Also, is the test backwards?

mmu_notifier_range_blockable() == true means the callback must return
zero

mmu_notififer_range_blockable() == false means the callback can return
0 or -EAGAIN.

Suggest this:

                                pr_info("%pS callback failed with %d in %sblockable context.\n",
                                        mn->ops->invalidate_range_start, _ret,
                                        !mmu_notifier_range_blockable(range) ? "non-" : "");
+                               WARN_ON(mmu_notifier_range_blockable(range) ||
+                                       _ret != -EAGAIN);
                                ret = _ret;
                        }
                }

To express the API invariant.

Jason

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2019-06-19 20:13         ` Jason Gunthorpe
@ 2019-06-19 20:18           ` Daniel Vetter
  2019-06-19 20:42             ` Jason Gunthorpe
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Vetter @ 2019-06-19 20:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Michal Hocko, Daniel Vetter,
	Intel Graphics Development, LKML, DRI Development, Linux MM,
	David Rientjes, Paolo Bonzini, Andrew Morton,
	Christian König

On Wed, Jun 19, 2019 at 10:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > > implementation might fail when it's not allowed to.
> > > > > >
> > > > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > > > whether we could use the newly-introduced return value to handle some
> > > > > > corner cases. Until we realized that these are only for when a task
> > > > > > has been killed by the oom reaper.
> > > > > >
> > > > > > An alternative approach would be to split the callback into two
> > > > > > versions, one with the int return value, and the other with void
> > > > > > return value like in older kernels. But that's a lot more churn for
> > > > > > fairly little gain I think.
> > > > > >
> > > > > > Summary from the m-l discussion on why we want something at warning
> > > > > > level: This allows automated tooling in CI to catch bugs without
> > > > > > humans having to look at everything. If we just upgrade the existing
> > > > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > > > one will ever spot the problem since it's lost in the massive amounts
> > > > > > of overall dmesg noise.
> > > > > >
> > > > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > > > the problematic case (Michal Hocko).
> > >
> > > I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
> > > like syzkaller to report a bug, while a random pr_warn probably will
> > > not.
> > >
> > > I do agree the backtrace is not useful here, but we don't have a
> > > warn-no-backtrace version..
> > >
> > > IMHO, kernel/driver bugs should always be reported by WARN &
> > > friends. We never expect to see the print, so why do we care how big
> > > it is?
> > >
> > > Also note that WARN integrates an unlikely() into it so the codegen is
> > > automatically a bit more optimal that the if & pr_warn combination.
> >
> > Where do you make a difference between a WARN without backtrace and a
> > pr_warn? They're both dumped at the same log-level ...
>
> WARN panics the kernel when you set
>
> /proc/sys/kernel/panic_on_warn
>
> So auto testing tools can set that and get a clean detection that the
> kernel has failed the test in some way.
>
> Otherwise you are left with frail/ugly grepping of dmesg.

Hm right.

Anyway, I'm happy to repaint the bikeshed in any color that's desired,
if that helps with landing it. WARN_WITHOUT_BACKTRACE might take a bit
longer (need to find a bit of time, plus it'll definitely attract more
comments).

Michal?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2019-06-19 19:57       ` Daniel Vetter
@ 2019-06-19 20:13         ` Jason Gunthorpe
  2019-06-19 20:18           ` Daniel Vetter
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Gunthorpe @ 2019-06-19 20:13 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jerome Glisse, Michal Hocko, Daniel Vetter,
	Intel Graphics Development, LKML, DRI Development, Linux MM,
	David Rientjes, Paolo Bonzini, Andrew Morton,
	Christian König

On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > implementation might fail when it's not allowed to.
> > > > >
> > > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > > whether we could use the newly-introduced return value to handle some
> > > > > corner cases. Until we realized that these are only for when a task
> > > > > has been killed by the oom reaper.
> > > > >
> > > > > An alternative approach would be to split the callback into two
> > > > > versions, one with the int return value, and the other with void
> > > > > return value like in older kernels. But that's a lot more churn for
> > > > > fairly little gain I think.
> > > > >
> > > > > Summary from the m-l discussion on why we want something at warning
> > > > > level: This allows automated tooling in CI to catch bugs without
> > > > > humans having to look at everything. If we just upgrade the existing
> > > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > > one will ever spot the problem since it's lost in the massive amounts
> > > > > of overall dmesg noise.
> > > > >
> > > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > > the problematic case (Michal Hocko).
> >
> > I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
> > like syzkaller to report a bug, while a random pr_warn probably will
> > not.
> >
> > I do agree the backtrace is not useful here, but we don't have a
> > warn-no-backtrace version..
> >
> > IMHO, kernel/driver bugs should always be reported by WARN &
> > friends. We never expect to see the print, so why do we care how big
> > it is?
> >
> > Also note that WARN integrates an unlikely() into it so the codegen is
> > automatically a bit more optimal that the if & pr_warn combination.
> 
> Where do you make a difference between a WARN without backtrace and a
> pr_warn? They're both dumped at the same log-level ...

WARN panics the kernel when you set 

/proc/sys/kernel/panic_on_warn

So auto testing tools can set that and get a clean detection that the
kernel has failed the test in some way.

Otherwise you are left with frail/ugly grepping of dmesg.

Jason

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2019-06-19 16:50     ` Jason Gunthorpe
@ 2019-06-19 19:57       ` Daniel Vetter
  2019-06-19 20:13         ` Jason Gunthorpe
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Vetter @ 2019-06-19 19:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Michal Hocko, Daniel Vetter,
	Intel Graphics Development, LKML, DRI Development, Linux MM,
	David Rientjes, Paolo Bonzini, Andrew Morton,
	Christian König

On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > callchains it's hard to spot all places where an mmu notifier
> > > > implementation might fail when it's not allowed to.
> > > >
> > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > whether we could use the newly-introduced return value to handle some
> > > > corner cases. Until we realized that these are only for when a task
> > > > has been killed by the oom reaper.
> > > >
> > > > An alternative approach would be to split the callback into two
> > > > versions, one with the int return value, and the other with void
> > > > return value like in older kernels. But that's a lot more churn for
> > > > fairly little gain I think.
> > > >
> > > > Summary from the m-l discussion on why we want something at warning
> > > > level: This allows automated tooling in CI to catch bugs without
> > > > humans having to look at everything. If we just upgrade the existing
> > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > one will ever spot the problem since it's lost in the massive amounts
> > > > of overall dmesg noise.
> > > >
> > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > the problematic case (Michal Hocko).
>
> I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
> like syzkaller to report a bug, while a random pr_warn probably will
> not.
>
> I do agree the backtrace is not useful here, but we don't have a
> warn-no-backtrace version..
>
> IMHO, kernel/driver bugs should always be reported by WARN &
> friends. We never expect to see the print, so why do we care how big
> it is?
>
> Also note that WARN integrates an unlikely() into it so the codegen is
> automatically a bit more optimal that the if & pr_warn combination.

Where do you make a difference between a WARN without backtrace and a
pr_warn? They're both dumped at the same log-level ...

I can easily throw an unlikely around this here if that's the only
thing that's blocking the merge.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2019-06-18 15:22     ` Daniel Vetter
  (?)
@ 2019-06-19 16:50     ` Jason Gunthorpe
  2019-06-19 19:57       ` Daniel Vetter
  -1 siblings, 1 reply; 36+ messages in thread
From: Jason Gunthorpe @ 2019-06-19 16:50 UTC (permalink / raw)
  To: Jerome Glisse, Michal Hocko, Daniel Vetter,
	Intel Graphics Development, LKML, DRI Development, Linux MM,
	David Rientjes, Paolo Bonzini, Andrew Morton,
	Christian König

On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > Just a bit of paranoia, since if we start pushing this deep into
> > > callchains it's hard to spot all places where an mmu notifier
> > > implementation might fail when it's not allowed to.
> > > 
> > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > whether we could use the newly-introduced return value to handle some
> > > corner cases. Until we realized that these are only for when a task
> > > has been killed by the oom reaper.
> > > 
> > > An alternative approach would be to split the callback into two
> > > versions, one with the int return value, and the other with void
> > > return value like in older kernels. But that's a lot more churn for
> > > fairly little gain I think.
> > > 
> > > Summary from the m-l discussion on why we want something at warning
> > > level: This allows automated tooling in CI to catch bugs without
> > > humans having to look at everything. If we just upgrade the existing
> > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > one will ever spot the problem since it's lost in the massive amounts
> > > of overall dmesg noise.
> > > 
> > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > the problematic case (Michal Hocko).

I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
like syzkaller to report a bug, while a random pr_warn probably will
not.

I do agree the backtrace is not useful here, but we don't have a
warn-no-backtrace version..

IMHO, kernel/driver bugs should always be reported by WARN &
friends. We never expect to see the print, so why do we care how big
it is?

Also note that WARN integrates an unlikely() into it so the codegen is
automatically a bit more optimal that the if & pr_warn combination.

Jason

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2019-05-21 15:44 ` Jerome Glisse
@ 2019-06-18 15:22     ` Daniel Vetter
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2019-06-18 15:22 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Daniel Vetter, Michal Hocko, Daniel Vetter,
	Intel Graphics Development, LKML, DRI Development, Linux MM,
	David Rientjes, Paolo Bonzini, Andrew Morton,
	Christian König

On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > Just a bit of paranoia, since if we start pushing this deep into
> > callchains it's hard to spot all places where an mmu notifier
> > implementation might fail when it's not allowed to.
> > 
> > Inspired by some confusion we had discussing i915 mmu notifiers and
> > whether we could use the newly-introduced return value to handle some
> > corner cases. Until we realized that these are only for when a task
> > has been killed by the oom reaper.
> > 
> > An alternative approach would be to split the callback into two
> > versions, one with the int return value, and the other with void
> > return value like in older kernels. But that's a lot more churn for
> > fairly little gain I think.
> > 
> > Summary from the m-l discussion on why we want something at warning
> > level: This allows automated tooling in CI to catch bugs without
> > humans having to look at everything. If we just upgrade the existing
> > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > one will ever spot the problem since it's lost in the massive amounts
> > of overall dmesg noise.
> > 
> > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > the problematic case (Michal Hocko).
> > 
> > v3: Rebase on top of Glisse's arg rework.
> > 
> > v4: More rebase on top of Glisse reworking everything.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

-mm folks, is this (entire series of 4 patches) planned to land in the 5.3
merge window? Or do you want more reviews/testing/polish?

I think with all the hmm rework going on, a bit more validation and checks
in this tricky area would help.

Thanks, Daniel

> 
> > ---
> >  mm/mmu_notifier.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index ee36068077b6..c05e406a7cd7 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -181,6 +181,9 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> >  				pr_info("%pS callback failed with %d in %sblockable context.\n",
> >  					mn->ops->invalidate_range_start, _ret,
> >  					!mmu_notifier_range_blockable(range) ? "non-" : "");
> > +				if (!mmu_notifier_range_blockable(range))
> > +					pr_warn("%pS callback failure not allowed\n",
> > +						mn->ops->invalidate_range_start);
> >  				ret = _ret;
> >  			}
> >  		}
> > -- 
> > 2.20.1
> > 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
@ 2019-06-18 15:22     ` Daniel Vetter
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2019-06-18 15:22 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Michal Hocko, Daniel Vetter, Intel Graphics Development, LKML,
	DRI Development, Linux MM, Paolo Bonzini, David Rientjes,
	Daniel Vetter, Andrew Morton, Christian König

On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > Just a bit of paranoia, since if we start pushing this deep into
> > callchains it's hard to spot all places where an mmu notifier
> > implementation might fail when it's not allowed to.
> > 
> > Inspired by some confusion we had discussing i915 mmu notifiers and
> > whether we could use the newly-introduced return value to handle some
> > corner cases. Until we realized that these are only for when a task
> > has been killed by the oom reaper.
> > 
> > An alternative approach would be to split the callback into two
> > versions, one with the int return value, and the other with void
> > return value like in older kernels. But that's a lot more churn for
> > fairly little gain I think.
> > 
> > Summary from the m-l discussion on why we want something at warning
> > level: This allows automated tooling in CI to catch bugs without
> > humans having to look at everything. If we just upgrade the existing
> > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > one will ever spot the problem since it's lost in the massive amounts
> > of overall dmesg noise.
> > 
> > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > the problematic case (Michal Hocko).
> > 
> > v3: Rebase on top of Glisse's arg rework.
> > 
> > v4: More rebase on top of Glisse reworking everything.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

-mm folks, is this (entire series of 4 patches) planned to land in the 5.3
merge window? Or do you want more reviews/testing/polish?

I think with all the hmm rework going on, a bit more validation and checks
in this tricky area would help.

Thanks, Daniel

> 
> > ---
> >  mm/mmu_notifier.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index ee36068077b6..c05e406a7cd7 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -181,6 +181,9 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> >  				pr_info("%pS callback failed with %d in %sblockable context.\n",
> >  					mn->ops->invalidate_range_start, _ret,
> >  					!mmu_notifier_range_blockable(range) ? "non-" : "");
> > +				if (!mmu_notifier_range_blockable(range))
> > +					pr_warn("%pS callback failure not allowed\n",
> > +						mn->ops->invalidate_range_start);
> >  				ret = _ret;
> >  			}
> >  		}
> > -- 
> > 2.20.1
> > 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
  2019-05-20 21:39 [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail Daniel Vetter
@ 2019-05-21 15:44 ` Jerome Glisse
  2019-06-18 15:22     ` Daniel Vetter
  0 siblings, 1 reply; 36+ messages in thread
From: Jerome Glisse @ 2019-05-21 15:44 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Intel Graphics Development, LKML, Linux MM,
	Andrew Morton, Michal Hocko, Christian König,
	David Rientjes, Paolo Bonzini, Daniel Vetter

On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
> 
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
> 
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
> 
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.
> 
> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).
> 
> v3: Rebase on top of Glisse's arg rework.
> 
> v4: More rebase on top of Glisse reworking everything.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

> ---
>  mm/mmu_notifier.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ee36068077b6..c05e406a7cd7 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -181,6 +181,9 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
>  				pr_info("%pS callback failed with %d in %sblockable context.\n",
>  					mn->ops->invalidate_range_start, _ret,
>  					!mmu_notifier_range_blockable(range) ? "non-" : "");
> +				if (!mmu_notifier_range_blockable(range))
> +					pr_warn("%pS callback failure not allowed\n",
> +						mn->ops->invalidate_range_start);
>  				ret = _ret;
>  			}
>  		}
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail
@ 2019-05-20 21:39 Daniel Vetter
  2019-05-21 15:44 ` Jerome Glisse
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Vetter @ 2019-05-20 21:39 UTC (permalink / raw)
  To: DRI Development
  Cc: Intel Graphics Development, LKML, Linux MM, Daniel Vetter,
	Andrew Morton, Michal Hocko, Christian König,
	David Rientjes, Jérôme Glisse, Paolo Bonzini,
	Daniel Vetter

Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.

Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.

An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.

Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.

v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
the problematic case (Michal Hocko).

v3: Rebase on top of Glisse's arg rework.

v4: More rebase on top of Glisse reworking everything.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ee36068077b6..c05e406a7cd7 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -181,6 +181,9 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
 					!mmu_notifier_range_blockable(range) ? "non-" : "");
+				if (!mmu_notifier_range_blockable(range))
+					pr_warn("%pS callback failure not allowed\n",
+						mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-06-19 21:20 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-10 10:36 [PATCH 0/4] mmu notifier debug checks v2 Daniel Vetter
2018-12-10 10:36 ` [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail Daniel Vetter
2018-12-10 10:36   ` Daniel Vetter
2018-12-10 10:44   ` Koenig, Christian
2018-12-10 10:44     ` Koenig, Christian
2018-12-10 13:27   ` Michal Hocko
2018-12-10 13:27     ` Michal Hocko
2018-12-10 10:36 ` [PATCH 2/4] kernel.h: Add non_block_start/end() Daniel Vetter
2018-12-10 10:36   ` Daniel Vetter
2018-12-10 14:13   ` Michal Hocko
2018-12-10 14:13     ` Michal Hocko
2018-12-10 14:47     ` Peter Zijlstra
2018-12-10 14:47       ` Peter Zijlstra
2018-12-10 15:01       ` Michal Hocko
2018-12-10 15:22         ` Peter Zijlstra
2018-12-10 16:20           ` Michal Hocko
2018-12-10 16:30             ` Peter Zijlstra
2018-12-10 16:30               ` Peter Zijlstra
2018-12-12 10:26               ` Daniel Vetter
2018-12-12 10:26                 ` Daniel Vetter
2018-12-10 10:36 ` [PATCH 3/4] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter
2018-12-10 10:36 ` [PATCH 4/4] mm, notifier: Add a lockdep map for invalidate_range_start Daniel Vetter
2018-12-10 12:07 ` ✗ Fi.CI.CHECKPATCH: warning for mmu notifier debug checks v2 Patchwork
2018-12-10 12:28 ` ✓ Fi.CI.BAT: success " Patchwork
2018-12-10 15:54 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-12-10 16:47 ` ✗ Fi.CI.BAT: failure for mmu notifier debug checks v2 (rev2) Patchwork
2019-05-20 21:39 [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail Daniel Vetter
2019-05-21 15:44 ` Jerome Glisse
2019-06-18 15:22   ` Daniel Vetter
2019-06-18 15:22     ` Daniel Vetter
2019-06-19 16:50     ` Jason Gunthorpe
2019-06-19 19:57       ` Daniel Vetter
2019-06-19 20:13         ` Jason Gunthorpe
2019-06-19 20:18           ` Daniel Vetter
2019-06-19 20:42             ` Jason Gunthorpe
2019-06-19 21:20               ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.