[RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers
@ 2017-08-30  8:46 ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-08-30  8:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Argangeli, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Andrea has noticed that the oom_reaper doesn't invalidate the range
via mmu notifiers (mmu_notifier_invalidate_range_start,
mmu_notifier_invalidate_range_end) and that can corrupt the memory
of the kvm guest for example. As the callback is allowed to sleep
and the implementation is out of hand of the MM it is safer to simply
bail out if there is an mmu notifier registered. In order to not
fail too early make the mm_has_notifiers check under the oom_lock
and have a little nap before failing to give the current oom victim some
more time to exit.

Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Noticed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
Andrea has pointed this out [1] while a different (but similar) bug has been
discussed. This is an ugly hack to plug the potential memory corruption but
we definitely want a better fix longterm.

Does this sound like a viable option for now?

[1] http://lkml.kernel.org/r/20170829140924.GB21615@redhat.com

 include/linux/mmu_notifier.h |  5 +++++
 mm/oom_kill.c                | 15 +++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index c91b3bcd158f..947f21b451d2 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -420,6 +420,11 @@ extern void mmu_notifier_synchronize(void);
 
 #else /* CONFIG_MMU_NOTIFIER */
 
+static inline int mm_has_notifiers(struct mm_struct *mm)
+{
+	return 0;
+}
+
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
 }
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e026712..45f1a0c3dd90 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -40,6 +40,7 @@
 #include <linux/ratelimit.h>
 #include <linux/kthread.h>
 #include <linux/init.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -488,6 +489,20 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 */
 	mutex_lock(&oom_lock);
 
+	/*
+	 * If the mm has notifiers then we would need to invalidate them around
+	 * unmap_page_range and that is risky because notifiers can sleep and
+	 * what they do is basically undeterministic. So let's have a short sleep
+	 * to give the oom victim some more time.
+	 * TODO: we really want to get rid of this ugly hack and make sure that
+	 * notifiers cannot block for unbounded amount of time and add
+	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range
+	 */
+	if (mm_has_notifiers(mm)) {
+		schedule_timeout_idle(HZ);
+		goto unlock_oom;
+	}
+
 	if (!down_read_trylock(&mm->mmap_sem)) {
 		ret = false;
 		trace_skip_task_reaping(tsk->pid);
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers
@ 2017-08-30  8:46 ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-08-30  8:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Argangeli, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Andrea has noticed that the oom_reaper doesn't invalidate the range
via mmu notifiers (mmu_notifier_invalidate_range_start,
mmu_notifier_invalidate_range_end) and that can corrupt the memory
of the kvm guest for example. As the callback is allowed to sleep
and the implementation is out of hand of the MM it is safer to simply
bail out if there is an mmu notifier registered. In order to not
fail too early make the mm_has_notifiers check under the oom_lock
and have a little nap before failing to give the current oom victim some
more time to exit.

Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Noticed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
Andrea has pointed this out [1] while a different (but similar) bug has been
discussed. This is an ugly hack to plug the potential memory corruption but
we definitely want a better fix longterm.

Does this sound like a viable option for now?

[1] http://lkml.kernel.org/r/20170829140924.GB21615@redhat.com

 include/linux/mmu_notifier.h |  5 +++++
 mm/oom_kill.c                | 15 +++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index c91b3bcd158f..947f21b451d2 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -420,6 +420,11 @@ extern void mmu_notifier_synchronize(void);
 
 #else /* CONFIG_MMU_NOTIFIER */
 
+static inline int mm_has_notifiers(struct mm_struct *mm)
+{
+	return 0;
+}
+
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
 }
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e026712..45f1a0c3dd90 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -40,6 +40,7 @@
 #include <linux/ratelimit.h>
 #include <linux/kthread.h>
 #include <linux/init.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -488,6 +489,20 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 */
 	mutex_lock(&oom_lock);
 
+	/*
+	 * If the mm has notifiers then we would need to invalidate them around
+	 * unmap_page_range and that is risky because notifiers can sleep and
+	 * what they do is basically undeterministic. So let's have a short sleep
+	 * to give the oom victim some more time.
+	 * TODO: we really want to get rid of this ugly hack and make sure that
+	 * notifiers cannot block for unbounded amount of time and add
+	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range
+	 */
+	if (mm_has_notifiers(mm)) {
+		schedule_timeout_idle(HZ);
+		goto unlock_oom;
+	}
+
 	if (!down_read_trylock(&mm->mmap_sem)) {
 		ret = false;
 		trace_skip_task_reaping(tsk->pid);
-- 
2.13.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers
  2017-08-30  8:46 ` Michal Hocko
@ 2017-08-30  9:09   ` Michal Hocko
  -1 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-08-30  9:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Argangeli, linux-mm, LKML

On Wed 30-08-17 10:46:00, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Andrea has noticed that the oom_reaper doesn't invalidate the range
> via mmu notifiers (mmu_notifier_invalidate_range_start,
> mmu_notifier_invalidate_range_end) and that can corrupt the memory
> of the kvm guest for example.

Forgot to mention that tlb_flush_mmu_tlbonly already invokes mmu
notifiers but that is not sufficient as per Andrea:
: mmu_notifier_invalidate_range cannot be used in replacement of
: mmu_notifier_invalidate_range_start/end. For KVM
: mmu_notifier_invalidate_range is a noop and rightfully so. A MMU
: notifier implementation has to implement either
: ->invalidate_range method or the invalidate_range_start/end
: methods, not both. And if you implement invalidate_range_start/end
: like KVM is forced to do, calling mmu_notifier_invalidate_range in
: common code is a noop for KVM.
: 
: For those MMU notifiers that can get away only implementing
: ->invalidate_range, the ->invalidate_range is implicitly called by
: mmu_notifier_invalidate_range_end(). And only those secondary MMUs
: that share the same pagetable with the primary MMU (like AMD
: iommuv2) can get away only implementing ->invalidate_range.

> As the callback is allowed to sleep
> and the implementation is out of hand of the MM it is safer to simply
> bail out if there is an mmu notifier registered. In order to not
> fail too early make the mm_has_notifiers check under the oom_lock
> and have a little nap before failing to give the current oom victim some
> more time to exit.
> 
> Fixes: aac453635549 ("mm, oom: introduce oom reaper")
> Noticed-by: Andrea Arcangeli <aarcange@redhat.com>
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> Andrea has pointed this out [1] while a different (but similar) bug has been
> discussed. This is an ugly hack to plug the potential memory corruption but
> we definitely want a better fix longterm.

> Does this sound like a viable option for now?
> 
> [1] http://lkml.kernel.org/r/20170829140924.GB21615@redhat.com
> 
>  include/linux/mmu_notifier.h |  5 +++++
>  mm/oom_kill.c                | 15 +++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index c91b3bcd158f..947f21b451d2 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -420,6 +420,11 @@ extern void mmu_notifier_synchronize(void);
>  
>  #else /* CONFIG_MMU_NOTIFIER */
>  
> +static inline int mm_has_notifiers(struct mm_struct *mm)
> +{
> +	return 0;
> +}
> +
>  static inline void mmu_notifier_release(struct mm_struct *mm)
>  {
>  }
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 99736e026712..45f1a0c3dd90 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -40,6 +40,7 @@
>  #include <linux/ratelimit.h>
>  #include <linux/kthread.h>
>  #include <linux/init.h>
> +#include <linux/mmu_notifier.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -488,6 +489,20 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
>  	 */
>  	mutex_lock(&oom_lock);
>  
> +	/*
> +	 * If the mm has notifiers then we would need to invalidate them around
> +	 * unmap_page_range and that is risky because notifiers can sleep and
> +	 * what they do is basically undeterministic. So let's have a short sleep
> +	 * to give the oom victim some more time.
> +	 * TODO: we really want to get rid of this ugly hack and make sure that
> +	 * notifiers cannot block for unbounded amount of time and add
> +	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range
> +	 */
> +	if (mm_has_notifiers(mm)) {
> +		schedule_timeout_idle(HZ);
> +		goto unlock_oom;
> +	}
> +
>  	if (!down_read_trylock(&mm->mmap_sem)) {
>  		ret = false;
>  		trace_skip_task_reaping(tsk->pid);
> -- 
> 2.13.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers
@ 2017-08-30  9:09   ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-08-30  9:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Argangeli, linux-mm, LKML

On Wed 30-08-17 10:46:00, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Andrea has noticed that the oom_reaper doesn't invalidate the range
> via mmu notifiers (mmu_notifier_invalidate_range_start,
> mmu_notifier_invalidate_range_end) and that can corrupt the memory
> of the kvm guest for example.

Forgot to mention that tlb_flush_mmu_tlbonly already invokes mmu
notifiers but that is not sufficient as per Andrea:
: mmu_notifier_invalidate_range cannot be used in replacement of
: mmu_notifier_invalidate_range_start/end. For KVM
: mmu_notifier_invalidate_range is a noop and rightfully so. A MMU
: notifier implementation has to implement either
: ->invalidate_range method or the invalidate_range_start/end
: methods, not both. And if you implement invalidate_range_start/end
: like KVM is forced to do, calling mmu_notifier_invalidate_range in
: common code is a noop for KVM.
: 
: For those MMU notifiers that can get away only implementing
: ->invalidate_range, the ->invalidate_range is implicitly called by
: mmu_notifier_invalidate_range_end(). And only those secondary MMUs
: that share the same pagetable with the primary MMU (like AMD
: iommuv2) can get away only implementing ->invalidate_range.

> As the callback is allowed to sleep
> and the implementation is out of hand of the MM it is safer to simply
> bail out if there is an mmu notifier registered. In order to not
> fail too early make the mm_has_notifiers check under the oom_lock
> and have a little nap before failing to give the current oom victim some
> more time to exit.
> 
> Fixes: aac453635549 ("mm, oom: introduce oom reaper")
> Noticed-by: Andrea Arcangeli <aarcange@redhat.com>
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> Andrea has pointed this out [1] while a different (but similar) bug has been
> discussed. This is an ugly hack to plug the potential memory corruption but
> we definitely want a better fix longterm.

> Does this sound like a viable option for now?
> 
> [1] http://lkml.kernel.org/r/20170829140924.GB21615@redhat.com
> 
>  include/linux/mmu_notifier.h |  5 +++++
>  mm/oom_kill.c                | 15 +++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index c91b3bcd158f..947f21b451d2 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -420,6 +420,11 @@ extern void mmu_notifier_synchronize(void);
>  
>  #else /* CONFIG_MMU_NOTIFIER */
>  
> +static inline int mm_has_notifiers(struct mm_struct *mm)
> +{
> +	return 0;
> +}
> +
>  static inline void mmu_notifier_release(struct mm_struct *mm)
>  {
>  }
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 99736e026712..45f1a0c3dd90 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -40,6 +40,7 @@
>  #include <linux/ratelimit.h>
>  #include <linux/kthread.h>
>  #include <linux/init.h>
> +#include <linux/mmu_notifier.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -488,6 +489,20 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
>  	 */
>  	mutex_lock(&oom_lock);
>  
> +	/*
> +	 * If the mm has notifiers then we would need to invalidate them around
> +	 * unmap_page_range and that is risky because notifiers can sleep and
> +	 * what they do is basically undeterministic. So let's have a short sleep
> +	 * to give the oom victim some more time.
> +	 * TODO: we really want to get rid of this ugly hack and make sure that
> +	 * notifiers cannot block for unbounded amount of time and add
> +	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range
> +	 */
> +	if (mm_has_notifiers(mm)) {
> +		schedule_timeout_idle(HZ);
> +		goto unlock_oom;
> +	}
> +
>  	if (!down_read_trylock(&mm->mmap_sem)) {
>  		ret = false;
>  		trace_skip_task_reaping(tsk->pid);
> -- 
> 2.13.2
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers
  2017-08-30  8:46 ` Michal Hocko
@ 2017-08-30 17:49   ` Andrea Arcangeli
  -1 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2017-08-30 17:49 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, linux-mm, LKML, Michal Hocko

Hello Michal,

On Wed, Aug 30, 2017 at 10:46:00AM +0200, Michal Hocko wrote:
> +	 * TODO: we really want to get rid of this ugly hack and make sure that
> +	 * notifiers cannot block for unbounded amount of time and add
> +	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range

KVM already should be ok in that respect. However the major reason to
prefer mmu_notifier_invalidate_range_start/end is those can block and
schedule waiting for stuff happening behind the PCI bus easily. So I'm
not sure if the TODO is good idea to keep.

> +	 */
> +	if (mm_has_notifiers(mm)) {
> +		schedule_timeout_idle(HZ);

Why the schedule_timeout? What's the difference with the OOM
reaper going to sleep again in the main loop instead?

> +		goto unlock_oom;
> +	}

mm_has_notifiers stops changing after obtaining the mmap_sem for
reading. See the do_mmu_notifier_register. So it's better to put the
mm_has_notifiers check immediately after the below:

>  	if (!down_read_trylock(&mm->mmap_sem)) {
>  		ret = false;
>  		trace_skip_task_reaping(tsk->pid);

If we succeed taking the mmap_sem for reading then we read a stable
value out of mm_has_notifiers and be sure it won't be set from under
us.

Otherwise the patch looks fine including the incremental comment about
why the mmu_notifier_invalidate_range in MMU gather wasn't enough.

Thanks!
Andrea

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers
@ 2017-08-30 17:49   ` Andrea Arcangeli
  0 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2017-08-30 17:49 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, linux-mm, LKML, Michal Hocko

Hello Michal,

On Wed, Aug 30, 2017 at 10:46:00AM +0200, Michal Hocko wrote:
> +	 * TODO: we really want to get rid of this ugly hack and make sure that
> +	 * notifiers cannot block for unbounded amount of time and add
> +	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range

KVM already should be ok in that respect. However the major reason to
prefer mmu_notifier_invalidate_range_start/end is those can block and
schedule waiting for stuff happening behind the PCI bus easily. So I'm
not sure if the TODO is good idea to keep.

> +	 */
> +	if (mm_has_notifiers(mm)) {
> +		schedule_timeout_idle(HZ);

Why the schedule_timeout? What's the difference with the OOM
reaper going to sleep again in the main loop instead?

> +		goto unlock_oom;
> +	}

mm_has_notifiers stops changing after obtaining the mmap_sem for
reading. See the do_mmu_notifier_register. So it's better to put the
mm_has_notifiers check immediately after the below:

>  	if (!down_read_trylock(&mm->mmap_sem)) {
>  		ret = false;
>  		trace_skip_task_reaping(tsk->pid);

If we succeed taking the mmap_sem for reading then we read a stable
value out of mm_has_notifiers and be sure it won't be set from under
us.

Otherwise the patch looks fine including the incremental comment about
why the mmu_notifier_invalidate_range in MMU gather wasn't enough.

Thanks!
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers
  2017-08-30 17:49   ` Andrea Arcangeli
@ 2017-08-31  5:29     ` Michal Hocko
  -1 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-08-31  5:29 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linux-mm, LKML

On Wed 30-08-17 19:49:04, Andrea Arcangeli wrote:
> Hello Michal,
> 
> On Wed, Aug 30, 2017 at 10:46:00AM +0200, Michal Hocko wrote:
> > +	 * TODO: we really want to get rid of this ugly hack and make sure that
> > +	 * notifiers cannot block for unbounded amount of time and add
> > +	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range
> 
> KVM already should be ok in that respect. However the major reason to
> prefer mmu_notifier_invalidate_range_start/end is those can block and
> schedule waiting for stuff happening behind the PCI bus easily. So I'm
> not sure if the TODO is good idea to keep.

Long term, I was thinking about a flag to reflect that all registered
notifiers are oom safe (aka they do not depend on memory allocations
or any locks which depend on an allocation) and then we can call into
notifiers. So the check would end up
	if (!mm_has_safe_notifiers(mm))
		...
 
> > +	 */
> > +	if (mm_has_notifiers(mm)) {
> > +		schedule_timeout_idle(HZ);
> 
> Why the schedule_timeout? What's the difference with the OOM
> reaper going to sleep again in the main loop instead?

Well, this is what I had initially - basically to return false here
and rely on oom_reap_task to retry. But my current understanding is that
mm_has_notifiers is likely to be a semi-permanent state (once set it
won't likely go away) so I figured it would be better to simply wait
here and fail right away. If my assumption is not correct then I will
simply return false here.

> 
> > +		goto unlock_oom;
> > +	}
> 
> mm_has_notifiers stops changing after obtaining the mmap_sem for
> reading. See the do_mmu_notifier_register. So it's better to put the
> mm_has_notifiers check immediately after the below:
> 
> >  	if (!down_read_trylock(&mm->mmap_sem)) {
> >  		ret = false;
> >  		trace_skip_task_reaping(tsk->pid);
> 
> If we succeed taking the mmap_sem for reading then we read a stable
> value out of mm_has_notifiers and be sure it won't be set from under
> us.

OK, I will move it.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers
@ 2017-08-31  5:29     ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-08-31  5:29 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linux-mm, LKML

On Wed 30-08-17 19:49:04, Andrea Arcangeli wrote:
> Hello Michal,
> 
> On Wed, Aug 30, 2017 at 10:46:00AM +0200, Michal Hocko wrote:
> > +	 * TODO: we really want to get rid of this ugly hack and make sure that
> > +	 * notifiers cannot block for unbounded amount of time and add
> > +	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range
> 
> KVM already should be ok in that respect. However the major reason to
> prefer mmu_notifier_invalidate_range_start/end is those can block and
> schedule waiting for stuff happening behind the PCI bus easily. So I'm
> not sure if the TODO is good idea to keep.

Long term, I was thinking about a flag to reflect that all registered
notifiers are oom safe (aka they do not depend on memory allocations
or any locks which depend on an allocation) and then we can call into
notifiers. So the check would end up
	if (!mm_has_safe_notifiers(mm))
		...
 
> > +	 */
> > +	if (mm_has_notifiers(mm)) {
> > +		schedule_timeout_idle(HZ);
> 
> Why the schedule_timeout? What's the difference with the OOM
> reaper going to sleep again in the main loop instead?

Well, this is what I had initially - basically to return false here
and rely on oom_reap_task to retry. But my current understanding is that
mm_has_notifiers is likely to be a semi-permanent state (once set it
won't likely go away) so I figured it would be better to simply wait
here and fail right away. If my assumption is not correct then I will
simply return false here.

> 
> > +		goto unlock_oom;
> > +	}
> 
> mm_has_notifiers stops changing after obtaining the mmap_sem for
> reading. See the do_mmu_notifier_register. So it's better to put the
> mm_has_notifiers check immediately after the below:
> 
> >  	if (!down_read_trylock(&mm->mmap_sem)) {
> >  		ret = false;
> >  		trace_skip_task_reaping(tsk->pid);
> 
> If we succeed taking the mmap_sem for reading then we read a stable
> value out of mm_has_notifiers and be sure it won't be set from under
> us.

OK, I will move it.

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-08-31  5:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-30  8:46 [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers Michal Hocko
2017-08-30  8:46 ` Michal Hocko
2017-08-30  9:09 ` Michal Hocko
2017-08-30  9:09   ` Michal Hocko
2017-08-30 17:49 ` Andrea Arcangeli
2017-08-30 17:49   ` Andrea Arcangeli
2017-08-31  5:29   ` Michal Hocko
2017-08-31  5:29     ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.