* [PATCH] mm, oom: Always call tlb_finish_mmu().
@ 2018-08-23 11:30 Tetsuo Handa
  2018-08-23 11:59 ` Michal Hocko
From: Tetsuo Handa @ 2018-08-23 11:30 UTC
  To: Michal Hocko; +Cc: David Rientjes, Andrew Morton, linux-mm, Tetsuo Handa

Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
notifiers") added "continue;" without calling tlb_finish_mmu(). I don't
know whether the resulting tlb_flush_pending imbalance causes problems
beyond the extra cost, but at the very least it looks strange.
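
For reference, the pairing in question looks roughly like this (a
simplified sketch of include/linux/mm_types.h and mm/memory.c from that
era, not verbatim kernel code):

/* Simplified sketch of the inc/dec pairing (not verbatim kernel code). */
static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
{
	/* true whenever a gather is in flight anywhere in this mm */
	return atomic_read(&mm->tlb_flush_pending) > 0;
}

void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
		    unsigned long start, unsigned long end)
{
	arch_tlb_gather_mmu(tlb, mm, start, end);
	inc_tlb_flush_pending(tlb->mm);		/* counter goes up here ... */
}

void tlb_finish_mmu(struct mmu_gather *tlb,
		    unsigned long start, unsigned long end)
{
	arch_tlb_finish_mmu(tlb, start, end, mm_tlb_flush_nested(tlb->mm));
	dec_tlb_flush_pending(tlb->mm);		/* ... and down only here */
}

Skipping tlb_finish_mmu() therefore leaves mm->tlb_flush_pending
permanently elevated for that mm.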

A more worrisome part of that patch is that I don't know whether using
trylock when blockable == false at entry is really sufficient. For
example, __gnttab_unmap_refs_async() (reached from mn_invl_range_start()
via unmap_if_in_range(), unmap_grant_pages(), __unmap_grant_pages(),
gnttab_unmap_refs_sync() and gnttab_unmap_refs_async()) involves
schedule_delayed_work(), which could be blocked on memory allocation
under an OOM situation. Might the wait_for_completion() in
gnttab_unmap_refs_sync() then deadlock? I don't know...
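
In schematic form, the pattern being questioned is the following (a
hypothetical sketch using generic workqueue/completion APIs, not the
actual Xen grant-table code):

#include <linux/workqueue.h>
#include <linux/completion.h>

/* Schematic of the worry above (hypothetical; not the actual Xen code):
 * a notifier path that queues deferred work and then sleeps until the
 * work has run.  If the work item cannot make progress under OOM, the
 * wait blocks the (supposedly non-blockable) notifier invocation. */
struct unmap_req {
	struct delayed_work work;
	struct completion done;
};

static void unmap_worker(struct work_struct *work)
{
	struct unmap_req *req =
		container_of(to_delayed_work(work), struct unmap_req, work);

	/* ... perform the actual unmap; may itself need memory ... */
	complete(&req->done);
}

static void unmap_sync(struct unmap_req *req)
{
	INIT_DELAYED_WORK(&req->work, unmap_worker);
	init_completion(&req->done);
	schedule_delayed_work(&req->work, 0);
	wait_for_completion(&req->done);	/* the potential deadlock point */
}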

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/oom_kill.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b5b25e4..4f431c1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -522,6 +522,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 
 			tlb_gather_mmu(&tlb, mm, start, end);
 			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
+				tlb_finish_mmu(&tlb, start, end);
 				ret = false;
 				continue;
 			}
-- 
1.8.3.1
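
For context, with this patch applied the reaping loop looks roughly as
follows (an abridged sketch of mm/oom_kill.c from that era; locking and
error handling omitted):

bool __oom_reap_task_mm(struct mm_struct *mm)
{
	struct vm_area_struct *vma;
	bool ret = true;

	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		if (!can_madv_dontneed_vma(vma))
			continue;

		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
			struct mmu_gather tlb;
			unsigned long start = vma->vm_start;
			unsigned long end = vma->vm_end;

			tlb_gather_mmu(&tlb, mm, start, end);
			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
				/* the fix: undo tlb_gather_mmu() before bailing out */
				tlb_finish_mmu(&tlb, start, end);
				ret = false;
				continue;
			}
			unmap_page_range(&tlb, vma, start, end, NULL);
			mmu_notifier_invalidate_range_end(mm, start, end);
			tlb_finish_mmu(&tlb, start, end);
		}
	}
	return ret;
}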


* Re: [PATCH] mm, oom: Always call tlb_finish_mmu().
  2018-08-23 11:30 [PATCH] mm, oom: Always call tlb_finish_mmu() Tetsuo Handa
@ 2018-08-23 11:59 ` Michal Hocko
  2018-08-23 13:48   ` Tetsuo Handa
From: Michal Hocko @ 2018-08-23 11:59 UTC
  To: Tetsuo Handa; +Cc: David Rientjes, Andrew Morton, linux-mm

On Thu 23-08-18 20:30:48, Tetsuo Handa wrote:
> Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
> notifiers") added "continue;" without calling tlb_finish_mmu(). I don't
> know whether the resulting tlb_flush_pending imbalance causes problems
> beyond the extra cost, but at the very least it looks strange.

tlb_flush_pending has mm scope, so the imbalance would confuse
mm_tlb_flush_pending. At least ptep_clear_flush could get confused and
flush unnecessarily for prot_none entries AFAICS. Other paths shouldn't
trigger for oom victims, and even ptep_clear_flush is unlikely to
happen. So nothing really earth-shattering, but I do agree that it looks
weird and should be fixed.
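
Concretely, the consumer in question looks roughly like this (simplified
from mm/pgtable-generic.c and the x86 pte_accessible() of that era):

/* Simplified: a leaked tlb_flush_pending makes pte_accessible() return
 * true for prot_none PTEs, so ptep_clear_flush() flushes needlessly. */
static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
{
	if (pte_flags(a) & _PAGE_PRESENT)
		return true;

	if ((pte_flags(a) & _PAGE_PROTNONE) && mm_tlb_flush_pending(mm))
		return true;

	return false;
}

pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
		       pte_t *ptep)
{
	struct mm_struct *mm = (vma)->vm_mm;
	pte_t pte;

	pte = ptep_get_and_clear(mm, address, ptep);
	if (pte_accessible(mm, pte))
		flush_tlb_page(vma, address);	/* the unnecessary flush */
	return pte;
}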

> A more worrisome part of that patch is that I don't know whether using
> trylock when blockable == false at entry is really sufficient. For
> example, __gnttab_unmap_refs_async() (reached from mn_invl_range_start()
> via unmap_if_in_range(), unmap_grant_pages(), __unmap_grant_pages(),
> gnttab_unmap_refs_sync() and gnttab_unmap_refs_async()) involves
> schedule_delayed_work(), which could be blocked on memory allocation
> under an OOM situation. Might the wait_for_completion() in
> gnttab_unmap_refs_sync() then deadlock? I don't know...

Not really sure why this is in the changelog, as it is unrelated to the
fix. Anyway, let me try to check...

OK, so I added an in_range(map, start, end) check to avoid going down
that path. But for some reason that check doesn't consider the blockable
value, so it definitely looks wrong. I must have screwed up when
rebasing or something. Thanks for catching that. I will send a fix.
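
A plausible shape for such a fix (a hypothetical sketch against
drivers/xen/gntdev.c, not necessarily the patch that was actually sent)
would be to thread the blockable flag into the range check:

/* Hypothetical sketch: make the range check honor "blockable" so that a
 * non-blockable invalidation backs off instead of sleeping. */
static int unmap_if_in_range(struct grant_map *map,
			     unsigned long start, unsigned long end,
			     bool blockable)
{
	unsigned long mstart, mend;

	if (!in_range(map, start, end))
		return 0;

	if (!blockable)
		return -EAGAIN;	/* must not wait_for_completion() here */

	mstart = max(start, map->vma->vm_start);
	mend = min(end, map->vma->vm_end);
	unmap_grant_pages(map, (mstart - map->vma->vm_start) >> PAGE_SHIFT,
			  (mend - mstart) >> PAGE_SHIFT);
	return 0;
}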

> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  mm/oom_kill.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index b5b25e4..4f431c1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -522,6 +522,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
>  
>  			tlb_gather_mmu(&tlb, mm, start, end);
>  			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
> +				tlb_finish_mmu(&tlb, start, end);
>  				ret = false;
>  				continue;
>  			}
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm, oom: Always call tlb_finish_mmu().
  2018-08-23 11:59 ` Michal Hocko
@ 2018-08-23 13:48   ` Tetsuo Handa
  2018-08-23 14:02     ` Michal Hocko
From: Tetsuo Handa @ 2018-08-23 13:48 UTC
  To: Michal Hocko; +Cc: David Rientjes, Andrew Morton, linux-mm

On 2018/08/23 20:59, Michal Hocko wrote:
> On Thu 23-08-18 20:30:48, Tetsuo Handa wrote:
>> Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
>> notifiers") added "continue;" without calling tlb_finish_mmu(). I don't
>> know whether the resulting tlb_flush_pending imbalance causes problems
>> beyond the extra cost, but at the very least it looks strange.
> 
> tlb_flush_pending has mm scope, so the imbalance would confuse
> mm_tlb_flush_pending. At least ptep_clear_flush could get confused and
> flush unnecessarily for prot_none entries AFAICS. Other paths shouldn't
> trigger for oom victims, and even ptep_clear_flush is unlikely to
> happen. So nothing really earth-shattering, but I do agree that it looks
> weird and should be fixed.

OK. But why do we call tlb_gather_mmu() before
mmu_notifier_invalidate_range_start_nonblock()?
I want the fix to explain why we can't do

-			tlb_gather_mmu(&tlb, mm, start, end);
 			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
 				ret = false;
 				continue;
 			}
+			tlb_gather_mmu(&tlb, mm, start, end);

instead.


* Re: [PATCH] mm, oom: Always call tlb_finish_mmu().
  2018-08-23 13:48   ` Tetsuo Handa
@ 2018-08-23 14:02     ` Michal Hocko
  2018-08-23 14:11       ` [PATCH v2] mm, oom: Fix missing tlb_finish_mmu() in __oom_reap_task_mm() Tetsuo Handa
From: Michal Hocko @ 2018-08-23 14:02 UTC
  To: Tetsuo Handa; +Cc: David Rientjes, Andrew Morton, linux-mm

On Thu 23-08-18 22:48:22, Tetsuo Handa wrote:
> On 2018/08/23 20:59, Michal Hocko wrote:
> > On Thu 23-08-18 20:30:48, Tetsuo Handa wrote:
> >> Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
> >> notifiers") added "continue;" without calling tlb_finish_mmu(). I don't
> >> know whether the resulting tlb_flush_pending imbalance causes problems
> >> beyond the extra cost, but at the very least it looks strange.
> > 
> > tlb_flush_pending has mm scope, so the imbalance would confuse
> > mm_tlb_flush_pending. At least ptep_clear_flush could get confused and
> > flush unnecessarily for prot_none entries AFAICS. Other paths shouldn't
> > trigger for oom victims, and even ptep_clear_flush is unlikely to
> > happen. So nothing really earth-shattering, but I do agree that it looks
> > weird and should be fixed.
> 
> OK. But why do we call tlb_gather_mmu() before
> mmu_notifier_invalidate_range_start_nonblock()?
> I want the fix to explain why we can't do
> 
> -			tlb_gather_mmu(&tlb, mm, start, end);
>  			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
>  				ret = false;
>  				continue;
>  			}
> +			tlb_gather_mmu(&tlb, mm, start, end);

This should indeed be doable, because mmu notifiers have no way to know
about tlb_gather. I have no idea why we used to have tlb_gather_mmu like
that before; most probably a copy&paste from the munmap path, where it
didn't make any difference either. A quick check shows that
tlb_flush_pending is the only mm-scope thing involved, and none of the
notifiers really depends on it.

I would be calmer if both paths were in sync in that regard, so I think
it would be better to go with your previous version first. Maybe it
makes sense to switch the order, but I do not really see a huge win in
doing so.
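
For comparison, the munmap-side ordering that the reaper would stay in
sync with looks roughly like this (abridged from mm/mmap.c of that era;
unmap_vmas() issues the mmu notifier start/end internally):

/* Abridged: tlb_gather_mmu() precedes the notifier start here as well. */
static void unmap_region(struct mm_struct *mm, struct vm_area_struct *vma,
			 struct vm_area_struct *prev,
			 unsigned long start, unsigned long end)
{
	struct vm_area_struct *next = prev ? prev->vm_next : mm->mmap;
	struct mmu_gather tlb;

	lru_add_drain();
	tlb_gather_mmu(&tlb, mm, start, end);
	update_hiwater_rss(mm);
	unmap_vmas(&tlb, vma, start, end);	/* notifier start/end inside */
	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
		      next ? next->vm_start : USER_PGTABLES_CEILING);
	tlb_finish_mmu(&tlb, start, end);
}
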
-- 
Michal Hocko
SUSE Labs


* [PATCH v2] mm, oom: Fix missing tlb_finish_mmu() in __oom_reap_task_mm().
  2018-08-23 14:02     ` Michal Hocko
@ 2018-08-23 14:11       ` Tetsuo Handa
  2018-08-23 19:23         ` Michal Hocko
From: Tetsuo Handa @ 2018-08-23 14:11 UTC
  To: Michal Hocko; +Cc: David Rientjes, Andrew Morton, linux-mm

Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
notifiers") added "continue;" without calling tlb_finish_mmu(). This
should not cause a critical problem, but fix it anyway because it looks
strange.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/oom_kill.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b5b25e4..4f431c1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -522,6 +522,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 
 			tlb_gather_mmu(&tlb, mm, start, end);
 			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
+				tlb_finish_mmu(&tlb, start, end);
 				ret = false;
 				continue;
 			}
-- 
1.8.3.1


* Re: [PATCH v2] mm, oom: Fix missing tlb_finish_mmu() in __oom_reap_task_mm().
  2018-08-23 14:11       ` [PATCH v2] mm, oom: Fix missing tlb_finish_mmu() in __oom_reap_task_mm() Tetsuo Handa
@ 2018-08-23 19:23         ` Michal Hocko
From: Michal Hocko @ 2018-08-23 19:23 UTC
  To: Tetsuo Handa; +Cc: David Rientjes, Andrew Morton, linux-mm

On Thu 23-08-18 23:11:26, Tetsuo Handa wrote:
> Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
> notifiers") added "continue;" without calling tlb_finish_mmu(). This
> should not cause a critical problem, but fix it anyway because it looks
> strange.

I would suggest the following wording instead:

93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
notifiers") has added the ability to skip over vmas with blockable mmu
notifiers. However, it didn't call tlb_finish_mmu as it should have. As
a result, inc_tlb_flush_pending has been called without its pairing
dec_tlb_flush_pending, and all callers of mm_tlb_flush_pending would
flush even though this is not really needed. This alone is not harmful,
and it seems there shouldn't be any such callers for oom victims at all,
but there is no real reason to skip tlb_finish_mmu on the early-skip
path either, so call it.

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

In any case
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/oom_kill.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index b5b25e4..4f431c1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -522,6 +522,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
>  
>  			tlb_gather_mmu(&tlb, mm, start, end);
>  			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
> +				tlb_finish_mmu(&tlb, start, end);
>  				ret = false;
>  				continue;
>  			}
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs
