All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
	kirill.shutemov@linux.intel.com, riel@redhat.com
Subject: Re: [PATCH] mm: Introduce i_mmap_lock_write_killable().
Date: Wed, 28 Mar 2018 21:26:43 +0900	[thread overview]
Message-ID: <201803282126.GBC56799.tOFVFSOHJLFOQM@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20180328110513.GH9275@dhcp22.suse.cz>

Michal Hocko wrote:
> > > I am not saying this is wrong, I would have to think about that much
> > > more because mmap_sem tends to be used on many surprising places and the
> > > write lock just hide them all.
> > 
> > Then, an alternative approach which interrupts without downgrading is shown
> > below. But I'm not sure.
> 
> Failing the whole dup_mmap might be quite reasonable, yes. I haven't
> checked your particular patch because this code path needs much more
> time than I can give this, though.

I think that interrupting at

diff --git a/kernel/fork.c b/kernel/fork.c
index 1e8c9a7..851c675 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -514,6 +514,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		if (tmp->vm_ops && tmp->vm_ops->open)
 			tmp->vm_ops->open(tmp);
 
+		if (!retval && fatal_signal_pending(current))
+			retval = -EINTR;
+
 		if (retval)
 			goto out;
 	}
-- 

is safe because there is no difference (except the error code) between above
change and hitting "goto fail_nomem;" path after "mpnt = mpnt->vm_next;".

Therefore, I think that interrupting at

diff --git a/kernel/fork.c b/kernel/fork.c
index 851c675..2706acc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -508,7 +508,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		rb_parent = &tmp->vm_rb;
 
 		mm->map_count++;
-		if (!(tmp->vm_flags & VM_WIPEONFORK))
+		if (fatal_signal_pending(current))
+			retval = -EINTR;
+		else if (!(tmp->vm_flags & VM_WIPEONFORK))
 			retval = copy_page_range(mm, oldmm, mpnt);
 
 		if (tmp->vm_ops && tmp->vm_ops->open)
-- 

is also safe because handling of copy_page_range() failure is already
scheduled by mmput().

Thus, I think that there are locations where it is known to be safely and consistently
interruptible inside "for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next)" loop.

> On Wed 28-03-18 19:23:20, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 27-03-18 20:19:30, Tetsuo Handa wrote:
> > > > If the OOM victim is holding mm->mmap_sem held for write, and if the OOM
> > > > victim can interrupt operations which need mm->mmap_sem held for write,
> > > > we can downgrade mm->mmap_sem upon SIGKILL and the OOM reaper will be
> > > > able to reap the OOM victim's memory.
> > > 
> > > This really begs for much better explanation. Why is it safe?
> > 
> > Basic idea is
> > 
> >   bool downgraded = false;
> >   down_write(mmap_sem);
> >   for (something_1_that_might_depend_mmap_sem_held_for_write;
> >        something_2_that_might_depend_mmap_sem_held_for_write;
> >        something_3_that_might_depend_mmap_sem_held_for_write) {
> >      something_4_that_might_depend_mmap_sem_held_for_write();
> >      if (fatal_signal_pending(current)) {
> >         downgrade_write(mmap_sem);
> >         downgraded = true;
> >         break;
> >      }
> >      something_5_that_might_depend_mmap_sem_held_for_write();
> >   }
> >   if (!downgraded)
> >     up_write(mmap_sem);
> >   else
> >     up_read(mmap_sem);
> > 
> > . That is, try to interrupt critical sections at locations where it is
> > known to be safe and consistent.
> 
> Please explain why those places are safe to interrupt.

Because (regarding the downgrade_write() approach), as far as I know,
the current thread does not access memory which needs to be protected with
mmap_sem held for write.

> 
> > >                                                               Are you
> > > assuming that the killed task will not perform any changes on the
> > > address space?
> > 
> > If somebody drops mmap_sem held for write is not safe, how can the OOM
> > reaper work safely?
> > 
> > The OOM reaper is assuming that the thread who got mmap_sem held for write
> > is responsible to complete critical sections before dropping mmap_sem held
> > for write, isn't it?
> > 
> > Then, how an attempt to perform changes on the address space can become a
> > problem given that the thread who got mmap_sem held for write is responsible
> > to complete critical sections before dropping mmap_sem held for write?
> 
> ENOPARSE. How does this have anything to do with oom_reaper.

The oom_reaper can work safely as long as mmap_sem held for write is released
at safely and consistently interruptible locations.

>                                                              Sure you
> want to _help_ the oom_reaper to do its job but you are dropping the
> lock in the downgrading the lock in the middle of dup_mmap and that is
> what we are dicussing here.

Yes.

>                             So please explain why it is safe. It is
> really not straightforward.

So, please explain why it is not safe. How can releasing mmap_sem held for
write at safely and consistently interruptible locations be not safe?

> 
> > >                What about ongoing page faults or other operations deeper
> > > in the call chain.
> > 
> > Even if there are ongoing page faults or other operations deeper in the call
> > chain, there should be no problem as long as the thread who got mmap_sem
> > held for write is responsible to complete critical sections before dropping
> > mmap_sem held for write.
> > 
> > >                    Why they are safe to change things for the child
> > > during the copy?
> > 
> > In this patch, the current thread who got mmap_sem held for write (which is
> > likely an OOM victim thread) downgrades mmap_sem, with an assumption that
> > current thread no longer accesses memory which might depend on mmap_sem held
> > for write.
> > 
> > dup_mmap() duplicates current->mm and to-be-duplicated mm is not visible yet.
> > If dup_mmap() failed, to-be-duplicated incomplete mm is discarded via mmput()
> > in dup_mm() rather than assigned to the child. Thus, this patch should not
> > change things which are visible to the child during the copy.
> > 
> > What we need to be careful is making changes to current->mm.
> > I'm assuming that current->mm->mmap_sem held for read is enough for
> > i_mmap_lock_write()/flush_dcache_mmap_lock()/vma_interval_tree_insert_after()/
> > flush_dcache_mmap_unlock()/i_mmap_unlock_write()/is_vm_hugetlb_page()/
> > reset_vma_resv_huge_pages()/__vma_link_rb(). But I'm not sure.
> 
> But as soon as you downgrade the lock then all other threads can
> interfere and perform page faults or update respecive mappings. Does
> this matter? If not then why?
> 

Why does this matter?

I don't know what "update respecive mappings" means.
Is that about mmap()/munmap() which need mmap_sem held for write?
Since mmap_sem is still held for read, operations which needs
mmap_sem held for write cannot happen.

Anyway, as long as I downgrade the mmap_sem at safely and consistently
interruptible locations, there cannot be a problem.

  reply	other threads:[~2018-03-28 12:26 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-27 11:19 [PATCH] mm: Introduce i_mmap_lock_write_killable() Tetsuo Handa
2018-03-27 14:52 ` Michal Hocko
2018-03-28 10:23   ` Tetsuo Handa
2018-03-28 11:05     ` Michal Hocko
2018-03-28 12:26       ` Tetsuo Handa [this message]
2018-03-28 12:39         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201803282126.GBC56799.tOFVFSOHJLFOQM@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=riel@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.