linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>,
	adobriyan@gmail.com, willy@infradead.org, mguzik@redhat.com,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v3 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
Date: Thu, 12 Apr 2018 14:18:01 +0200
Message-ID: <20180412121801.GE23400@dhcp22.suse.cz>
In-Reply-To: <8c19f1fb-7baf-fef3-032d-4e93cfc63932@linux.alibaba.com>

On Tue 10-04-18 11:28:13, Yang Shi wrote:
> 
> 
> On 4/10/18 9:21 AM, Yang Shi wrote:
> > 
> > 
> > On 4/10/18 5:28 AM, Cyrill Gorcunov wrote:
> > > On Tue, Apr 10, 2018 at 01:10:01PM +0200, Michal Hocko wrote:
> > > > > Because do_brk does vma manipulations, for this reason it's
> > > > > running under down_write_killable(&mm->mmap_sem). Or you
> > > > > mean something else?
> > > > > Yes, all we need the new lock for is to get a consistent view on brk
> > > > > values. I am simply asking whether there is something fundamentally
> > > > > wrong with doing the update inside the new lock while keeping the
> > > > > original mmap_sem locking in the brk path. That would allow us to
> > > > > drop the mmap_sem lock in the proc path when looking at brk values.
> > > > Michal, gimme some time. I guess we might do so, but I need some
> > > > spare time to take a more precise look at the code, hopefully this
> > > > evening. Also, I have a suspicion that we've wrecked check_data_rlimit
> > > > with this new lock in prctl. I need to verify it again.
> > 
> > > I see your points. We might be able to move the drop of mmap_sem
> > > before setting mm->brk in sys_brk, since mmap_sem should be used to
> > > protect vma manipulation only, and then protect the value modification
> > > with the new arg_lock. Then we can eliminate the mmap_sem stuff in the
> > > prctl path, and it also avoids wrecking check_data_rlimit.
> > > 
> > > At first glance, it looks feasible to me. I will look into it more
> > > deeply later.
> 
> A further look told me this might *not* be feasible.
> 
> It looks like the new lock will not break check_data_rlimit, since in my
> patch both start_brk and brk are protected by mmap_sem. The code flow might
> look like below:
> 
> CPU A                         CPU B
> --------                      --------
> prctl                         sys_brk
>                               down_write
> check_data_rlimit             check_data_rlimit (need mm->start_brk)
>                               set brk
> down_write                    up_write
> set start_brk
> set brk
> up_write
> 
> 
> If CPU A gets the mmap_sem first, it will set start_brk and brk, then CPU B
> will check with the new start_brk. And prctl doesn't care if sys_brk runs
> before it, since it gets the new start_brk and brk from its parameters.
> 
> If we protect start_brk and brk with the new lock, sys_brk might get the old
> start_brk, and then sys_brk might break the rlimit check silently, is that
> right?
> 
> So, it looks like using the new lock in prctl and keeping mmap_sem in the
> brk path has a race condition.

OK, I admittedly didn't give it too much thought. Maybe we can do
something clever to remove the race, but can we at least start by
reducing the write lock to a read lock on the prctl side and using the
dedicated spinlock for updating the values? That should close the above
race AFAICS, and the read lock would be much more friendly to other VM
operations.
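
For illustration only, the locking split suggested above could be sketched
roughly as follows. This is a userspace analogy, not kernel code and not the
patch under discussion: a pthread rwlock stands in for mmap_sem, a pthread
spinlock stands in for the dedicated lock, and struct mm_sim, its
data_rlimit field and all function names are hypothetical. The rlimit check
is reduced to a simple cap on brk - start_brk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the relevant mm_struct fields. */
struct mm_sim {
	pthread_rwlock_t mmap_sem;   /* plays the role of mm->mmap_sem */
	pthread_spinlock_t arg_lock; /* plays the role of the dedicated spinlock */
	unsigned long start_brk;
	unsigned long brk;
	unsigned long data_rlimit;   /* simplified cap on brk - start_brk */
};

static void mm_sim_init(struct mm_sim *mm, unsigned long start,
			unsigned long limit)
{
	int rc = pthread_rwlock_init(&mm->mmap_sem, NULL);
	assert(rc == 0);
	rc = pthread_spin_init(&mm->arg_lock, PTHREAD_PROCESS_PRIVATE);
	assert(rc == 0);
	(void)rc;
	mm->start_brk = mm->brk = start;
	mm->data_rlimit = limit;
}

/*
 * prctl-like path: the new values come from the caller, so the limit is
 * checked against the parameters. The read lock keeps the brk writer out,
 * and the spinlock serialises concurrent updaters that share the read lock.
 */
static int sim_prctl_set_brk(struct mm_sim *mm, unsigned long start_brk,
			     unsigned long brk)
{
	if (brk < start_brk || brk - start_brk > mm->data_rlimit)
		return -1;

	pthread_rwlock_rdlock(&mm->mmap_sem);
	pthread_spin_lock(&mm->arg_lock);
	mm->start_brk = start_brk;
	mm->brk = brk;
	pthread_spin_unlock(&mm->arg_lock);
	pthread_rwlock_unlock(&mm->mmap_sem);
	return 0;
}

/*
 * brk-like path: keeps the write lock, so start_brk cannot change between
 * the limit check and the update of brk.
 */
static int sim_brk(struct mm_sim *mm, unsigned long new_brk)
{
	int ret = -1;

	pthread_rwlock_wrlock(&mm->mmap_sem);
	if (new_brk >= mm->start_brk &&
	    new_brk - mm->start_brk <= mm->data_rlimit) {
		mm->brk = new_brk;
		ret = 0;
	}
	pthread_rwlock_unlock(&mm->mmap_sem);
	return ret;
}

/* Reader: takes both locks in the same order to get a consistent pair. */
static void sim_snapshot(struct mm_sim *mm, unsigned long *start_brk,
			 unsigned long *brk)
{
	pthread_rwlock_rdlock(&mm->mmap_sem);
	pthread_spin_lock(&mm->arg_lock);
	*start_brk = mm->start_brk;
	*brk = mm->brk;
	pthread_spin_unlock(&mm->arg_lock);
	pthread_rwlock_unlock(&mm->mmap_sem);
}
```

In this sketch the brk-like path never runs while a prctl-like update is in
progress, because the writer excludes all read-lock holders, while several
read-lock holders can still run concurrently and only contend on the
spinlock for the short value update.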

-- 
Michal Hocko
SUSE Labs

