Linux-mm Archive on lore.kernel.org
From: Hugh Dickins <hughd@google.com>
To: Pengfei Li <fly@kernel.page>
Cc: akpm@linux-foundation.org, bmt@zurich.ibm.com,
	dledford@redhat.com, willy@infradead.org, vbabka@suse.cz,
	kirill.shutemov@linux.intel.com, jgg@ziepe.ca,
	alex.williamson@redhat.com, cohuck@redhat.com,
	daniel.m.jordan@oracle.com, dbueso@suse.de, jglisse@redhat.com,
	jhubbard@nvidia.com, ldufour@linux.ibm.com,
	Liam.Howlett@oracle.com, peterz@infradead.org, cl@linux.com,
	jack@suse.cz, rientjes@google.com, walken@google.com,
	hughd@google.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/2] mm, util: account_locked_vm() does not hold mmap_lock
Date: Wed, 29 Jul 2020 12:21:11 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.2007291121280.4649@eggly.anvils>
In-Reply-To: <20200726080224.205470-2-fly@kernel.page>

On Sun, 26 Jul 2020, Pengfei Li wrote:

> Since mm->locked_vm is already an atomic counter, account_locked_vm()
> does not need to hold mmap_lock.

I am worried that this patch, already added to mmotm, along with its
1/2 making locked_vm an atomic64, might be rushed into v5.9 with just
that two-line commit description, and no discussion at all.

locked_vm belongs fundamentally to mm/mlock.c, and the lock to guard
it is mmap_lock; and mlock() has some complicated stuff to do under
that lock while it decides how to adjust locked_vm.

It is very easy to convert an unsigned long to an atomic64_t, but
"atomic read, check limit and do stuff, atomic add" does not give
the same guarantee as holding the right lock around it all.
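
To make the gap concrete, here is a minimal userspace sketch, not kernel
code and not taken from the patch. It uses C11 atomics in place of
atomic64_t, and struct acct, racy_check(), racy_commit() and replay_race()
are hypothetical names invented for illustration. It replays one possible
interleaving of two callers that each read, check the limit, and only then
add, with no lock held across the steps:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mm->locked_vm counter. */
struct acct {
	atomic_ulong locked_vm;
};

/* Step 1: atomic read followed by the limit check. */
static bool racy_check(struct acct *a, unsigned long pages,
		       unsigned long limit)
{
	unsigned long cur = atomic_load(&a->locked_vm);

	return cur + pages <= limit;
}

/* Step 2: atomic add, with nothing held since the check in step 1. */
static void racy_commit(struct acct *a, unsigned long pages)
{
	atomic_fetch_add(&a->locked_vm, pages);
}

/*
 * Replay one bad interleaving of callers A and B in a single thread:
 * both run the check before either runs the add.
 * Returns the final counter value.
 */
static unsigned long replay_race(unsigned long limit, unsigned long pages)
{
	struct acct a;
	bool a_ok, b_ok;

	assert(pages <= limit);	/* each request fits the limit on its own */
	atomic_init(&a.locked_vm, 0);

	a_ok = racy_check(&a, pages, limit);	/* A sees 0, passes */
	b_ok = racy_check(&a, pages, limit);	/* B also sees 0, passes */
	if (a_ok)
		racy_commit(&a, pages);
	if (b_ok)
		racy_commit(&a, pages);

	return atomic_load(&a.locked_vm);
}
```

With a limit of 10 pages and two requests of 8 pages each, both checks
pass and the counter ends at 16. Every individual operation was atomic,
yet the limit was exceeded, because nothing tied the check to the add.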

(At the very least, __account_locked_vm() in 1/2 should be changed to
replace its atomic64_add by an atomic64_cmpxchg, to enforce the limit
that it just checked.  But that will be no more than lipstick on a pig,
when the right lock that everyone else agrees upon is not being held.)
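
For reference, the compare-and-exchange approach mentioned in the
parenthesis above might look roughly like the following. This is again a
userspace sketch under assumptions: C11 atomic_compare_exchange_weak()
stands in for atomic64_cmpxchg(), and charge_cmpxchg() is a hypothetical
name, not a function from the patch. It only closes the window between the
limit check and the add; it does nothing for the other work that mlock()
does under mmap_lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Charge @pages against @limit, retrying until the counter is updated
 * from exactly the value that was checked.
 * Returns 0 on success, -ENOMEM if the charge would exceed @limit.
 */
static int charge_cmpxchg(atomic_ulong *locked_vm, unsigned long pages,
			  unsigned long limit)
{
	unsigned long old;

	assert(locked_vm != NULL);
	old = atomic_load(locked_vm);

	do {
		/* Same test as old + pages > limit, without overflow. */
		if (pages > limit || old > limit - pages)
			return -ENOMEM;
		/*
		 * On failure, compare-exchange reloads the current value
		 * into old, so the limit is rechecked before the retry.
		 */
	} while (!atomic_compare_exchange_weak(locked_vm, &old, old + pages));

	return 0;
}
```

Here a second 8-page request against a limit of 10 fails with -ENOMEM once
the first has been charged, whichever order the two callers run in.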

Now, it can be argued that our locked_vm and pinned_vm maintenance
is so random and deficient, and too difficult to keep right across
a sprawl of drivers, that we should just be grateful for those that
do volunteer to subject themselves to RLIMIT_MEMLOCK limitation,
and never mind if it's a little racy.

And it may well be that all those who have made considerable efforts
in the past to improve the situation, have more interesting things to
devote their time to, and would prefer not to get dragged back here.

But let's at least give this a little more visibility, and hope
to hear opinions one way or the other from those who care.

Hugh

> 
> Signed-off-by: Pengfei Li <fly@kernel.page>
> ---
>  drivers/vfio/vfio_iommu_type1.c |  8 ++------
>  mm/util.c                       | 15 +++------------
>  2 files changed, 5 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 78013be07fe7..53818fce78a6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -376,12 +376,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>  	if (!mm)
>  		return -ESRCH; /* process exited */
>  
> -	ret = mmap_write_lock_killable(mm);
> -	if (!ret) {
> -		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
> -					  dma->lock_cap);
> -		mmap_write_unlock(mm);
> -	}
> +	ret = __account_locked_vm(mm, abs(npage), npage > 0,
> +					dma->task, dma->lock_cap);
>  
>  	if (async)
>  		mmput(mm);
> diff --git a/mm/util.c b/mm/util.c
> index 473add0dc275..320fdd537aea 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -424,8 +424,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
>   * @task:        task used to check RLIMIT_MEMLOCK
>   * @bypass_rlim: %true if checking RLIMIT_MEMLOCK should be skipped
>   *
> - * Assumes @task and @mm are valid (i.e. at least one reference on each), and
> - * that mmap_lock is held as writer.
> + * Assumes @task and @mm are valid (i.e. at least one reference on each).
>   *
>   * Return:
>   * * 0       on success
> @@ -437,8 +436,6 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>  	unsigned long locked_vm, limit;
>  	int ret = 0;
>  
> -	mmap_assert_write_locked(mm);
> -
>  	locked_vm = atomic64_read(&mm->locked_vm);
>  	if (inc) {
>  		if (!bypass_rlim) {
> @@ -476,17 +473,11 @@ EXPORT_SYMBOL_GPL(__account_locked_vm);
>   */
>  int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc)
>  {
> -	int ret;
> -
>  	if (pages == 0 || !mm)
>  		return 0;
>  
> -	mmap_write_lock(mm);
> -	ret = __account_locked_vm(mm, pages, inc, current,
> -				  capable(CAP_IPC_LOCK));
> -	mmap_write_unlock(mm);
> -
> -	return ret;
> +	return __account_locked_vm(mm, pages, inc,
> +					current, capable(CAP_IPC_LOCK));
>  }
>  EXPORT_SYMBOL_GPL(account_locked_vm);
>  
> -- 
> 2.26.2



Thread overview: 6+ messages
2020-07-26  8:02 [PATCH 1/2] mm: make mm->locked_vm an atomic64 counter Pengfei Li
2020-07-26  8:02 ` [PATCH 2/2] mm, util: account_locked_vm() does not hold mmap_lock Pengfei Li
2020-07-29 19:21   ` Hugh Dickins [this message]
2020-07-30 20:57     ` Daniel Jordan
2020-08-02 11:23       ` Pengfei Li
2020-08-02 11:07     ` Pengfei Li
