All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: kernel-team@fb.com, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	riel@redhat.com, jack@suse.cz
Subject: Re: [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations
Date: Tue, 4 Dec 2018 14:50:17 -0800	[thread overview]
Message-ID: <20181204145017.62d952c2a209975aa5888acf@linux-foundation.org> (raw)
In-Reply-To: <20181130195812.19536-4-josef@toxicpanda.com>

On Fri, 30 Nov 2018 14:58:11 -0500 Josef Bacik <josef@toxicpanda.com> wrote:

> Currently we only drop the mmap_sem if there is contention on the page
> lock.  The idea is that we issue readahead and then go to lock the page
> while it is under IO and we want to not hold the mmap_sem during the IO.
> 
> The problem with this is the assumption that the readahead does
> anything.  In the case that the box is under extreme memory or IO
> pressure we may end up not reading anything at all for readahead, which
> means we will end up reading in the page under the mmap_sem.

Please describe here why this is considered to be a problem. 
Application stalling, I assume?  Some description of in-the-field
observations would be appropriate.  ie, how serious is the problem
whcih is being addressed.

> Instead rework filemap fault path to drop the mmap sem at any point that
> we may do IO or block for an extended period of time.  This includes
> while issuing readahead, locking the page, or needing to call ->readpage
> because readahead did not occur.  Then once we have a fully uptodate
> page we can return with VM_FAULT_RETRY and come back again to find our
> nicely in-cache page that was gotten outside of the mmap_sem.
> 
> ...
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2304,28 +2304,44 @@ EXPORT_SYMBOL(generic_file_read_iter);
>  
>  #ifdef CONFIG_MMU
>  #define MMAP_LOTSAMISS  (100)
> +static struct file *maybe_unlock_mmap_for_io(struct file *fpin,
> +					     struct vm_area_struct *vma,
> +					     int flags)
> +{
> +	if (fpin)
> +		return fpin;
> +	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
> +	    FAULT_FLAG_ALLOW_RETRY) {
> +		fpin = get_file(vma->vm_file);
> +		up_read(&vma->vm_mm->mmap_sem);
> +	}
> +	return fpin;
> +}

A code comment would be nice.  What it does and, especially, why it does it.

> -	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
> -		put_page(page);
> -		return ret | VM_FAULT_RETRY;
> +	/*
> +	 * We are open-coding lock_page_or_retry here because we want to do the
> +	 * readpage if necessary while the mmap_sem is dropped.  If there
> +	 * happens to be a lock on the page but it wasn't being faulted in we'd
> +	 * come back around without ALLOW_RETRY set and then have to do the IO
> +	 * under the mmap_sem, which would be a bummer.

Expanding on "a bummer" would help here ;)

> +	 */
> +	if (!trylock_page(page)) {
> +		fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags);
> +		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> +			goto out_retry;
> +		if (vmf->flags & FAULT_FLAG_KILLABLE) {
> +			if (__lock_page_killable(page)) {
> +				/*
> +				 * If we don't have the right flags for
> +				 * maybe_unlock_mmap_for_io to do it's thing we

"its"

> +				 * still need to drop the sem and return
> +				 * VM_FAULT_RETRY so the upper layer checks the
> +				 * signal and takes the appropriate action.
> +				 */
> +				if (!fpin)
> +					up_read(&vmf->vma->vm_mm->mmap_sem);
> +				goto out_retry;
> +			}
> +		} else
> +			__lock_page(page);
>  	}
>  
>  	/* Did it get truncated? */


  reply	other threads:[~2018-12-04 22:50 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-30 19:58 [PATCH 0/4][V4] drop the mmap_sem when doing IO in the fault path Josef Bacik
2018-11-30 19:58 ` [PATCH 1/4] mm: infrastructure for page fault page caching Josef Bacik
2018-12-04 22:49   ` Andrew Morton
2018-11-30 19:58 ` [PATCH 2/4] filemap: kill page_cache_read usage in filemap_fault Josef Bacik
2018-12-05 21:52   ` Johannes Weiner
2018-12-07  9:57   ` Jan Kara
2018-12-07 10:37     ` Jan Kara
2018-11-30 19:58 ` [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations Josef Bacik
2018-12-04 22:50   ` Andrew Morton [this message]
2018-12-05 22:23   ` Johannes Weiner
2018-12-07 11:01   ` Jan Kara
2018-12-10 18:44     ` Josef Bacik
2018-12-11  9:40       ` Jan Kara
2018-12-11 16:08         ` Josef Bacik
2018-12-11 16:38           ` Jan Kara
2018-11-30 19:58 ` [PATCH 4/4] mm: use the cached page for filemap_fault Josef Bacik
2018-12-04 22:50   ` Andrew Morton
2018-12-05 14:58     ` Josef Bacik
2018-12-07 11:03       ` Jan Kara
2018-12-04 22:49 ` [PATCH 0/4][V4] drop the mmap_sem when doing IO in the fault path Andrew Morton
2018-12-06 22:24   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181204145017.62d952c2a209975aa5888acf@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.