linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: kernel-team@fb.com, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	riel@redhat.com, jack@suse.cz
Subject: Re: [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations
Date: Tue, 4 Dec 2018 14:50:17 -0800	[thread overview]
Message-ID: <20181204145017.62d952c2a209975aa5888acf@linux-foundation.org> (raw)
In-Reply-To: <20181130195812.19536-4-josef@toxicpanda.com>

On Fri, 30 Nov 2018 14:58:11 -0500 Josef Bacik <josef@toxicpanda.com> wrote:

> Currently we only drop the mmap_sem if there is contention on the page
> lock.  The idea is that we issue readahead and then go to lock the page
> while it is under IO and we want to not hold the mmap_sem during the IO.
> 
> The problem with this is the assumption that the readahead does
> anything.  In the case that the box is under extreme memory or IO
> pressure we may end up not reading anything at all for readahead, which
> means we will end up reading in the page under the mmap_sem.

Please describe here why this is considered to be a problem. 
Application stalling, I assume?  Some description of in-the-field
observations would be appropriate.  ie, how serious is the problem
whcih is being addressed.

> Instead rework filemap fault path to drop the mmap sem at any point that
> we may do IO or block for an extended period of time.  This includes
> while issuing readahead, locking the page, or needing to call ->readpage
> because readahead did not occur.  Then once we have a fully uptodate
> page we can return with VM_FAULT_RETRY and come back again to find our
> nicely in-cache page that was gotten outside of the mmap_sem.
> 
> ...
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2304,28 +2304,44 @@ EXPORT_SYMBOL(generic_file_read_iter);
>  
>  #ifdef CONFIG_MMU
>  #define MMAP_LOTSAMISS  (100)
> +static struct file *maybe_unlock_mmap_for_io(struct file *fpin,
> +					     struct vm_area_struct *vma,
> +					     int flags)
> +{
> +	if (fpin)
> +		return fpin;
> +	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
> +	    FAULT_FLAG_ALLOW_RETRY) {
> +		fpin = get_file(vma->vm_file);
> +		up_read(&vma->vm_mm->mmap_sem);
> +	}
> +	return fpin;
> +}

A code comment would be nice.  What it does and, especially, why it does it.

> -	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
> -		put_page(page);
> -		return ret | VM_FAULT_RETRY;
> +	/*
> +	 * We are open-coding lock_page_or_retry here because we want to do the
> +	 * readpage if necessary while the mmap_sem is dropped.  If there
> +	 * happens to be a lock on the page but it wasn't being faulted in we'd
> +	 * come back around without ALLOW_RETRY set and then have to do the IO
> +	 * under the mmap_sem, which would be a bummer.

Expanding on "a bummer" would help here ;)

> +	 */
> +	if (!trylock_page(page)) {
> +		fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags);
> +		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> +			goto out_retry;
> +		if (vmf->flags & FAULT_FLAG_KILLABLE) {
> +			if (__lock_page_killable(page)) {
> +				/*
> +				 * If we don't have the right flags for
> +				 * maybe_unlock_mmap_for_io to do it's thing we

"its"

> +				 * still need to drop the sem and return
> +				 * VM_FAULT_RETRY so the upper layer checks the
> +				 * signal and takes the appropriate action.
> +				 */
> +				if (!fpin)
> +					up_read(&vmf->vma->vm_mm->mmap_sem);
> +				goto out_retry;
> +			}
> +		} else
> +			__lock_page(page);
>  	}
>  
>  	/* Did it get truncated? */


  reply	other threads:[~2018-12-04 22:50 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-30 19:58 [PATCH 0/4][V4] drop the mmap_sem when doing IO in the fault path Josef Bacik
2018-11-30 19:58 ` [PATCH 1/4] mm: infrastructure for page fault page caching Josef Bacik
2018-12-04 22:49   ` Andrew Morton
2018-11-30 19:58 ` [PATCH 2/4] filemap: kill page_cache_read usage in filemap_fault Josef Bacik
2018-12-05 21:52   ` Johannes Weiner
2018-12-07  9:57   ` Jan Kara
2018-12-07 10:37     ` Jan Kara
2018-11-30 19:58 ` [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations Josef Bacik
2018-12-04 22:50   ` Andrew Morton [this message]
2018-12-05 22:23   ` Johannes Weiner
2018-12-07 11:01   ` Jan Kara
2018-12-10 18:44     ` Josef Bacik
2018-12-11  9:40       ` Jan Kara
2018-12-11 16:08         ` Josef Bacik
2018-12-11 16:38           ` Jan Kara
2018-11-30 19:58 ` [PATCH 4/4] mm: use the cached page for filemap_fault Josef Bacik
2018-12-04 22:50   ` Andrew Morton
2018-12-05 14:58     ` Josef Bacik
2018-12-07 11:03       ` Jan Kara
2018-12-04 22:49 ` [PATCH 0/4][V4] drop the mmap_sem when doing IO in the fault path Andrew Morton
2018-12-06 22:24   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181204145017.62d952c2a209975aa5888acf@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).