linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Davidlohr Bueso <dave@stgolabs.net>
Subject: Re: [PATCH] mm/hugetlb: Unmap pages if page fault raced with hole punch
Date: Fri, 30 Oct 2015 13:56:36 -0700	[thread overview]
Message-ID: <5633D984.7080307@oracle.com> (raw)
In-Reply-To: <56339EBA.4070508@oracle.com>

On 10/30/2015 09:45 AM, Mike Kravetz wrote:
> On 10/29/2015 08:32 PM, Hugh Dickins wrote:
>> On Thu, 29 Oct 2015, Mike Kravetz wrote:
>>
>>> This patch is a combination of:
>>> [PATCH v2 4/4] mm/hugetlb: Unmap pages to remove if page fault raced
>>> 	with hole punch  and,
>>> [PATCH] mm/hugetlb: i_mmap_lock_write before unmapping in
>>> 	remove_inode_hugepages
>>> This patch can replace the entire series:
>>> [PATCH v2 0/4] hugetlbfs fallocate hole punch race with page faults
>>> 	and
>>> [PATCH] mm/hugetlb: i_mmap_lock_write before unmapping in
>>> 	remove_inode_hugepages
>>> It is being provided in an effort to possibly make tree management easier.
>>>
>>> Page faults can race with fallocate hole punch.  If a page fault happens
>>> between the unmap and remove operations, the page is not removed and
>>> remains within the hole.  This is not the desired behavior.
>>>
>>> If this race is detected and a page is mapped, the remove operation
>>> (remove_inode_hugepages) will unmap the page before removing.  The unmap
>>> within remove_inode_hugepages occurs with the hugetlb_fault_mutex held
>>> so that no other faults can occur until the page is removed.
>>>
>>> The (unmodified) routine hugetlb_vmdelete_list was moved ahead of
>>> remove_inode_hugepages to satisfy the new reference.
>>>
>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> Sorry, I came here to give this a quick Ack, but find I cannot:
>> you're adding to the remove_inode_hugepages() loop (heading towards
>> 4.3 final), but its use of "next" looks wrong to me already.
> 
> You are correct, the (current) code is wrong.
> 
> The hugetlbfs fallocate code started with shmem as an example.  Some
> of the complexities of that code are not needed in hugetlbfs.  However,
> some remnants were left.
> 
> I'll create a patch to fix the existing code, then when that is acceptable
> refactor this patch.
> 
>>
>> Doesn't "next" need to be assigned from page->index much earlier?
>> If there's a hole in the file (which there very well might be, since
>> you've just implemented holepunch!), doesn't it do the wrong thing?
> 
> Yes, I think it will.
> 
>>
>> And the loop itself is a bit weird, though that probably doesn't
>> matter very much: I said before, seeing the "while (next < end)",
>> that it's a straightforward scan from start to end, and sometimes
>> it would work that way; but buried inside is "next = start; continue;"
> 
> Correct, that next = start should not be there.

The 'next = start' code is actually from the original truncate_hugepages
routine.  This functionality was combined with that needed for hole punch
to create remove_inode_hugepages().

The following code was in truncate_hugepages:

	next = start;
	while (1) {
		if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
			if (next == start)
				break;
			next = start;
			continue;
		}


So, in the truncate case pages starting at 'start' are deleted until
pagevec_lookup fails.  Then, we call pagevec_lookup() again.  If no
pages are found we are done.  Else, we repeat the whole process.

Does anyone recall the reason for going back and looking for pages at
index'es already deleted?  Git doesn't help as that was part of initial
commit.  My thought is that truncate can race with page faults.  The
truncate code sets inode offset before unmapping and deleting pages.
So, faults after the new offset is set should fail.  But, I suppose a
fault could race with setting offset and deleting of pages.  Does this
sound right?  Or, is there some other reason I am missing?

I would like to continue having remove_inode_hugepages handle both the
truncate and hole punch case.  So, what to make sure the code correctly
handles both cases.

-- 
Mike Kravetz

  reply	other threads:[~2015-10-30 20:57 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-29 22:33 [PATCH] mm/hugetlb: Unmap pages if page fault raced with hole punch Mike Kravetz
2015-10-30  3:32 ` Hugh Dickins
2015-10-30 16:45   ` Mike Kravetz
2015-10-30 20:56     ` Mike Kravetz [this message]
2015-11-09  7:42       ` Hugh Dickins
2015-11-09 22:55         ` Mike Kravetz
2015-11-10 22:41           ` Mike Kravetz
2015-11-14  0:36             ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5633D984.7080307@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave@stgolabs.net \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).