linux-mm.kvack.org archive mirror
From: Mike Kravetz <mike.kravetz@oracle.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko <mhocko@kernel.org>, Hugh Dickins <hughd@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Prakash Sangappa <prakash.sangappa@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	stable@vger.kernel.org
Subject: Re: [PATCH 2/3] hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
Date: Mon, 17 Dec 2018 10:42:17 -0800
Message-ID: <f6fd9491-4b3d-16ca-f606-025c78756936@oracle.com>
In-Reply-To: <27f8893b-57b3-088d-2d48-9e8acc5987bd@linux.ibm.com>

On 12/17/18 2:25 AM, Aneesh Kumar K.V wrote:
> On 12/4/18 1:38 AM, Mike Kravetz wrote:
>> hugetlbfs page faults can race with truncate and hole punch operations.
>> Current code in the page fault path attempts to handle this by 'backing
>> out' operations if we encounter the race.  One obvious omission in the
>> current code is removing a page newly added to the page cache.  This is
>> pretty straightforward to address, but there is a more subtle and
>> difficult issue of backing out hugetlb reservations.  To handle this
>> correctly, the 'reservation state' before page allocation needs to be
>> noted so that it can be properly backed out.  There are four distinct
>> possibilities for reservation state: shared/reserved, shared/no-resv,
>> private/reserved and private/no-resv.  Backing out a reservation may
>> require memory allocation which could fail so that needs to be taken
>> into account as well.
>>
>> Instead of writing the required complicated code for this rare
>> occurrence, just eliminate the race.  i_mmap_rwsem is now held in read
>> mode for the duration of page fault processing.  Hold i_mmap_rwsem
>> longer in truncation and hole punch code to cover the call to
>> remove_inode_hugepages.
>>
>> Cc: <stable@vger.kernel.org>
>> Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>   fs/hugetlbfs/inode.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 32920a10100e..3244147fc42b 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -505,8 +505,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
>>       i_mmap_lock_write(mapping);
>>       if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
>>           hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
>> -    i_mmap_unlock_write(mapping);
>>       remove_inode_hugepages(inode, offset, LLONG_MAX);
>> +    i_mmap_unlock_write(mapping);
>>       return 0;
>>   }
> 
> 
> We used to do this in remove_inode_hugepages():
> 
>     mutex_lock(&hugetlb_fault_mutex_table[hash]);
>     i_mmap_lock_write(mapping);
>     hugetlb_vmdelete_list(&mapping->i_mmap,
>     i_mmap_unlock_write(mapping);
> 
> did we change the lock ordering with this patch?

Thanks for taking a look.

Yes, we did take locks in that order in the 'if (unlikely(page_mapped(page)))'
case within remove_inode_hugepages.  That ordering was important because
holding the fault_mutex prevented faults on the page while it was being
unmapped from all potential mappings.

With the change above, we will be holding i_mmap_rwsem in write mode while
calling remove_inode_hugepages.  The page fault code (modified in the previous
patch) acquires i_mmap_rwsem in read mode.  Therefore, no page faults can
occur while remove_inode_hugepages runs, and that
'if (unlikely(page_mapped(page)))' case within remove_inode_hugepages can
never be hit.  The now-dead code is removed in the subsequent patch.

As you suggested in a comment to the subsequent patch, it would be better to
combine the patches and remove the dead code when it becomes dead.  I will
work on that.  Actually, some of the code in patch 3 applies to patch 1 and
some applies to patch 2, so it will not be as simple as combining patches 2
and 3.

-- 
Mike Kravetz

Thread overview: 17+ messages
2018-12-03 20:08 [PATCH 0/3] hugetlbfs: use i_mmap_rwsem for better synchronization Mike Kravetz
2018-12-03 20:08 ` [PATCH 1/3] hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization Mike Kravetz
2018-12-04 13:54   ` Sasha Levin
2018-12-03 20:08 ` [PATCH 2/3] hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race Mike Kravetz
2018-12-04 13:54   ` Sasha Levin
2018-12-17 10:25   ` Aneesh Kumar K.V
2018-12-17 18:42     ` Mike Kravetz [this message]
2018-12-18  0:17       ` Mike Kravetz
2018-12-18 22:10         ` Andrew Morton
2018-12-18 22:34           ` Mike Kravetz
2019-06-14 21:56   ` Sasha Levin
2019-06-14 23:33     ` Mike Kravetz
2019-06-15 22:38       ` Sasha Levin
2018-12-03 20:08 ` [PATCH 3/3] hugetlbfs: remove unnecessary code after i_mmap_rwsem synchronization Mike Kravetz
2018-12-04 13:54   ` Sasha Levin
2018-12-17 10:34   ` Aneesh Kumar K.V
2018-12-14 21:22 ` [PATCH 0/3] hugetlbfs: use i_mmap_rwsem for better synchronization Andrew Morton
