From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>, Hugh Dickins <hughd@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH RFC 0/1] hugetlbfs: fix truncate/fault races
Date: Sun,  7 Oct 2018 16:38:47 -0700
Message-ID: <20181007233848.13397-1-mike.kravetz@oracle.com>

Our DB team noticed negative hugetlb reserved page counts during development
testing.  Related meminfo fields were as follows on one system:

HugePages_Total:   47143
HugePages_Free:    45610
HugePages_Rsvd:    18446744073709551613
HugePages_Surp:        0
Hugepagesize:       2048 kB 
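The Rsvd value above is 2^64 - 3. In other words, it is a reservation count of -3 read back as an unsigned 64-bit number. The minimal C sketch below shows that wraparound. It assumes the counter behaves like a uint64_t, and the helper names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: value of an unsigned 64-bit counter that has been
 * decremented 'below_zero' times past 0 (assumes uint64_t semantics). */
uint64_t wrapped_count(uint64_t below_zero)
{
	uint64_t count = 0;

	assert(below_zero > 0);
	count -= below_zero;	/* unsigned arithmetic wraps modulo 2^64 */
	return count;
}

/* Print it the way the meminfo line above shows it. */
void print_wrapped(uint64_t below_zero)
{
	printf("HugePages_Rsvd: %" PRIu64 "\n", wrapped_count(below_zero));
}
```

Calling print_wrapped(3) prints "HugePages_Rsvd: 18446744073709551613", which matches the report.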

Code inspection revealed that races between truncate and page faults were the
most likely cause.  In fact, I was able to write a fairly simple program that
triggers the races and recreates the issue.

Way back in 2006, Hugh Dickins created a patch (ebed4bfc8da8) with this
message:

"[PATCH] hugetlb: fix absurd HugePages_Rsvd
    
 If you truncated an mmap'ed hugetlbfs file, then faulted on the truncated
 area, /proc/meminfo's HugePages_Rsvd wrapped hugely "negative".  Reinstate my
 preliminary i_size check before attempting to allocate the page (though this
 only fixes the most obvious case: more work will be needed here)."

Looks like we need to do more work.
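A toy replay of the interleaving shows why a one-time i_size check is not enough. This is a deliberately simplified model, not the kernel's actual reservation accounting. struct toy_file and the toy_* helpers are hypothetical. The fault is split into two halves, the size check and the reservation update, so that a truncate can run between them.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES 8

/* Hypothetical toy model of one hugetlbfs file (not kernel structures). */
struct toy_file {
	long size_pages;		/* stands in for i_size, in huge pages */
	long resv;			/* reserved but not yet consumed pages */
	bool present[TOY_NPAGES];	/* page instantiated at this index? */
};

/* Fault, step 1: the i_size check done before allocating the page. */
static bool toy_fault_check(const struct toy_file *f, long idx)
{
	return idx >= 0 && idx < f->size_pages;
}

/* Fault, step 2: instantiate the page and consume its reservation. */
static void toy_fault_commit(struct toy_file *f, long idx)
{
	assert(idx >= 0 && idx < TOY_NPAGES);
	f->present[idx] = true;
	f->resv--;
}

/* Truncate: free pages past new_size, release their unused reservations. */
static void toy_truncate(struct toy_file *f, long new_size)
{
	for (long i = new_size; i < f->size_pages; i++) {
		if (f->present[i])
			f->present[i] = false;
		else
			f->resv--;
	}
	f->size_pages = new_size;
}

/* Replay without any lock: the fault passes its size check, truncate runs
 * to completion, then the fault finishes.  Returns the final resv. */
static long toy_replay_race(void)
{
	struct toy_file f = { .size_pages = 4, .resv = 4 };
	bool inside = toy_fault_check(&f, 3);	/* page 3 is inside the file */

	toy_truncate(&f, 0);			/* releases all 4 reservations */
	if (inside)
		toy_fault_commit(&f, 3);	/* consumes one that is gone */
	return f.resv;
}
```

toy_replay_race() returns -1. Truncate released a reservation that the in-flight fault then consumed. Page 3 is also left instantiated beyond the new size. If the fault completes before the truncate instead, the count ends at 0.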

While looking at the code, I found many places that would need changes to
correctly handle the races and to back out partially completed changes.
Instead, why not just introduce a rw mutex to prevent the races?  Page faults
would take the mutex in read mode, so multiple faults could still run in
parallel as they do today.  Truncate would take the mutex in write mode,
blocking faults for the duration of truncate processing.  This seems almost
too obvious.  Something must be wrong with this approach, or others would have
employed it earlier.

The following patch describes the current race in detail and adds the mutex
to prevent truncate/fault races.

Mike Kravetz (1):
  hugetlbfs: introduce truncation/fault mutex to avoid races

 fs/hugetlbfs/inode.c    | 24 ++++++++++++++++++++----
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            | 25 +++++++++++++++++++------
 mm/userfaultfd.c        |  8 +++++++-
 4 files changed, 47 insertions(+), 11 deletions(-)

-- 
2.17.1


