From: Prakash Sangappa <prakash.sangappa@oracle.com>
To: Dave Hansen <dave.hansen@intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH RFC] hugetlbfs 'noautofill' mount option
Date: Wed, 3 May 2017 12:02:59 -0700
Message-ID: <7677d20e-5d53-1fb7-5dac-425edda70b7b@oracle.com>
In-Reply-To: <22557bf3-14bb-de02-7b1b-a79873c583f1@intel.com>

On 5/2/17 4:43 PM, Dave Hansen wrote:

> On 05/02/2017 04:34 PM, Prakash Sangappa wrote:
>> Similarly, a madvise() option also requires an additional system call by
>> every process mapping the file; this is considered an overhead for the
>> database.
> How long-lived are these processes?  For a database, I'd assume that
> this would happen a single time, or a single time per mmap() at process
> startup time.  Such a syscall would be doing something on the order of
> taking mmap_sem, walking the VMA tree, setting a bit per VMA, and
> unlocking.  That's a pretty cheap one-time cost...
Plus a call into the filesystem (a_ops?) to check whether the underlying
filesystem supports not filling holes on mapped access, before setting the
bit per VMA. The overhead may not be that bad, though.

Database processes can exit and new ones can be started, depending on
database activity, for instance.


>> If we do consider a new madvise() option, will it be acceptable
>> since this will be specifically for hugetlbfs file mappings?
> Ideally, it would be something that is *not* specifically for hugetlbfs.
>   MADV_NOAUTOFILL, for instance, could be defined to SIGSEGV whenever
> memory is touched that was not populated with MADV_WILLNEED, mlock(), etc...

If this is a generic advice type, the necessary support will have to be
implemented in the various filesystems that can support it.

The proposed behavior for 'noautofill' was to not fill holes in files (as in
sparse files). In the page fault path, mm would not know whether the mmapped
address on which the fault occurred is over a hole in the file or whether
the page is simply not available in the page cache. The underlying
filesystem would be called, and it determines whether it is a hole; that is
where the fault would fail instead of filling the hole, if this support is
added. Normally, filesystems that support sparse files (holes in files)
automatically fill the hole when it is accessed. Then there is the issue of
filesystem block size versus page size. If the block size is smaller than
the page size, it could mean that noautofill would only work if the hole
size is equal to, or a multiple of, the page size?
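
To make the block size versus page size concern concrete, here is a minimal
userspace sketch of the arithmetic only. It is not kernel code and not part
of the proposed patch; struct byte_range and hole_whole_pages() are
hypothetical names made up for illustration. It trims a hole to the pages
that lie completely inside it, assuming byte offsets and a non-zero page
size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type for illustration: half-open byte range [start, end). */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Trim a file hole to the part that covers whole pages.
 *
 * Returns true and fills *out if at least one full page lies inside the
 * hole, false otherwise. page_size must be non-zero. Offsets are assumed
 * to be far enough below UINT64_MAX that rounding up cannot overflow.
 */
bool hole_whole_pages(struct byte_range hole, uint64_t page_size,
		      struct byte_range *out)
{
	/* Round the start of the hole up to the next page boundary. */
	uint64_t first = (hole.start + page_size - 1) / page_size * page_size;
	/* Round the end of the hole down to the previous page boundary. */
	uint64_t last = hole.end / page_size * page_size;

	if (first >= last)
		return false;

	out->start = first;
	out->end = last;
	return true;
}
```

With 1 KiB blocks and 4 KiB pages, a hole spanning bytes 1024 to 3071 gives
no whole page, since the page containing it also holds data blocks. A hole
that is already aligned to the page size (for example a 2 MiB aligned hole
with 2 MiB huge pages) comes back unchanged.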

In the case of hugetlbfs it is much more straightforward, since this
filesystem is not like a normal filesystem and file sizes are a multiple of
the huge page size. The hole will be a multiple of the huge page size. For
this reason, should the advice then be specific to hugetlbfs?


>
>> If so,
>> would a new flag to mmap() call itself be acceptable, which would
>> define the proposed behavior? That way no additional system calls
>> need to be made.
> I don't feel super strongly about it, but I guess an mmap() flag could
> work too.
>

The same goes for the mmap() call, if it is a generic flag.

-Prakash.
