From: John Stultz <john.stultz@linaro.org>
To: Minchan Kim <minchan@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Android Kernel Team <kernel-team@android.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
	Hugh Dickins <hughd@google.com>, Dave Hansen <dave@sr71.net>,
	Rik van Riel <riel@redhat.com>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Neil Brown <neilb@suse.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Hommey <mh@glandium.org>, Taras Glek <tglek@mozilla.com>,
	Jan Kara <jack@suse.cz>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Michel Lespinasse <walken@google.com>,
	Keith Packard <keithp@keithp.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 2/4] MADV_VOLATILE: Add MADV_VOLATILE/NONVOLATILE hooks and handle marking vmas
Date: Thu, 08 May 2014 17:24:41 -0700
Message-ID: <536C2049.6020308@linaro.org>
In-Reply-To: <20140509000752.GD25951@bbox>

On 05/08/2014 05:07 PM, Minchan Kim wrote:
> On Thu, May 08, 2014 at 04:43:07PM -0700, John Stultz wrote:
>> On 05/08/2014 04:12 PM, Minchan Kim wrote:
>>> On Thu, May 08, 2014 at 09:38:40AM -0700, John Stultz wrote:
>>>> On 05/07/2014 06:21 PM, Minchan Kim wrote:
>>>>> Hey John,
>>>>>
>>>>> On Tue, Apr 29, 2014 at 02:21:21PM -0700, John Stultz wrote:
>>>>>> This patch introduces MADV_VOLATILE/NONVOLATILE flags to madvise(),
>>>>>> which allows for specifying ranges of memory as volatile, and able
>>>>>> to be discarded by the system.
>>>>>>
>>>>>> This initial patch simply adds flag handling to madvise, and the
>>>>>> vma handling, splitting and merging the vmas as needed, and marking
>>>>>> them with VM_VOLATILE.
>>>>>>
>>>>>> No purging or discarding of volatile ranges is done at this point.
>>>>>>
>>>>>> This is a simplified implementation which reuses some of the logic
>>>>>> from Minchan's earlier efforts. So credit to Minchan for his work.
>>>>> Removing the purged argument is a really good thing, but I'm not sure
>>>>> merging the feature into the madvise syscall is a good idea.
>>>>> My concern is how we support users who don't want SIGBUS.
>>>>> I believe we should support them because some users (e.g., sanitizer) really
>>>>> want to avoid a MADV_NONVOLATILE call right before overwriting their cache
>>>>> (e.g., if there was a purged page in a cyclic cache, the user would have to
>>>>> call NONVOLATILE right before overwriting to avoid SIGBUS).
>>>> So... Why not use MADV_FREE then for this case?
>>> MADV_FREE is a one-shot operation. I mean we would have to call it again to
>>> make the pages lazyfree, while vrange could preserve volatility.
>>> Please think about the thread-sanitizer use case. They mmap 70TB once at
>>> startup and want to mark the range as volatile. If they use MADV_FREE instead
>>> of volatile, they have to mark 70TB as lazyfree periodically, which is
>>> terrible because MADV_FREE's cost is O(N).
>> I still have had difficulty seeing the thread-sanitizer usage as a
>> generic enough model for other applications. I realize they want to
>> avoid marking and unmarking ranges (and they want that marking and
>> unmarking to be very cheap), but the zero-fill purged page (while still
>> preserving volatility) causes lots of *very* strange behavior:
>  
> I don't think it's only for thread-sanitizer.
> Please think about the following use case.
>
> Let's assume a big volatile cache.
> If there is a request for the cache, it should find an object in the cache,
> and if it is found, it should call vrange(NOVOLATILE) right before
> passing it to the user and check whether it was purged or not.
> If it wasn't purged, the cache manager can pass the object to the user.
> But it's a circular cache, so if there is no request from the user, the cache
> manager keeps overwriting objects and could easily encounter SIGBUS.
> So with the current semantics, the cache manager always has to call
> vrange(NOVOLATILE) right before overwriting. Otherwise, it has to register a
> SIGBUS handler to unmark volatile page by page. SIGH.
>
> If we support zero-fill, the cache manager can overwrite objects without
> SIGBUS handling or a vrange(NOVOLATILE) call right before overwriting.
> All we need is a vrange(NOVOLATILE) call right before passing the object
> to the user.

But that wouldn't work. If the page was purged halfway through writing
it, we end up with a page of half zero data and half written data. What
would the page state be at that point? Purged? Not purged?

* If it's not purged (since a write was done to the page after being
zero-filled), we will silently return corrupted data to the user.

* If it is considered purged, how do we store that data, since we
currently detect purged pages by checking whether they are present when
we mark non-volatile?


This sort of zero-fill behavior on volatile pages only seems to make
sense if pages are written atomically.
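
[Editorial sketch] The torn-write scenario above can be modelled in plain
userspace C. This is only a simulation under stated assumptions: struct
sim_page, sim_purge(), sim_write() and sim_mark_nonvolatile_was_purged() are
hypothetical names invented for illustration, not kernel interfaces, and the
"present" flag merely stands in for whether the page is mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical model of one volatile page; not a kernel structure. */
struct sim_page {
    unsigned char data[PAGE_SZ];
    bool present;               /* stands in for "page is mapped" */
};

/* Simulated purge: the page contents are discarded under memory pressure. */
static void sim_purge(struct sim_page *p)
{
    memset(p->data, 0, PAGE_SZ);
    p->present = false;
}

/* Simulated zero-fill semantics: writing to a purged page silently brings
 * back a zeroed page instead of raising SIGBUS. */
static void sim_write(struct sim_page *p, size_t off, size_t len,
                      unsigned char v)
{
    assert(off + len <= PAGE_SZ);
    p->present = true;
    memset(p->data + off, v, len);
}

/* Presence-based purge detection, modelled on "checking if they are
 * present when we mark non-volatile". */
static bool sim_mark_nonvolatile_was_purged(const struct sim_page *p)
{
    return !p->present;
}

/* Returns true when a torn write ends up both corrupted and undetected. */
static bool torn_write_goes_undetected(void)
{
    struct sim_page p = { .present = true };
    memset(p.data, 0xAA, PAGE_SZ);                  /* old cached object */

    sim_write(&p, 0, PAGE_SZ / 2, 0xBB);            /* first half written */
    sim_purge(&p);                                  /* purge hits mid-write */
    sim_write(&p, PAGE_SZ / 2, PAGE_SZ / 2, 0xBB);  /* writer resumes */

    /* First half is zero-filled, second half holds new data. */
    bool corrupted = p.data[0] == 0x00 && p.data[PAGE_SZ - 1] == 0xBB;
    bool reported_purged = sim_mark_nonvolatile_was_purged(&p);

    return corrupted && !reported_purged;
}
```

In this model the page ends up half zeroes and half new data, yet the
presence check reports it as not purged, which is the silent-corruption case
from the first bullet.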

The SIGBUS handling solution you SIGH'ed at above actually seems
reasonable, because it would allow the page to be safely filled
atomically (marking it non-volatile, filling it and then re-marking it
volatile). Sure, it would cost more, but fast and wrong isn't really a valid
option.
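
[Editorial sketch] The mark non-volatile, fill, re-mark volatile sequence
could look roughly like the following. The MADV_VOLATILE/MADV_NONVOLATILE
flags come from this patch series and are not assumed to exist on the
reader's system, so the two operations are abstracted behind a hypothetical
struct vol_ops. The return conventions (1 if purged, 0 if intact, -1 on
error) are an assumption for illustration, not the patch's documented ABI,
and the stub hooks exist only to make the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hooks standing in for the proposed
 * madvise(MADV_NONVOLATILE) and madvise(MADV_VOLATILE) calls. */
struct vol_ops {
    /* assumed: 1 if any page was purged, 0 if intact, -1 on error */
    int (*mark_nonvolatile)(void *addr, size_t len);
    /* assumed: 0 on success, -1 on error */
    int (*mark_volatile)(void *addr, size_t len);
};

/* Fill a cache slot while it is pinned non-volatile, so the pages cannot
 * be purged halfway through the copy. Returns the purged status reported
 * when pinning, or -1 on error. */
static int fill_pinned(const struct vol_ops *ops, void *dst,
                       const void *src, size_t len)
{
    int purged = ops->mark_nonvolatile(dst, len);
    if (purged < 0)
        return -1;

    memcpy(dst, src, len);      /* no purge can occur during this copy */

    if (ops->mark_volatile(dst, len) < 0)
        return -1;
    return purged;
}

/* Stub hooks that only track state, for demonstration purposes. */
static int volatile_now = 1;

static int stub_nonvolatile(void *addr, size_t len)
{
    (void)addr; (void)len;
    volatile_now = 0;
    return 0;                   /* pretend nothing was purged */
}

static int stub_volatile(void *addr, size_t len)
{
    (void)addr; (void)len;
    volatile_now = 1;
    return 0;
}

/* Returns 1 if the slot was filled intact and left volatile again. */
static int demo_fill(void)
{
    static const struct vol_ops ops = {
        .mark_nonvolatile = stub_nonvolatile,
        .mark_volatile = stub_volatile,
    };
    char slot[16] = {0};
    const char obj[16] = "cached object";

    int rc = fill_pinned(&ops, slot, obj, sizeof(obj));
    assert(volatile_now == 1);  /* range is volatile again afterwards */
    return rc == 0 && memcmp(slot, obj, sizeof(obj)) == 0;
}
```

The cost is two extra calls per fill, which is the "cost more" trade-off
mentioned above; in exchange, the copy can never be torn by a purge.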



>
>> * How do general applications know the difference between a purged page
>> and a valid empty page?
>> * When reading/writing a page, what happens if half-way the application
>> is preempted, and the page is purged?
>> * If a volatile page is purged, then zero-filled on a read or write,
>> what is its purged state when we're marking it non-volatile?
> Maybe above scenario goes your questions to VOID.

I'm not sure I understand this.


>
>> These use cases don't seem completely baked, or maybe I've just not been
>> able to comprehend them yet. But I don't quite understand the desire to
>> prioritize this style of usage over other simpler and more well
>> established usage?
> I think it's one of the typical use cases of the vrange syscall.

I apologize if I'm seeming stubborn, but I just can't see how it would
work sanely.

thanks
-john

