From: Andy Lutomirski <luto@amacapital.net>
To: Daniel Micay <danielmicay@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>, Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Michal Hocko <mhocko@suse.cz>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux API <linux-api@vger.kernel.org>, Jason Evans <je@fb.com>,
	Shaohua Li <shli@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	yalin wang <yalin.wang2010@gmail.com>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
Date: Wed, 4 Nov 2015 10:23:13 -0800
Message-ID: <CALCETrU5P-mmjf+8QuS3-pm__R02j2nnRc5B1gQkeC013XWNvA@mail.gmail.com>
In-Reply-To: <56399CA5.8090101@gmail.com>

On Tue, Nov 3, 2015 at 9:50 PM, Daniel Micay <danielmicay@gmail.com> wrote:
>> Does this set the write protect bit?
>>
>> What happens on architectures without hardware dirty tracking?
>
> It's supposed to avoid needing page faults when the data is accessed
> again, but it can just be implemented via page faults on architectures
> without a way to check for access or writes. MADV_DONTNEED is also a
> valid implementation of MADV_FREE if it comes to that (which is what it
> does on swapless systems for now).

I wonder whether arches without the requisite tracking should just
turn it off.  While it might be faster than MADV_DONTNEED or munmap on
those arches, it doesn't really deserve to be faster.

>
>> Using the dirty bit for these semantics scares me.  This API creates a
>> page that can have visible nonzero contents and then can
>> asynchronously and magically zero itself thereafter.  That makes me
>> nervous.  Could we use the accessed bit instead?  Then the observable
>> semantics would be equivalent to having MADV_FREE either zero the page
>> or do nothing, except that it doesn't make up its mind until the next
>> read.
>
> FWIW, those are already basically the semantics provided by GCC and LLVM
> for data the compiler considers uninitialized (they could be more
> aggressive, since C just says it's undefined; in practice they allow the
> read, but it can produce inconsistent results even if it isn't touched).
>
> http://llvm.org/docs/LangRef.html#undefined-values

But C isn't the only thing in the world.  Also, I think that a C
optimizer should be free to turn:

if ([complicated condition])
  *ptr = 1;

into:

if (*ptr != 1 && [complicated condition])
  *ptr = 1;

as long as [complicated condition] has no side effects.  The MADV_FREE
semantics in this patch set break that.
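
The hazard can be made concrete with a toy user-space model of a single lazily freed page. Everything here (struct page_model, the reclaim policy, the function names) is a hypothetical illustration, not kernel code: madv_free() mimics the patch's pte_mkold()/pte_mkclean() step, and reclaim() zeroes the page unless the chosen PTE bit has been set again since the free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one lazily freed page; not kernel code. */
struct page_model {
    int data;       /* contents the program can see */
    bool accessed;  /* models the PTE accessed ("young") bit */
    bool dirty;     /* models the PTE dirty bit */
};

typedef void (*snippet_fn)(struct page_model *, bool);

/* Models the patch's pte_mkold() + pte_mkclean() on the freed page. */
static void madv_free(struct page_model *p) {
    p->accessed = false;
    p->dirty = false;
}

static int load(struct page_model *p) {
    p->accessed = true;
    return p->data;
}

static void store(struct page_model *p, int v) {
    p->accessed = true;
    p->dirty = true;
    p->data = v;
}

/* Reclaim discards (zeroes) the page unless the chosen bit was set again. */
static void reclaim(struct page_model *p, bool key_on_accessed_bit) {
    bool in_use = key_on_accessed_bit ? p->accessed : p->dirty;
    if (!in_use)
        p->data = 0;
}

/* if ([complicated condition]) *ptr = 1; */
static void code_as_written(struct page_model *p, bool cond) {
    if (cond)
        store(p, 1);
}

/* if (*ptr != 1 && [complicated condition]) *ptr = 1; */
static void code_as_optimized(struct page_model *p, bool cond) {
    if (load(p) != 1 && cond)
        store(p, 1);
}

/* Free a page that holds 1, run the snippet with a true condition, reclaim. */
static int run(snippet_fn fn, bool key_on_accessed_bit) {
    struct page_model p = { .data = 1, .accessed = true, .dirty = true };
    madv_free(&p);
    assert(p.data == 1);    /* old contents stay visible until reclaim */
    fn(&p, true);
    reclaim(&p, key_on_accessed_bit);
    return p.data;          /* what the program reads back afterwards */
}
```

Keyed on the dirty bit, run(code_as_optimized, false) returns 0 while run(code_as_written, false) returns 1, so eliding the redundant store changes what the program observes. Keyed on the accessed bit, both return 1: the read in the optimized version counts as use, which matches "doesn't make up its mind until the next read".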

>
> It doesn't seem like there would be an advantage to checking if the data
> was written to vs. whether it was accessed if checking for both of those
> is comparable in performance. I don't know enough about that.

I'd imagine that there would be no performance difference whatsoever
on hardware that has a real accessed bit.  The only thing that changes
is the choice of which bit to use.

>
>>> +                       ptent = pte_mkold(ptent);
>>> +                       ptent = pte_mkclean(ptent);
>>> +                       set_pte_at(mm, addr, pte, ptent);
>>> +                       tlb_remove_tlb_entry(tlb, pte, addr);
>>
>> It looks like you are flushing the TLB.  In a multithreaded program,
>> that's rather expensive.  Potentially silly question: would it be
>> better to just zero the page immediately in a multithreaded program
>> and then, when swapping out, check the page is zeroed and, if so, skip
>> swapping it out?  That could be done without forcing an IPI.
>
> In the common case it will be passed many pages by the allocator. There
> will still be a layer of purging logic on top of MADV_FREE but it can be
> much thinner than the current workarounds for MADV_DONTNEED. So the
> allocator would still be coalescing dirty ranges and only purging when
> the ratio of dirty:clean pages rises above some threshold. It would be
> able to weight the largest ranges for purging first, rather than using
> logic based on things like aging, as is done for MADV_DONTNEED.
>
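
The zero-then-skip-swap alternative quoted above can be sketched as a user-space model. page_is_zero, free_by_zeroing and needs_swap_write are hypothetical names, not kernel functions; in a real implementation the check would live in the reclaim path. The trade is a memset of every freed page in exchange for leaving the PTEs alone, so nothing has to be flushed from other CPUs' TLBs at free time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reclaim-side check: true if every byte of the page is zero. */
static bool page_is_zero(const void *page, size_t len) {
    const unsigned char *p = page;
    assert(p != NULL || len == 0);
    if (len == 0)
        return true;
    /* p[0] == 0 and each byte equal to its successor => all bytes are zero.
       memcmp (unlike memcpy) is fine with overlapping arguments. */
    return p[0] == 0 && memcmp(p, p + 1, len - 1) == 0;
}

/* Hypothetical free path: zero eagerly instead of clearing PTE bits. */
static void free_by_zeroing(void *page, size_t len) {
    memset(page, 0, len);
}

/* Hypothetical swap-out decision: an all-zero page needs no swap write. */
static bool needs_swap_write(const void *page, size_t len) {
    return !page_is_zero(page, len);
}
```

If the program writes to the page again after the free, the check simply fails and the page is swapped out as usual, so no contents can vanish behind the program's back.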

With enough pages at once, though, munmap would be fine, too.

Maybe what's really needed is a MADV_FREE variant that takes an iovec.
On an all-cores multithreaded mm, the TLB shootdown broadcast takes
thousands of cycles on each core more or less regardless of how much
of the TLB gets zapped.
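
For contrast, this is roughly what an allocator has to do today: one madvise(2) call per range, each of which may trigger its own shootdown. A minimal sketch assuming Linux with glibc; the helper name madv_free_ranges is hypothetical, it only loops over existing madvise() calls, and the iovec-taking kernel interface itself is the proposal, not something this code provides.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS / MADV_FREE in glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/uio.h>

/*
 * Hypothetical user-space helper: release every range in iov[0..n).
 * Each madvise() call below may do its own TLB shootdown; a batched
 * MADV_FREE taking the whole iovec could do a single one.
 */
static int madv_free_ranges(const struct iovec *iov, size_t n) {
    assert(iov != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#ifdef MADV_FREE
        if (madvise(iov[i].iov_base, iov[i].iov_len, MADV_FREE) == 0)
            continue;
        if (errno != EINVAL)
            return -1;           /* real failure, not "unknown advice" */
#endif
        /* Kernel or headers without MADV_FREE: MADV_DONTNEED is a valid,
           stricter substitute (private anonymous pages read back as zero). */
        if (madvise(iov[i].iov_base, iov[i].iov_len, MADV_DONTNEED) != 0)
            return -1;
    }
    return 0;
}
```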

--Andy
