From: Nai Xia <nai.xia@gmail.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>,
	Izik Eidus <izik.eidus@ravellosystems.com>,
	Avi Kivity <avi@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Chris Wright <chrisw@sous-sol.org>, linux-mm <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	kvm <kvm@vger.kernel.org>
Subject: Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking
Date: Thu, 23 Jun 2011 07:37:47 +0800
Message-ID: <BANLkTinRYr9Vg==C-qyCaRmO7C_aQqBPzw@mail.gmail.com>
In-Reply-To: <20110622165529.GY20843@redhat.com>

On Thu, Jun 23, 2011 at 12:55 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Wed, Jun 22, 2011 at 11:39:40AM -0400, Rik van Riel wrote:
>> On 06/22/2011 07:19 AM, Izik Eidus wrote:
>>
>> > So what we say here is: it is better to have little junk in the unstable
>> > tree that get flushed eventualy anyway, instead of make the guest
>> > slower....
>> > this race is something that does not reflect accurate of ksm anyway due
>> > to the full memcmp that we will eventualy perform...
>>
>> With 2MB pages, I am not convinced they will get "flushed eventually",
>> because there is a good chance at least one of the 4kB pages inside
>> a 2MB page is in active use at all times.
>>
>> I worry that the proposed changes may end up effectively preventing
>> KSM from scanning inside 2MB pages, when even one 4kB page inside
>> is in active use.  This could mean increased swapping on systems
>> that run low on memory, which can be a much larger performance penalty
>> than ksmd CPU use.
>>
>> We need to scan inside 2MB pages when memory runs low, regardless
>> of the accessed or dirty bits.
>
> I guess we could fallback to the cksum when a THP is encountered
> (repeating the test_and_clear_dirty also wouldn't give the expected
> result if it's repeated on the same hugepmd for the next 4k virtual
> address candidate for unstable tree insertion, so it'd need special
> handling during the virtual walk anyway).
>
> So it's getting a little hairy, skip on THP, skip on EPT, then I
> wonder what is the common case that would be left using it...
>
> Or we could evaluate with statistic how many less pages are inserted
> into the unstable tree using the 2m dirty bit but clearly it'd be less
> reliable, the algorithm really is meant to track the volatility of
> what is later merged, not of a bigger chunk with unrelated data in it.
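
As a rough illustration of the granularity concern quoted above, here is a
toy user-space model, not kernel code. Every name in it (toy_region,
candidates_with_pte_bits, candidates_with_pmd_bit) is hypothetical, and it
assumes 512 4kB subpages per 2MB page as on x86-64. It only counts how many
subpages would still look "not volatile" when dirtiness is tracked per pte
versus once per hugepmd.

```c
#include <stdbool.h>
#include <stddef.h>

#define SUBPAGES_PER_HUGEPAGE 512	/* assumed: 2MB / 4kB */

/* Toy model: which 4kB subpages of one 2MB region were written. */
struct toy_region {
	bool written[SUBPAGES_PER_HUGEPAGE];
};

/*
 * 4kB mappings: every subpage has its own dirty bit, so only the
 * subpages that were actually written are treated as volatile.
 */
static size_t candidates_with_pte_bits(const struct toy_region *r)
{
	size_t n = 0;

	for (size_t i = 0; i < SUBPAGES_PER_HUGEPAGE; i++)
		if (!r->written[i])
			n++;
	return n;
}

/*
 * One 2MB mapping: a single dirty bit covers the whole region, so a
 * write to any subpage makes all 512 subpages look volatile.
 */
static size_t candidates_with_pmd_bit(const struct toy_region *r)
{
	for (size_t i = 0; i < SUBPAGES_PER_HUGEPAGE; i++)
		if (r->written[i])
			return 0;
	return SUBPAGES_PER_HUGEPAGE;
}
```

With a single subpage written, the per-pte view still leaves 511 candidates,
while the per-pmd view leaves none.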

On 2MB pages, I'd like to remind you and Rik that ksmd currently splits
huge pages before their subpages are actually merged into the stable tree.
So when there are many 2MB pages, each with a 4kB subpage that changes
all the time, ksmd already has to judge whether it is worthwhile to split
a 2MB page and get its subpages merged.
I think the policy for ksmd in a system should be "If you cannot do something
good, at least do nothing evil". So I really don't think we can satisfy _all_
people. Let's get a general method and give users one or two knobs to tune it
when they hit the corner cases. What do you think of my proposal?

>
> On a side note, khugepaged should also be changed to preserve the
> dirty bit if at least one dirty bit of the ptes is dirty (currently
> the hugepmd is always created dirty, it can never happen for an
> hugepmd to be clean today so it wasn't preserved in khugepaged so far).
>

Thanks for pointing that out. This is something I had overlooked!

thanks,
Nai
