linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Mel Gorman <mgorman@suse.de>, Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	xfs@oss.sgi.com, ppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
Date: Thu, 19 Mar 2015 14:41:48 -0700	[thread overview]
Message-ID: <CA+55aFyxA9u2cVzV+S7TSY9ZvRXCX=z22YAbi9mdPVBKmqgR5g@mail.gmail.com> (raw)
In-Reply-To: <CA+55aFy-Mw74rAdLMMMUgnsG3ZttMWVNGz7CXZJY7q9fqyRYfg@mail.gmail.com>

On Wed, Mar 18, 2015 at 10:31 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I think there's something I'm missing. For non-shared mappings, I
> still have the idea that pte_dirty should be the same as pte_write.
> And yet, your testing of 3.19 shows that it's a big difference.
> There's clearly something I'm completely missing.

Ahh. The normal page table scanning and page fault handling both clear
and set the dirty bit together with the writable one. But "fork()"
will clear the writable bit without clearing dirty. For some reason I
thought it moved the dirty bit into the struct page like the VM
scanning does, but that was just me having a brainfart. So yeah,
pte_dirty doesn't have to match pte_write even under perfectly normal
circumstances. Maybe there are other cases.

Not that I see a lot of forking in the xfs repair case either, so..

Dave, mind re-running the plain 3.19 numbers to really verify that the
pte_dirty/pte_write change really made that big of a difference. Maybe
your recollection of ~55,000 migrate_pages events was faulty. If the
pte_write ->pte_dirty change is the *only* difference, it's still very
odd how that one difference would make migrate_rate go from ~55k to
471k. That's an order of magnitude difference, for what really
shouldn't be a big change.

I'm running a kernel right now with a hacky "update_mmu_cache()" that
warns if pte_dirty is ever different from pte_write().

+void update_mmu_cache(struct vm_area_struct *vma,
+               unsigned long addr, pte_t *ptep)
+{
+       if (!(vma->vm_flags & VM_SHARED)) {
+               pte_t now = READ_ONCE(*ptep);
+               if (!pte_write(now) != !pte_dirty(now)) {
+                       static int count = 20;
+                       static unsigned int prev = 0;
+                       unsigned int val = pte_val(now) & 0xfff;
+                       if (prev != val && count) {
+                               prev = val;
+                               count--;
+                               WARN(1, "pte value %x", val);
+                       }
+               }
+       }
+}

I haven't seen a single warning so far (and there I wrote all that
code to limit repeated warnings), although admittedly
update_mu_cache() isn't called for all cases where we change a pte
(not for the fork case, for example). But it *is* called for the page
faulting cases

Maybe a system update has changed libraries and memory allocation
patterns, and there is something bigger than that one-liner
pte_dirty/write change going on?

                             Linus

  parent reply	other threads:[~2015-03-19 21:41 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-07 15:20 [RFC PATCH 0/4] Automatic NUMA balancing and PROT_NONE handling followup v2r8 Mel Gorman
2015-03-07 15:20 ` [PATCH 1/4] mm: thp: Return the correct value for change_huge_pmd Mel Gorman
2015-03-07 20:13   ` Linus Torvalds
2015-03-07 20:31   ` Linus Torvalds
2015-03-07 20:56     ` Mel Gorman
2015-03-07 15:20 ` [PATCH 2/4] mm: numa: Remove migrate_ratelimited Mel Gorman
2015-03-07 15:20 ` [PATCH 3/4] mm: numa: Mark huge PTEs young when clearing NUMA hinting faults Mel Gorman
2015-03-07 18:33   ` Linus Torvalds
2015-03-07 18:42     ` Linus Torvalds
2015-03-07 15:20 ` [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur Mel Gorman
2015-03-07 16:36   ` Ingo Molnar
2015-03-07 17:37     ` Mel Gorman
2015-03-08  9:54       ` Ingo Molnar
2015-03-07 19:12     ` Linus Torvalds
2015-03-08 10:02       ` Ingo Molnar
2015-03-08 18:35         ` Linus Torvalds
2015-03-08 18:46           ` Linus Torvalds
2015-03-09 11:29           ` Dave Chinner
2015-03-09 16:52             ` Linus Torvalds
2015-03-09 19:19               ` Dave Chinner
2015-03-10 23:55                 ` Linus Torvalds
2015-03-12 13:10                   ` Mel Gorman
2015-03-12 16:20                     ` Linus Torvalds
2015-03-12 18:49                       ` Mel Gorman
2015-03-17  7:06                         ` Dave Chinner
2015-03-17 16:53                           ` Linus Torvalds
2015-03-17 20:51                             ` Dave Chinner
2015-03-17 21:30                               ` Linus Torvalds
2015-03-17 22:08                                 ` Dave Chinner
2015-03-18 16:08                                   ` Linus Torvalds
2015-03-18 17:31                                     ` Linus Torvalds
2015-03-18 22:23                                       ` Dave Chinner
2015-03-19 14:10                                       ` Mel Gorman
2015-03-19 18:09                                         ` Linus Torvalds
2015-03-19 21:41                                       ` Linus Torvalds [this message]
2015-03-19 22:41                                         ` Dave Chinner
2015-03-19 23:05                                           ` Linus Torvalds
2015-03-19 23:23                                             ` Dave Chinner
2015-03-20  0:23                                             ` Dave Chinner
2015-03-20  1:29                                               ` Linus Torvalds
2015-03-20  4:13                                                 ` Dave Chinner
2015-03-20 17:02                                                   ` Linus Torvalds
2015-03-23 12:01                                                     ` Mel Gorman
2015-03-20 10:12                                                 ` Mel Gorman
2015-03-20  9:56                                             ` Mel Gorman
2015-03-08 20:40         ` Mel Gorman
2015-03-09 21:02           ` Mel Gorman
2015-03-10 13:08             ` Mel Gorman
2015-03-08  9:41   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+55aFyxA9u2cVzV+S7TSY9ZvRXCX=z22YAbi9mdPVBKmqgR5g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=david@fromorbit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).