linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Thomas Hellström (VMware)" <thomas_os@shipmail.org>
Cc: "Thomas Hellstrom" <thellstrom@vmware.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Will Deacon" <will.deacon@arm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present
Date: Wed, 9 Oct 2019 17:18:04 -0700	[thread overview]
Message-ID: <CAHk-=wgR3zs=zPsOe4eV-yJi=LLo7bTyrB+6o-FiJz1P0nkCWg@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wg6n_nGRtJd4MeXZrA5QrrVViJeO4x2w37KDbcDmTh3dg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1696 bytes --]

On Wed, Oct 9, 2019 at 4:51 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>  (a) right now nobody wants the "skip" behavior. You think you'll
> eventually want it
>
>  (b) right now, the "return positive value" is actually a horribly
> ugly pointless hack, which could be made to be an error value and
> cleaned up in the process
>
>  (c) to me that really argues that we should just make the rule be
>
>      - negative error means break out with error
>
>      - 0 means continue down the next level
>
>      - positive could be trivially be made to just mean "ok, I did it,
> you can just continue".
>
> and I think that would make a lot of sense.

So here's an ENTIRELY untested patch, but the return value of
pmd_entry() now makes conceptual sense to me. The whole "I hit an
error, I did nothing, I already did it myself" to me is the intuitive
meaning of {neg,0,pos} handling.

I think we probably should do this same thing for the upper levels too
to be consistent, but I think this at least makes sense, and is
simple, and avoids any hacky PAGE_WALK_CALLER_MAX magic.

I also wonder if some caller might want to get a count of "how many X
handled", and we'd just sum up all the positive return values as we're
traversing things, but that falls under "nobody seems to want it right
now", so I'm not adding extra code for something that might not be
useful.

And it is possible that I missed some other pmd_entry() callback that
returns a positive value. I did check them all, but mistakes happen
and I might have missed some case...

Again: entirely and utterly untested. It compiles. That's all I'm
going to guarantee, and even that might be a fluke.

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2426 bytes --]

 mm/mempolicy.c | 10 +++++-----
 mm/pagewalk.c  | 11 ++++++++---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967bcf954..f8c99591592b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -482,7 +482,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
  *
  * queue_pages_pte_range() has three possible return values:
  * 0 - pages are placed on the right node or queued successfully.
- * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * -EBUSY - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
  *     specified.
  * -EIO - only MPOL_MF_STRICT was specified and an existing page was already
  *        on a node that does not follow the policy.
@@ -669,7 +669,7 @@ static const struct mm_walk_ops queue_pages_walk_ops = {
  * passed via @private.
  *
  * queue_pages_range() has three possible return values:
- * 1 - there is unmovable page, but MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * -EBUSY - there is unmovable page, but MPOL_MF_MOVE* & MPOL_MF_STRICT were
  *     specified.
  * 0 - queue pages successfully or no misplaced page.
  * -EIO - there is misplaced page and only MPOL_MF_STRICT was specified.
@@ -1285,8 +1285,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 	ret = queue_pages_range(mm, start, end, nmask,
 			  flags | MPOL_MF_INVERT, &pagelist);
 
-	if (ret < 0) {
-		err = -EIO;
+	if (ret < 0 && ret != -EBUSY) {
+		err = ret;
 		goto up_out;
 	}
 
@@ -1303,7 +1303,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 				putback_movable_pages(&pagelist);
 		}
 
-		if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
+		if ((ret < 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
 			err = -EIO;
 	} else
 		putback_movable_pages(&pagelist);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index d48c2a986ea3..eb9d292588a2 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -49,10 +49,15 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 		 * This implies that each ->pmd_entry() handler
 		 * needs to know about pmd_trans_huge() pmds
 		 */
-		if (ops->pmd_entry)
+		if (ops->pmd_entry) {
 			err = ops->pmd_entry(pmd, addr, next, walk);
-		if (err)
-			break;
+			if (err < 0)
+				break;
+			if (err > 0) {
+				err = 0;
+				continue;
+			}
+		}
 
 		/*
 		 * Check this here so we only break down trans_huge

  reply	other threads:[~2019-10-10  0:18 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  9:14 [PATCH v4 0/9] Emulated coherent graphics memory take 2 Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 1/9] mm: Remove BUG_ON mmap_sem not held from xxx_trans_huge_lock() Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 2/9] mm: pagewalk: Take the pagetable lock in walk_pte_range() Thomas Hellström (VMware)
2019-10-09 15:14   ` Kirill A. Shutemov
2019-10-09 16:07     ` Linus Torvalds
2019-10-08  9:15 ` [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present Thomas Hellström (VMware)
2019-10-09 15:27   ` Kirill A. Shutemov
2019-10-09 15:27     ` Kirill A. Shutemov
2019-10-09 16:20     ` Thomas Hellström (VMware)
2019-10-09 16:20       ` Thomas Hellström (VMware)        
2019-10-09 16:21     ` Linus Torvalds
2019-10-09 17:03       ` Thomas Hellström (VMware)
2019-10-09 17:16         ` Linus Torvalds
2019-10-09 18:52           ` Thomas Hellstrom
2019-10-09 19:20             ` Linus Torvalds
2019-10-09 20:06               ` Thomas Hellström (VMware)
2019-10-09 20:20                 ` Linus Torvalds
2019-10-09 22:30                   ` Thomas Hellström (VMware)
2019-10-09 23:50                     ` Thomas Hellström (VMware)
2019-10-09 23:51                     ` Linus Torvalds
2019-10-10  0:18                       ` Linus Torvalds [this message]
2019-10-10  1:09                       ` Thomas Hellström (VMware)
2019-10-10  2:07                         ` Linus Torvalds
2019-10-10  6:15                           ` Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 4/9] mm: Add a walk_page_mapping() function to the pagewalk code Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 5/9] mm: Add write-protect and clean utilities for address space ranges Thomas Hellström (VMware)
2019-10-08 17:06   ` Linus Torvalds
2019-10-08 18:25     ` Thomas Hellstrom
2019-10-08  9:15 ` [PATCH v4 6/9] drm/vmwgfx: Implement an infrastructure for write-coherent resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 7/9] drm/vmwgfx: Use an RBtree instead of linked list for MOB resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 8/9] drm/vmwgfx: Implement an infrastructure for read-coherent resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 9/9] drm/vmwgfx: Add surface dirty-tracking callbacks Thomas Hellström (VMware)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgR3zs=zPsOe4eV-yJi=LLo7bTyrB+6o-FiJz1P0nkCWg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=thellstrom@vmware.com \
    --cc=thomas_os@shipmail.org \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).