linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Thomas Hellström (VMware)" <thomas_os@shipmail.org>
Cc: "Thomas Hellstrom" <thellstrom@vmware.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Will Deacon" <will.deacon@arm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present
Date: Wed, 9 Oct 2019 19:07:24 -0700	[thread overview]
Message-ID: <CAHk-=wimOMuhBVrxYneQ5bawFNeGW1h_D3cxN80pitDKgfBx0A@mail.gmail.com> (raw)
In-Reply-To: <6d3ef513-ca9d-9778-10da-86f368a57cd0@shipmail.org>

On Wed, Oct 9, 2019 at 6:10 PM Thomas Hellström (VMware)
<thomas_os@shipmail.org> wrote:
>
> Your original patch does exactly the same!

Oh, no. You misread my original patch.

Look again.

The logic in my original patch was very different. It said that

 - *if* we have a pmd_entry function, then we obviously call that one.

    And if - after calling the pmd_entry function - we are still a
hugepage, then we skip the pte_entry case entirely.

   And part of skipping is obviously "don't split" - but it never had
that "don't split and then call the pte walker" case.

 - and if we *don't* have a pmd_entry function, but we do have a
pte_entry function, then we always split before calling it.

Notice the difference?

So instead of looking at the return value of the pmd_entry() function,
the approach of that suggested patch was to basically say that if the
pmd_entry function wants us to go another level deeper and it was a
hugepmd, it needed to split the pmd to make that happen.

That's actually very different from what your patch did. My original
patch never tried to walk the pte level without having one - either it
*checked* that it had one, or it split.

But I see where you might have misread the patch, particularly if you
only looked at it as a patch, not as the end result of the patch.

The end result of that patch was this:

                if (ops->pmd_entry) {
                        err = ops->pmd_entry(pmd, addr, next, walk);
                        if (err)
                                break;
                        /* No pte level walking? */
                        if (!ops->pte_entry)
                                continue;
                        /* No pte level at all? */
                        if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)
|| pmd_devmap(*pmd))
                                continue;
                } else {
                        if (!ops->pte_entry)
                                continue;

                        split_huge_pmd(walk->vma, pmd, addr);
                        if (pmd_trans_unstable(pmd))
                                goto again;
                }
                err = walk_pte_range(pmd, addr, next, walk);

and look at thew two different sides of the if-statement: if they get
to "walk_pte_range()", both cases wil have verified that there
actually _is_ a pte level. They will just have done it differently. -
the "we didn't have a pmd function" will have split the pmd if it was
a hugepmd, while the "we do have a pmd_entry" case will just check
whether it's still a huge-pmd, and done a "continue" if it was and
never even tried to walk the ptes.

But I think the "change pmd_entry to have a sane return code" is a
simpler and more flexible model, and then the pmd_entry code can just
let the pte walker split the pmd if needed.

So I liked that part of your patch.

           Linus


  reply	other threads:[~2019-10-10  2:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  9:14 [PATCH v4 0/9] Emulated coherent graphics memory take 2 Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 1/9] mm: Remove BUG_ON mmap_sem not held from xxx_trans_huge_lock() Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 2/9] mm: pagewalk: Take the pagetable lock in walk_pte_range() Thomas Hellström (VMware)
2019-10-09 15:14   ` Kirill A. Shutemov
2019-10-09 16:07     ` Linus Torvalds
2019-10-08  9:15 ` [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present Thomas Hellström (VMware)
2019-10-09 15:27   ` Kirill A. Shutemov
2019-10-09 15:27     ` Kirill A. Shutemov
2019-10-09 16:20     ` Thomas Hellström (VMware)
2019-10-09 16:20       ` Thomas Hellström (VMware)        
2019-10-09 16:21     ` Linus Torvalds
2019-10-09 17:03       ` Thomas Hellström (VMware)
2019-10-09 17:16         ` Linus Torvalds
2019-10-09 18:52           ` Thomas Hellstrom
2019-10-09 19:20             ` Linus Torvalds
2019-10-09 20:06               ` Thomas Hellström (VMware)
2019-10-09 20:20                 ` Linus Torvalds
2019-10-09 22:30                   ` Thomas Hellström (VMware)
2019-10-09 23:50                     ` Thomas Hellström (VMware)
2019-10-09 23:51                     ` Linus Torvalds
2019-10-10  0:18                       ` Linus Torvalds
2019-10-10  1:09                       ` Thomas Hellström (VMware)
2019-10-10  2:07                         ` Linus Torvalds [this message]
2019-10-10  6:15                           ` Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 4/9] mm: Add a walk_page_mapping() function to the pagewalk code Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 5/9] mm: Add write-protect and clean utilities for address space ranges Thomas Hellström (VMware)
2019-10-08 17:06   ` Linus Torvalds
2019-10-08 18:25     ` Thomas Hellstrom
2019-10-08  9:15 ` [PATCH v4 6/9] drm/vmwgfx: Implement an infrastructure for write-coherent resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 7/9] drm/vmwgfx: Use an RBtree instead of linked list for MOB resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 8/9] drm/vmwgfx: Implement an infrastructure for read-coherent resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 9/9] drm/vmwgfx: Add surface dirty-tracking callbacks Thomas Hellström (VMware)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wimOMuhBVrxYneQ5bawFNeGW1h_D3cxN80pitDKgfBx0A@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=thellstrom@vmware.com \
    --cc=thomas_os@shipmail.org \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).