linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Thomas Hellström (VMware)" <thomas_os@shipmail.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Thomas Hellstrom" <thellstrom@vmware.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Will Deacon" <will.deacon@arm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present
Date: Thu, 10 Oct 2019 00:30:49 +0200	[thread overview]
Message-ID: <80f25292-585c-7729-2a23-7c46b3309a1a@shipmail.org> (raw)
In-Reply-To: <CAHk-=wgmr-BPMTnSuKrAMoHL_A0COV_sZkdcNB9aosYfouA_fw@mail.gmail.com>

On 10/9/19 10:20 PM, Linus Torvalds wrote:
> On Wed, Oct 9, 2019 at 1:06 PM Thomas Hellström (VMware)
> <thomas_os@shipmail.org> wrote:
>> On 10/9/19 9:20 PM, Linus Torvalds wrote:
>>> Don't you get it? There *is* no PTE level if you didn't split.
>> Hmm, This paragraph makes me think we have very different perceptions about what I'm trying to achieve.
> It's not about what you're trying to achieve.
>
> It's about the actual code.
>
> You cannot do that
>
>> -               split_huge_pmd(walk->vma, pmd, addr);
>> +               if (!ops->pmd_entry)
>> +                       split_huge_pmd(walk->vma, pmd, addr);
> it's insane.
>
> You *have* to call split_huge_pmd() if you're doing to call the
> pte_entry() function.
>
> I don't understand why you are arguing. This is not about "feelings"
> and "intentions" or about "trying to achieve".
>
> This is about cold hard "you can't do that", and this is now the third
> time I tell you _why_ you can't do that: you can't walk the last level
> if you don't _have_ a last level. You have to split the pmd to do so.
It's not so much arguing but rather trying to understand your concerns 
and your perception of what the final code should look like.
>
> End of story.

So is it that you want pte_entry() to be strictly called for *each* 
virtual address, even if we have a pmd_entry()?
In that case I completely follow your arguments, meaning we skip this 
patch completely?

My take on the change was that pmd_entry() returning 0 would mean we 
could actually skip the pte level completely and nothing would otherwise 
pass down to the next level for which split_huge_pmd() wasn't a NOP, 
similar to how pud_entry does things. FWIW, see

https://lore.kernel.org/lkml/20191004123732.xpr3vroee5mhg2zt@box.shutemov.name/

and we could in the long run transform the pte walk in many pmd_entry 
callbacks into pte_entry callbacks.


>
>> I wanted the pte level to *only* get called for *pre-existing* pte entries.
> Again, I told you what the solution was.
>
> But the fact is, it's not what your other code even wants or does.
>
> Seriously. You have two cases you care about in your callbacks
>
>   - an actual hugepmd. This is an ERROR for you and you do a huge
> WARN_ON() for it to let people know.
No, it's typically a NOP, since the hugepmd should be read-only. 
Write-enabled huge pages are split in fault().
>
>   - regular pages. This is what your other code actually handles.
>
> So for the hugepomd case, you have two choices:
>
>   - handle it by splitting and deal with the regular pages: "return 0;"
Well, this is not what we want to do, and the reason we have the patch 
in the first place.
>
>   - actually error out: "return -EINVAL".

/Thomas




  reply	other threads:[~2019-10-09 22:31 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  9:14 [PATCH v4 0/9] Emulated coherent graphics memory take 2 Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 1/9] mm: Remove BUG_ON mmap_sem not held from xxx_trans_huge_lock() Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 2/9] mm: pagewalk: Take the pagetable lock in walk_pte_range() Thomas Hellström (VMware)
2019-10-09 15:14   ` Kirill A. Shutemov
2019-10-09 16:07     ` Linus Torvalds
2019-10-08  9:15 ` [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present Thomas Hellström (VMware)
2019-10-09 15:27   ` Kirill A. Shutemov
2019-10-09 15:27     ` Kirill A. Shutemov
2019-10-09 16:20     ` Thomas Hellström (VMware)
2019-10-09 16:20       ` Thomas Hellström (VMware)        
2019-10-09 16:21     ` Linus Torvalds
2019-10-09 17:03       ` Thomas Hellström (VMware)
2019-10-09 17:16         ` Linus Torvalds
2019-10-09 18:52           ` Thomas Hellstrom
2019-10-09 19:20             ` Linus Torvalds
2019-10-09 20:06               ` Thomas Hellström (VMware)
2019-10-09 20:20                 ` Linus Torvalds
2019-10-09 22:30                   ` Thomas Hellström (VMware) [this message]
2019-10-09 23:50                     ` Thomas Hellström (VMware)
2019-10-09 23:51                     ` Linus Torvalds
2019-10-10  0:18                       ` Linus Torvalds
2019-10-10  1:09                       ` Thomas Hellström (VMware)
2019-10-10  2:07                         ` Linus Torvalds
2019-10-10  6:15                           ` Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 4/9] mm: Add a walk_page_mapping() function to the pagewalk code Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 5/9] mm: Add write-protect and clean utilities for address space ranges Thomas Hellström (VMware)
2019-10-08 17:06   ` Linus Torvalds
2019-10-08 18:25     ` Thomas Hellstrom
2019-10-08  9:15 ` [PATCH v4 6/9] drm/vmwgfx: Implement an infrastructure for write-coherent resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 7/9] drm/vmwgfx: Use an RBtree instead of linked list for MOB resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 8/9] drm/vmwgfx: Implement an infrastructure for read-coherent resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 9/9] drm/vmwgfx: Add surface dirty-tracking callbacks Thomas Hellström (VMware)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=80f25292-585c-7729-2a23-7c46b3309a1a@shipmail.org \
    --to=thomas_os@shipmail.org \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=thellstrom@vmware.com \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).