From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org, shu wang <malate_wangshu@hotmail.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Heiko Carstens <hca@linux.ibm.com>,
Alexey Dobriyan <adobriyan@gmail.com>,
Matthew Wilcox <willy@infradead.org>,
Michel Lespinasse <walken@google.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 5/5] mm proc/task_mmu.c: add hugetlb specific routine for clear_refs
Date: Thu, 18 Feb 2021 16:14:30 -0800
Message-ID: <c356f931-8400-b3c8-73b0-2537815d7a6b@oracle.com>
In-Reply-To: <20210217202518.GA19238@xz-x1>
On 2/17/21 12:25 PM, Peter Xu wrote:
> On Wed, Feb 10, 2021 at 04:03:22PM -0800, Mike Kravetz wrote:
>> There is no hugetlb specific routine for clearing soft dirty and
>> other references. The 'default' routines would only clear the
>> VM_SOFTDIRTY flag in the vma.
>>
>> Add new routine specifically for hugetlb vmas.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> fs/proc/task_mmu.c | 110 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 110 insertions(+)
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 829b35016aaa..f06cf9b131a8 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -1116,6 +1116,115 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
>> }
>> #endif
>>
>> +#ifdef CONFIG_HUGETLB_PAGE
>> +static inline bool huge_pte_is_pinned(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t pte)
>> +{
>> + struct page *page;
>> +
>> + if (likely(!atomic_read(&vma->vm_mm->has_pinned)))
>> + return false;
>> + page = pte_page(pte);
>> + if (!page)
>> + return false;
>> + return page_maybe_dma_pinned(page);
>> +}
>> +
>> +static int clear_refs_hugetlb_range(pte_t *ptep, unsigned long hmask,
>> + unsigned long addr, unsigned long end,
>> + struct mm_walk *walk)
>> +{
>> + struct clear_refs_private *cp = walk->private;
>> + struct vm_area_struct *vma = walk->vma;
>> + struct hstate *h = hstate_vma(walk->vma);
>> + unsigned long adj_start = addr, adj_end = end;
>> + spinlock_t *ptl;
>> + pte_t old_pte, pte;
>> +
>> + /*
>> + * clear_refs should only operate on complete vmas. Therefore,
>> + * values passed here should be huge page aligned and huge page
>> + * size in length. Quick validation before taking any action in
>> + * case upstream code is changed.
>> + */
>> + if ((addr & hmask) != addr || end - addr != huge_page_size(h)) {
>> + WARN_ONCE(1, "%s passed unaligned address\n", __func__);
>> + return 1;
>> + }
>
> I wouldn't worry too much about the interface change - whoever changes the
> interface should guarantee all existing hooks still work, shouldn't they? :)
Yeah, I can drop this.
> It's slightly confusing to me why "clear_refs should only operate on
> complete vmas" is related to the check, though.
Mostly me thinking that since it is operating on a complete (hugetlb) vma,
we know the vma is huge page aligned and a multiple of the huge page size
in length. So, all addresses passed in should be huge page aligned as well.
For example, with 2MB huge pages hmask is ~(2MB - 1), so the
(addr & hmask) != addr check flags any address that is not 2MB aligned.
>
>> +
>> + ptl = huge_pte_lock(hstate_vma(vma), walk->mm, ptep);
>> +
>> + /* Soft dirty and pmd sharing do not mix */
>
> Right, this seems to be a placeholder for unsharing code.
Sorry, that comment was left over from earlier code. Unsharing is actually
done below; I forgot to remove the comment.
> Though maybe we can do that earlier, in the pre_vma() hook? That would be
> per-vma rather than handling one specific huge page here, hence more efficient imho.
Yes, let me look into that. The code below is certainly not the most
efficient.
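
Untested sketch of what I have in mind for a pre_vma() hook, assuming
hugetlb_unshare_all_pmds() is made callable from here as you suggest below:

static int clear_refs_pre_vma(unsigned long start, unsigned long end,
			      struct mm_walk *walk)
{
	struct clear_refs_private *cp = walk->private;
	struct vm_area_struct *vma = walk->vma;

	/* Unshare once per vma instead of once per huge page */
	if (is_vm_hugetlb_page(vma) && cp->type == CLEAR_REFS_SOFT_DIRTY)
		hugetlb_unshare_all_pmds(vma);

	return 0;
}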
> This reminds me that I should move hugetlb_unshare_all_pmds() from my other
> patch into hugetlb.c, so that this code can call it. Currently it's a
> static function in userfaultfd.c.
>
>> +
>> + pte = huge_ptep_get(ptep);
>> + if (!pte_present(pte))
>> + goto out;
>> + if (unlikely(is_hugetlb_entry_hwpoisoned(pte)))
>> + goto out;
>> +
>> + if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
>
> Maybe move this check into clear_refs_test_walk()? We can bail out earlier if:
>
> (is_vm_hugetlb_page(vma) && (type != CLEAR_REFS_SOFT_DIRTY))
>
Yes, we can do that. I was patterning this after the other 'clear_refs'
routines, but they can clear things besides soft dirty. Since soft dirty
is the only thing handled for hugetlb, we can bail out earlier.
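
Untested, but something like this added to clear_refs_test_walk() should
do it:

	/* Soft dirty is the only op implemented for hugetlb vmas */
	if (is_vm_hugetlb_page(vma) && cp->type != CLEAR_REFS_SOFT_DIRTY)
		return 1;

Returning 1 makes the page table walk skip the vma entirely.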
>> + if (huge_pte_is_pinned(vma, addr, pte))
>> + goto out;
>
> Out of topic of this patchset, but it's definitely a pity that we can't track
> soft dirty for pinned pages. Currently the assumption of the pte code path is:
> "if this page can be DMA written then we won't know whether data changed after
> all, so tracking dirty is meaningless". However, that's prone to change as new
> hardware arrives - say, IOMMUs could start to trap DMA writes.
>
> But again, that's another story. We should just follow what we do with
> non-hugetlbfs here, until some day we'd like to revive soft dirty
> tracking with pinned pages.
>
>> +
>> + /*
>> + * soft dirty and pmd sharing do not work together as
>> + * per-process state is tracked in ptes, and pmd sharing allows
>> + * processes to share ptes. We unshare any pmds here.
>> + */
>> + adjust_range_if_pmd_sharing_possible(vma, &adj_start, &adj_end);
>
> Ideally, when we reach here huge pmd sharing won't exist anymore, right? Then
> do we still need to adjust the range at all?
Right, we should be able to do the unsharing earlier and drop the
adjustment here.
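
With the unsharing moved to a pre_vma() hook, the soft dirty clearing
here should reduce to something like this (untested, and assuming the
huge_pte_clear_soft_dirty() helper from patch 1 of this series):

	if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
		if (huge_pte_is_pinned(vma, addr, pte))
			goto out;

		old_pte = huge_ptep_modify_prot_start(vma, addr, ptep);
		pte = huge_pte_wrprotect(huge_pte_clear_soft_dirty(old_pte));
		huge_ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
	}

No range adjustment, and adj_start/adj_end go away as well.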
Thanks again for taking a look at this.
--
Mike Kravetz
>
> Thanks,
>
Thread overview: 17+ messages
2021-02-11 0:03 [RFC PATCH 0/5] Add hugetlb soft dirty support Mike Kravetz
2021-02-11 0:03 ` [RFC PATCH 1/5] hugetlb: add hugetlb helpers for " Mike Kravetz
2021-02-17 16:24 ` Peter Xu
2021-02-18 22:58 ` Mike Kravetz
2021-02-24 16:46 ` Gerald Schaefer
2021-02-24 16:55 ` Gerald Schaefer
2021-02-11 0:03 ` [RFC PATCH 2/5] hugetlb: enhance hugetlb fault processing to support soft dirty Mike Kravetz
2021-02-17 19:32 ` Peter Xu
2021-02-18 23:26 ` Mike Kravetz
2021-02-11 0:03 ` [RFC PATCH 3/5] mm proc/task_mmu.c: add soft dirty pte checks for hugetlb Mike Kravetz
2021-02-17 19:35 ` Peter Xu
2021-02-18 23:59 ` Mike Kravetz
2021-02-11 0:03 ` [RFC PATCH 4/5] hugetlb: don't permit pmd sharing if soft dirty in use Mike Kravetz
2021-02-17 19:44 ` Peter Xu
2021-02-11 0:03 ` [RFC PATCH 5/5] mm proc/task_mmu.c: add hugetlb specific routine for clear_refs Mike Kravetz
2021-02-17 20:25 ` Peter Xu
2021-02-19 0:14 ` Mike Kravetz [this message]