linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: akpm@linux-foundation.org, songmuchun@bytedance.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE size hugetlb page
Date: Thu, 25 Aug 2022 09:25:31 +0200	[thread overview]
Message-ID: <887ca2e2-a7c5-93a7-46cb-185daccd4444@redhat.com> (raw)
In-Reply-To: <Ywa1jp/6naTmUh42@monkey>

> Is the primary concern the locking?  If so, I am not sure we have an issue.
> As mentioned in your commit message, current code will use
> pte_offset_map_lock().  pte_offset_map_lock uses pte_lockptr, and pte_lockptr
> will either be the mm wide lock or pmd_page lock.  To me, it seems that
> either would provide correct synchronization for CONT-PTE entries.  Am I
> missing something or misreading the code?
> 
> I started looking at code cleanup suggested by David.  Here is a quick
> patch (not tested and likely containing errors) to see if this is a step
> in the right direction.
> 
> I like it because we get rid of/combine all those follow_huge_p*d
> routines.
> 

Yes, see comments below.

> From 35d117a707c1567ddf350554298697d40eace0d7 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Wed, 24 Aug 2022 15:59:15 -0700
> Subject: [PATCH] hugetlb: call hugetlb_follow_page_mask for hugetlb pages in
>  follow_page_mask
> 
> At the beginning of follow_page_mask, there currently is a call to
> follow_huge_addr which 'may' handle hugetlb pages.  ia64 is the only
> architecture which (incorrectly) provides a follow_huge_addr routine
> that does not return error.  Instead, at each level of the page table a
> check is made for a hugetlb entry.  If a hugetlb entry is found, a call
> to a routine associated with that page table level such as
> follow_huge_pmd is made.
> 
> All the follow_huge_p*d routines are basically the same.  In addition
> huge page size can be derived from the vma, so we know where in the page
> table a huge page would reside.  So, replace follow_huge_addr with a
> new architecture independent routine which will provide the same
> functionality as the follow_huge_p*d routines.  We can then eliminate
> the p*d_huge checks in follow_page_mask page table walking as well as
> the follow_huge_p*d routines themselves.>
> follow_page_mask still has is_hugepd hugetlb checks during page table
> walking.  This is due to these checks and follow_huge_pd being
> architecture specific.  These can be eliminated if
> hugetlb_follow_page_mask can be overwritten by architectures (powerpc)
> that need to do follow_huge_pd processing.

But won't the

> +	/* hugetlb is special */
> +	if (is_vm_hugetlb_page(vma))
> +		return hugetlb_follow_page_mask(vma, address, flags);

code route everything via hugetlb_follow_page_mask() and all these
(beloved) hugepd checks would essentially be unreachable?

At least my understanding is that hugepd only applies to hugetlb.

Can't we move the hugepd handling code into hugetlb_follow_page_mask()
as well?

I mean, doesn't follow_hugetlb_page() also have to handle that hugepd
stuff already ... ?

[...]

>  
> +struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> +				unsigned long address, unsigned int flags)
> +{
> +	struct hstate *h = hstate_vma(vma);
> +	struct mm_struct *mm = vma->vm_mm;
> +	unsigned long haddr = address & huge_page_mask(h);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +	pte_t *pte, entry;
> +
> +	/*
> +	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> +	 * follow_hugetlb_page().
> +	 */
> +	if (WARN_ON_ONCE(flags & FOLL_PIN))
> +		return NULL;
> +
> +	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
> +	if (!pte)
> +		return NULL;
> +
> +retry:
> +	ptl = huge_pte_lock(h, mm, pte);
> +	entry = huge_ptep_get(pte);
> +	if (pte_present(entry)) {
> +		page = pte_page(entry);
> +		/*
> +		 * try_grab_page() should always succeed here, because we hold
> +		 * the ptl lock and have verified pte_present().
> +		 */
> +		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
> +			page = NULL;
> +			goto out;
> +		}
> +	} else {
> +		if (is_hugetlb_entry_migration(entry)) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait_huge(pte, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;


This is neat and clean enough to not reuse follow_hugetlb_page(). I
wonder if we want to add some comment to the function how this differs
to follow_hugetlb_page().

... or do we maybe want to rename follow_hugetlb_page() to someting like
__hugetlb_get_user_pages() to make it clearer in which context it will
get called?


I guess it might be feasible in the future to eliminate
follow_hugetlb_page() and centralizing the faulting code. For now, this
certainly improves the situation.

-- 
Thanks,

David / dhildenb



  parent reply	other threads:[~2022-08-25  7:25 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-23  7:50 [PATCH v2 0/5] Fix some issues when looking up hugetlb page Baolin Wang
2022-08-23  7:50 ` [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE size " Baolin Wang
2022-08-23  8:29   ` David Hildenbrand
2022-08-23 10:02     ` Baolin Wang
2022-08-23 10:23       ` David Hildenbrand
2022-08-23 23:55         ` Mike Kravetz
2022-08-24  2:06           ` Baolin Wang
2022-08-24  7:31             ` David Hildenbrand
2022-08-24  9:41               ` Baolin Wang
2022-08-24 11:55                 ` David Hildenbrand
2022-08-24 14:30                   ` Baolin Wang
2022-08-24 14:33                     ` David Hildenbrand
2022-08-24 15:06                       ` Baolin Wang
2022-08-24 15:13                         ` David Hildenbrand
2022-08-24 15:23                           ` Baolin Wang
2022-08-24 23:34                 ` Mike Kravetz
2022-08-25  1:43                   ` Baolin Wang
2022-08-25  7:10                     ` David Hildenbrand
2022-08-25  7:58                       ` Baolin Wang
2022-08-25 18:30                     ` Mike Kravetz
2022-08-25  7:25                   ` David Hildenbrand [this message]
2022-08-25 10:54                     ` Baolin Wang
2022-08-25 21:13                     ` Mike Kravetz
2022-08-26 22:40                       ` Mike Kravetz
2022-08-27 13:59                       ` Aneesh Kumar K.V
2022-08-29 18:30                         ` Mike Kravetz
2022-08-23  7:50 ` [PATCH v2 2/5] mm/hugetlb: use PTE page lock to protect CONT-PTE entries Baolin Wang
2022-08-23  7:50 ` [PATCH v2 3/5] mm/hugetlb: fix races when looking up a CONT-PMD size hugetlb page Baolin Wang
2022-08-23  7:50 ` [PATCH v2 4/5] mm/hugetlb: use PMD page lock to protect CONT-PTE entries Baolin Wang
2022-08-23  8:14   ` David Hildenbrand
2022-08-23 10:12     ` Baolin Wang
2022-08-23  7:50 ` [PATCH v2 5/5] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry Baolin Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=887ca2e2-a7c5-93a7-46cb-185daccd4444@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).