From: Miaohe Lin <linmiaohe@huawei.com>
To: Muchun Song <muchun.song@linux.dev>,
	"Yin, Fengwei" <fengwei.yin@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Linux MM <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at()
Date: Thu, 18 Aug 2022 20:58:54 +0800
Message-ID: <615c8ec8-6977-2ce0-f049-d2ec1619245c@huawei.com>
In-Reply-To: <15DD6DCA-39BC-4EA2-984F-D488E94CC4FF@linux.dev>

On 2022/8/18 17:18, Muchun Song wrote:
> 
> 
>> On Aug 18, 2022, at 16:54, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 8/18/2022 4:40 PM, Muchun Song wrote:
>>>
>>>
>>>> On Aug 18, 2022, at 16:32, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 8/18/2022 3:59 PM, Muchun Song wrote:
>>>>>
>>>>>
>>>>>> On Aug 18, 2022, at 15:52, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>>
>>>>>> On 2022/8/18 10:47, Muchun Song wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>>>>>>>>> 	/*
>>>>>>>>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>>>>>>>>> 	 * preceding stores to the page contents become visible before
>>>>>>>>>>>> 	 * the set_pte_at() write.
>>>>>>>>>>>> 	 */
>>>>>>>>>>>> 	__SetPageUptodate(page);
>>>>>>>>>>> IIUC, in this case we should make sure other CPUs can see the new page’s
>>>>>>>>>>> contents once they have seen that PG_uptodate is set. I think commit 0ed361dec369
>>>>>>>>>>> can tell us more details.
>>>>>>>>>>>
>>>>>>>>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>>>>>>>>> set_pte_at(), but I didn’t find any explanation. I guess we want to
>>>>>>>>>>> guarantee the ordering between stores to the page’s contents and subsequent memory
>>>>>>>>>>> accesses through the corresponding virtual address. Do you agree?
>>>>>>>>>> This is my understanding also. Thanks.
>>>>>>>>> That's also my understanding. Thanks both.
>>>>>>>> One thing is unclear to me (not directly related to this patch): who is responsible
>>>>>>>> for the read barrier on the read side in this case?
>>>>>>>>
>>>>>>>> For SetPageUptodate, there are paired write/read memory barriers.
>>>>>>>>
>>>>>>>
>>>>>>> I have the same question. So I think the example proposed by Miaohe is a little
>>>>>>> different from the case (hugetlb_vmemmap) here.
>>>>>>
>>>>>> Per my understanding, the memory barrier in PageUptodate() is needed because a user might access the
>>>>>> page contents using page_address() (the corresponding page table entry already exists) soon. But for
>>>>>> the case proposed above, if a user wants to access the page contents, the corresponding page table entry
>>>>>> must be visible first, or the page contents can't be accessed. So there should be a data dependency
>>>>>> acting as a memory barrier between loading the page table entry and accessing the page contents.
>>>>>> Or am I missing something?
>>>>>
>>>>> Yep, it is a data dependency. The difference between hugetlb_vmemmap and PageUptodate() is that
>>>>> the page table entry (a pointer to the mapped page frame) is loaded by the MMU, while the PageUptodate()
>>>>> flag is loaded by the CPU. It seems the data dependency would have to hold between the MMU access and
>>>>> the CPU access. Maybe that is guaranteed by hardware?
>>>> I just found the comment in pmd_install() explained why most arch has no read
>>>
>>> I think pmd_install() is a little different as well. There we should make sure that
>>> page table walkers (like GUP) see the correct PTE after they see the pmd
>>> entry.
>>
>> The difference I can see is that the pmd/pte case has both a hardware page walker and a
>> software page walker (like GUP) on the read side, while the case here has only the hardware
>> page walker on the read side. But I suppose the memory barrier requirement still applies
>> here.
> 
> I am not against this change. I just want to get a better understanding of the
> hardware behavior.
> 
>>
>> Maybe we could do a test: add a large delay between reset_struct_page() and set_pte_at()?
> 
> Hi Miaohe,
> 
> Would you mind doing this test? One thread does vmemmap_restore_pte(), while another thread
> detects whether it can see a tail page with PG_head after the first thread has executed
> set_pte_at().

Would it be easier to construct the memory reordering manually, like below?

vmemmap_restore_pte()
	...
	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
	/* possibly add a delay here */
	copy_page(to, (void *)walk->reuse_addr);
	reset_struct_pages(to);

Then another thread detects whether it can see a tail page with some invalid fields? If so,
it seems the problem will always trigger? If not, we depend on the observed memory reordering
and set_pte_at() doesn't contain a memory barrier?
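
For what it's worth, the same idea can be tried in userspace first. Below is
only a rough sketch, assuming C11 atomics and pthreads as stand-ins: fake_page,
fake_pte, FAKE_READY and run_trials() are made-up names, not kernel code, and
the release store is just an analogy for smp_wmb() + set_pte_at(). The reader
here is an ordinary CPU load, so it tells us nothing about how a hardware page
table walker behaves.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_READY 0x5aUL	/* made-up marker for "contents initialised" */

/* Hypothetical stand-in for the struct page being restored. */
struct fake_page {
	_Atomic unsigned long flags;
};

struct trial {
	bool publish_first;	/* true: mimic the manually reordered sequence */
	bool saw_stale;		/* written by the reader, read after join */
};

static struct fake_page slot;
static _Atomic(struct fake_page *) fake_pte;	/* stand-in for the PTE */

static void *writer(void *arg)
{
	struct trial *t = arg;

	if (t->publish_first) {
		/* "set_pte_at()" first, contents afterwards. */
		atomic_store_explicit(&fake_pte, &slot, memory_order_relaxed);
		atomic_store_explicit(&slot.flags, FAKE_READY, memory_order_relaxed);
	} else {
		atomic_store_explicit(&slot.flags, FAKE_READY, memory_order_relaxed);
		/* Release store: rough analogue of smp_wmb() + set_pte_at(). */
		atomic_store_explicit(&fake_pte, &slot, memory_order_release);
	}
	return NULL;
}

static void *reader(void *arg)
{
	struct trial *t = arg;
	struct fake_page *p;

	/* Spin until the "mapping" is visible, then look at the contents. */
	while (!(p = atomic_load_explicit(&fake_pte, memory_order_acquire)))
		;
	t->saw_stale = atomic_load_explicit(&p->flags, memory_order_relaxed) != FAKE_READY;
	return NULL;
}

/* Returns how many of n trials saw stale contents, or -1 on thread errors. */
int run_trials(bool publish_first, int n)
{
	int stale = 0;

	for (int i = 0; i < n; i++) {
		struct trial t = { .publish_first = publish_first, .saw_stale = false };
		pthread_t w, r;

		/* Reset to the "unmapped, uninitialised" state. */
		atomic_store(&slot.flags, 0);
		atomic_store(&fake_pte, NULL);

		if (pthread_create(&r, NULL, reader, &t))
			return -1;
		if (pthread_create(&w, NULL, writer, &t)) {
			/* Unblock the spinning reader before bailing out. */
			atomic_store(&fake_pte, &slot);
			pthread_join(r, NULL);
			return -1;
		}
		pthread_join(w, NULL);
		pthread_join(r, NULL);
		if (t.saw_stale)
			stale++;
	}
	return stale;
}
```

With publish_first == false, run_trials() should always return 0. With
publish_first == true it may return a non-zero count, but how often the stale
window is actually hit depends on scheduling, so a zero result there proves
nothing.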

Thanks,
Miaohe Lin

