From: David Hildenbrand <david@redhat.com>
To: Qi Zheng <zhengqi.arch@bytedance.com>,
	akpm@linux-foundation.org, tglx@linutronix.de,
	hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
	kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, songmuchun@bytedance.com
Subject: Re: [PATCH v1 2/2] mm: remove redundant smp_wmb()
Date: Tue, 31 Aug 2021 12:02:18 +0200
Message-ID: <9da807d4-1fcc-72e0-dc9e-91ab9fbeb7c6@redhat.com>
In-Reply-To: <20210828042306.42886-3-zhengqi.arch@bytedance.com>

On 28.08.21 06:23, Qi Zheng wrote:
> The smp_wmb() in __pte_alloc() is used to ensure that all
> pte setup is visible before the pte is made visible to
> other CPUs by being put into page tables. We only need
> this when the pte is actually populated, so move it to
> pmd_install(). __pte_alloc_kernel(), __p4d_alloc(),
> __pud_alloc() and __pmd_alloc() are similar cases.
> 
> We can also defer the smp_wmb() to the place where the pmd
> entry is actually populated with the preallocated pte. There
> are two kinds of users of preallocated ptes: one is filemap
> & finish_fault(), the other is THP. The former does not need
> another smp_wmb() because the smp_wmb() has already been done
> by pmd_install(). Fortunately, the latter does not need
> another smp_wmb() either, because there is already an
> smp_wmb() before populating the new pte when THP uses a
> preallocated pte to split a huge pmd.
> 
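Side note for anyone following along: the ordering being enforced
here is the classic publish/consume pattern. Roughly (a simplified
sketch, not the exact code in this patch):

	/* Publisher: allocate and initialize, then make reachable. */
	new = pte_alloc_one(mm);	/* zeroed pte page, lock initialized */
	smp_wmb();			/* order the init before the publish */
	pmd_populate(mm, pmd, new);	/* walkers can now reach the page */

Everything that initialized the pte page has to be visible before
the pmd entry pointing at it is, otherwise a concurrent walker can
reach uninitialized contents.
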
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> ---
>   mm/memory.c         | 47 ++++++++++++++++++++---------------------------
>   mm/sparse-vmemmap.c |  2 +-
>   2 files changed, 21 insertions(+), 28 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index ef7b1762e996..9c7534187454 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -439,6 +439,20 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
>   
>   	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
>   		mm_inc_nr_ptes(mm);
> +		/*
> +		 * Ensure all pte setup (eg. pte page lock and page clearing) are
> +		 * visible before the pte is made visible to other CPUs by being
> +		 * put into page tables.
> +		 *
> +		 * The other side of the story is the pointer chasing in the page
> +		 * table walking code (when walking the page table without locking;
> +		 * ie. most of the time). Fortunately, these data accesses consist
> +		 * of a chain of data-dependent loads, meaning most CPUs (alpha
> +		 * being the notable exception) will already guarantee loads are
> +		 * seen in-order. See the alpha page table accessors for the
> +		 * smp_rmb() barriers in page table walking code.
> +		 */
> +		smp_wmb(); /* Could be smp_mb__xxx(before|after)_spin_lock */
>   		pmd_populate(mm, pmd, *pte);
>   		*pte = NULL;
>   	}
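
The comment also captures the reader side well. For reference, a
lockless walker essentially does (simplified):

	pmd_t pmdval = READ_ONCE(*pmd);		/* load the pmd entry ... */

	if (pmd_none(pmdval))
		return NULL;
	pte = pte_offset_map(&pmdval, addr);	/* ... then dereference the
						 * pte page it points at: a
						 * data-dependent load,
						 * ordered for free everywhere
						 * except on alpha */

so no explicit smp_rmb() is needed on the read side; on alpha it is
buried in the page table accessors.
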
> @@ -451,21 +465,6 @@ int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
>   	if (!new)
>   		return -ENOMEM;
>   
> -	/*
> -	 * Ensure all pte setup (eg. pte page lock and page clearing) are
> -	 * visible before the pte is made visible to other CPUs by being
> -	 * put into page tables.
> -	 *
> -	 * The other side of the story is the pointer chasing in the page
> -	 * table walking code (when walking the page table without locking;
> -	 * ie. most of the time). Fortunately, these data accesses consist
> -	 * of a chain of data-dependent loads, meaning most CPUs (alpha
> -	 * being the notable exception) will already guarantee loads are
> -	 * seen in-order. See the alpha page table accessors for the
> -	 * smp_rmb() barriers in page table walking code.
> -	 */
> -	smp_wmb(); /* Could be smp_mb__xxx(before|after)_spin_lock */
> -
>   	pmd_install(mm, pmd, &new);
>   	if (new)
>   		pte_free(mm, new);
> @@ -478,10 +477,9 @@ int __pte_alloc_kernel(pmd_t *pmd)
>   	if (!new)
>   		return -ENOMEM;
>   
> -	smp_wmb(); /* See comment in __pte_alloc */
> -
>   	spin_lock(&init_mm.page_table_lock);
>   	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
> +		smp_wmb(); /* See comment in pmd_install() */
>   		pmd_populate_kernel(&init_mm, pmd, new);
>   		new = NULL;
>   	}
> @@ -3857,7 +3855,6 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>   		vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
>   		if (!vmf->prealloc_pte)
>   			return VM_FAULT_OOM;
> -		smp_wmb(); /* See comment in __pte_alloc() */
>   	}
>   
>   	ret = vma->vm_ops->fault(vmf);
> @@ -3919,7 +3916,6 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>   		vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
>   		if (!vmf->prealloc_pte)
>   			return VM_FAULT_OOM;
> -		smp_wmb(); /* See comment in __pte_alloc() */
>   	}
>   
>   	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> @@ -4144,7 +4140,6 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
>   		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
>   		if (!vmf->prealloc_pte)
>   			return VM_FAULT_OOM;
> -		smp_wmb(); /* See comment in __pte_alloc() */
>   	}
>   
>   	return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
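
For the three hunks above: the preallocated pte is either published
via pmd_install() (the filemap/finish_fault() path), where the
barrier now lives, or consumed by the THP code, which as the commit
message notes has its own smp_wmb() before the populate. Roughly,
for the fault path (a sketch on top of patch 1/2):

	__do_fault()/do_fault_around()
	  vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);  /* no barrier */
	...
	finish_fault()/filemap_map_pages()
	  pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);
	    smp_wmb();			/* the barrier, at publish time */
	    pmd_populate(mm, pmd, *pte);
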
> @@ -4819,13 +4814,13 @@ int __p4d_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
>   	if (!new)
>   		return -ENOMEM;
>   
> -	smp_wmb(); /* See comment in __pte_alloc */
> -
>   	spin_lock(&mm->page_table_lock);
>   	if (pgd_present(*pgd))		/* Another has populated it */
>   		p4d_free(mm, new);
> -	else
> +	else {
> +		smp_wmb(); /* See comment in pmd_install() */
>   		pgd_populate(mm, pgd, new);
> +	}

Nit:

if () {

} else {

}

see Documentation/process/coding-style.rst

"This does not apply if only one branch of a conditional statement is a 
single statement; in the latter case use braces in both branches:"
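
i.e. (untested, just to make it concrete):

	spin_lock(&mm->page_table_lock);
	if (pgd_present(*pgd)) {	/* Another has populated it */
		p4d_free(mm, new);
	} else {
		smp_wmb(); /* See comment in pmd_install() */
		pgd_populate(mm, pgd, new);
	}
	spin_unlock(&mm->page_table_lock);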


Apart from that, I think this is fine,

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb

