linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast()
       [not found] <0-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
@ 2020-10-30 14:46 ` Jason Gunthorpe
  2020-10-30 16:29   ` Jan Kara
                     ` (2 more replies)
  2020-11-02 22:19 ` [PATCH v2 0/2] Add a seqcount between gup_fast and copy_page_range() Ahmed S. Darwish
       [not found] ` <2-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
  2 siblings, 3 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-10-30 14:46 UTC (permalink / raw)
  To: linux-kernel, Peter Xu, Linus Torvalds
  Cc: Andrea Arcangeli, Andrew Morton, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Linux-MM, Michal Hocko, Oleg Nesterov

The next patch in this series makes the lockless flow a little more
complex, so move the entire block into a new function and remove a level
of indention. Tidy a bit of cruft:

 - addr is always the same as start, so use start

 - Use the modern check_add_overflow() for computing end = start + len

 - nr_pinned/pages << PAGE_SHIFT needs the LHS to be unsigned long to
   avoid shift overflow, make the variables unsigned long to avoid coding
   casts in both places. nr_pinned was missing its cast

 - The handling of ret and nr_pinned can be streamlined a bit

No functional change.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 mm/gup.c | 99 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 54 insertions(+), 45 deletions(-)
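
For illustration only (not part of the patch): a userspace sketch of the two
overflow points the list above refers to. On a 64-bit build, an int shifted
left by PAGE_SHIFT is evaluated in 32-bit arithmetic and wraps for large page
counts, and start + len can wrap past the top of the address space, which
check_add_overflow() reports explicitly. The count is made unsigned here only
so the wrap in the demo stays well defined.

	#include <stdio.h>

	#define PAGE_SHIFT 12

	int main(void)
	{
		unsigned int nr_pages = 0x200000;	/* 2M pages == 8 GiB */
		/* 32-bit shift: wraps to 0 */
		unsigned long wrapped = nr_pages << PAGE_SHIFT;
		/* widened first, as the kernel code must do */
		unsigned long widened = (unsigned long)nr_pages << PAGE_SHIFT;
		unsigned long start = -4096UL, len = widened, end;
		/* userspace stand-in for the kernel's check_add_overflow() */
		int overflow = __builtin_add_overflow(start, len, &end);

		printf("wrapped=%#lx widened=%#lx overflow=%d\n",
		       wrapped, widened, overflow);
		return 0;
	}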

diff --git a/mm/gup.c b/mm/gup.c
index 102877ed77a4b4..150cc962c99201 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2671,13 +2671,43 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
 	return ret;
 }
 
-static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
+static unsigned long lockless_pages_from_mm(unsigned long start,
+					    unsigned long end,
+					    unsigned int gup_flags,
+					    struct page **pages)
+{
+	unsigned long flags;
+	int nr_pinned = 0;
+
+	if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) ||
+	    !gup_fast_permitted(start, end))
+		return 0;
+
+	/*
+	 * Disable interrupts. The nested form is used, in order to allow full,
+	 * general purpose use of this routine.
+	 *
+	 * With interrupts disabled, we block page table pages from being freed
+	 * from under us. See struct mmu_table_batch comments in
+	 * include/asm-generic/tlb.h for more details.
+	 *
+	 * We do not adopt an rcu_read_lock() here as we also want to block IPIs
+	 * that come from THPs splitting.
+	 */
+	local_irq_save(flags);
+	gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);
+	local_irq_restore(flags);
+	return nr_pinned;
+}
+
+static int internal_get_user_pages_fast(unsigned long start,
+					unsigned long nr_pages,
 					unsigned int gup_flags,
 					struct page **pages)
 {
-	unsigned long addr, len, end;
-	unsigned long flags;
-	int nr_pinned = 0, ret = 0;
+	unsigned long len, end;
+	unsigned long nr_pinned;
+	int ret;
 
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
 				       FOLL_FORCE | FOLL_PIN | FOLL_GET |
@@ -2691,54 +2721,33 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
 		might_lock_read(&current->mm->mmap_lock);
 
 	start = untagged_addr(start) & PAGE_MASK;
-	addr = start;
-	len = (unsigned long) nr_pages << PAGE_SHIFT;
-	end = start + len;
-
-	if (end <= start)
+	len = nr_pages << PAGE_SHIFT;
+	if (check_add_overflow(start, len, &end))
 		return 0;
 	if (unlikely(!access_ok((void __user *)start, len)))
 		return -EFAULT;
 
-	/*
-	 * Disable interrupts. The nested form is used, in order to allow
-	 * full, general purpose use of this routine.
-	 *
-	 * With interrupts disabled, we block page table pages from being
-	 * freed from under us. See struct mmu_table_batch comments in
-	 * include/asm-generic/tlb.h for more details.
-	 *
-	 * We do not adopt an rcu_read_lock(.) here as we also want to
-	 * block IPIs that come from THPs splitting.
-	 */
-	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) {
-		unsigned long fast_flags = gup_flags;
-
-		local_irq_save(flags);
-		gup_pgd_range(addr, end, fast_flags, pages, &nr_pinned);
-		local_irq_restore(flags);
-		ret = nr_pinned;
-	}
+	nr_pinned = lockless_pages_from_mm(start, end, gup_flags, pages);
+	if (nr_pinned == nr_pages || gup_flags & FOLL_FAST_ONLY)
+		return nr_pinned;
 
-	if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) {
-		/* Try to get the remaining pages with get_user_pages */
-		start += nr_pinned << PAGE_SHIFT;
-		pages += nr_pinned;
-
-		ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
-					      gup_flags, pages);
-
-		/* Have to be a bit careful with return values */
-		if (nr_pinned > 0) {
-			if (ret < 0)
-				ret = nr_pinned;
-			else
-				ret += nr_pinned;
-		}
+	/* Slow path: try to get the remaining pages with get_user_pages */
+	start += nr_pinned << PAGE_SHIFT;
+	pages += nr_pinned;
+	ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
+				      pages);
+	if (ret < 0) {
+		/*
+		 * The caller has to unpin the pages we already pinned so
+		 * returning -errno is not an option
+		 */
+		if (nr_pinned)
+			return nr_pinned;
+		return ret;
 	}
-
-	return ret;
+	return ret + nr_pinned;
 }
+
 /**
  * get_user_pages_fast_only() - pin user pages in memory
  * @start:      starting user address
-- 
2.28.0



* Re: [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast()
  2020-10-30 14:46 ` [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast() Jason Gunthorpe
@ 2020-10-30 16:29   ` Jan Kara
  2020-10-30 21:31   ` John Hubbard
  2020-10-30 22:36   ` Peter Xu
  2 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2020-10-30 16:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, Peter Xu, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Christoph Hellwig, Hugh Dickins, Jan Kara,
	Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai, Linux-MM,
	Michal Hocko, Oleg Nesterov

On Fri 30-10-20 11:46:20, Jason Gunthorpe wrote:
> The next patch in this series makes the lockless flow a little more
> complex, so move the entire block into a new function and remove a level
> of indention. Tidy a bit of cruft:
> 
>  - addr is always the same as start, so use start
> 
>  - Use the modern check_add_overflow() for computing end = start + len
> 
>  - nr_pinned/pages << PAGE_SHIFT needs the LHS to be unsigned long to
>    avoid shift overflow, make the variables unsigned long to avoid coding
>    casts in both places. nr_pinned was missing its cast
> 
>  - The handling of ret and nr_pinned can be streamlined a bit
> 
> No functional change.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
       [not found] ` <2-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
@ 2020-10-30 16:51   ` Jan Kara
       [not found]     ` <20201030170226.GF2620339@nvidia.com>
  2020-10-30 22:52   ` Peter Xu
  2020-11-02 23:58   ` Ahmed S. Darwish
  2 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2020-10-30 16:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, Peter Xu, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov

On Fri 30-10-20 11:46:21, Jason Gunthorpe wrote:
> Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during
> fork() for ptes") pages under a FOLL_PIN will not be write protected
> during COW for fork. This means that pages returned from
> pin_user_pages(FOLL_WRITE) should not become write protected while the pin
> is active.
> 
> However, there is a small race where get_user_pages_fast(FOLL_PIN) can
> establish a FOLL_PIN at the same time copy_present_page() is write
> protecting it:
> 
>         CPU 0                             CPU 1
>    get_user_pages_fast()
>     internal_get_user_pages_fast()
>                                        copy_page_range()
>                                          pte_alloc_map_lock()
>                                            copy_present_page()
>                                              atomic_read(has_pinned) == 0
> 					     page_maybe_dma_pinned() == false
>      atomic_set(has_pinned, 1);
>      gup_pgd_range()
>       gup_pte_range()
>        pte_t pte = gup_get_pte(ptep)
>        pte_access_permitted(pte)
>        try_grab_compound_head()
>                                              pte = pte_wrprotect(pte)
> 	                                     set_pte_at();
>                                          pte_unmap_unlock()
>       // GUP now returns with a write protected page
> 
> The first attempt to resolve this by using the write protect caused
> problems (and was missing a barrier), see commit f3c64eda3e50 ("mm: avoid
> early COW write protect games during fork()")
> 
> Instead wrap copy_p4d_range() with the write side of a seqcount and check
> the read side around gup_pgd_range(). If there is a collision then
> get_user_pages_fast() fails and falls back to slow GUP.
> 
> Slow GUP is safe against this race because copy_page_range() is only
> called while holding the exclusive side of the mmap_lock on the src
> mm_struct.
> 
> Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()")
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
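
(The read side mentioned above, i.e. the check around gup_pgd_range(), boils
down to roughly the following. This is a sketch reconstructed from the
description rather than the actual hunk, so the exact helper names are
assumptions:)

	seq = raw_read_seqcount(&current->mm->write_protect_seq);
	if (seq & 1)
		return 0;	/* fork() is write protecting: take the mmap_lock path */

	gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);

	if (read_seqcount_retry(&current->mm->write_protect_seq, seq)) {
		/* collided with copy_page_range(): drop the pins, slow GUP redoes it */
		unpin_user_pages(pages, nr_pinned);
		nr_pinned = 0;
	}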

Looks good to me. Just one nit below. With that fixed feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

> @@ -446,6 +447,12 @@ struct mm_struct {
>  		 */
>  		atomic_t has_pinned;
>  
> +		/**
> +		 * @write_protect_seq: Odd when any thread is write protecting
> +		 * pages in this mm, for instance during fork().
> +		 */
> +		seqcount_t write_protect_seq;
> +

So this comment isn't quite true. We can be writeprotecting pages due to
many other reasons and not touch write_protect_seq. E.g. for shared
mappings or due to explicit mprotect() calls. So the write_protect_seq
protection has to be about something more than pure write protection. One
requirement certainly is that the VMA has to be is_cow_mapping(). What
about mprotect(2) calls? I guess the application would have only itself to
blame so we don't care?

Anyway my point is just that the comment should tell more what this is
about. I'd even go as far as making it "page table copying during fork in
progress".

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast()
  2020-10-30 14:46 ` [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast() Jason Gunthorpe
  2020-10-30 16:29   ` Jan Kara
@ 2020-10-30 21:31   ` John Hubbard
  2020-10-30 22:36   ` Peter Xu
  2 siblings, 0 replies; 28+ messages in thread
From: John Hubbard @ 2020-10-30 21:31 UTC (permalink / raw)
  To: Jason Gunthorpe, linux-kernel, Peter Xu, Linus Torvalds
  Cc: Andrea Arcangeli, Andrew Morton, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, Kirill Shutemov, Kirill Tkhai, Linux-MM,
	Michal Hocko, Oleg Nesterov

On 10/30/20 7:46 AM, Jason Gunthorpe wrote:
> The next patch in this series makes the lockless flow a little more
> complex, so move the entire block into a new function and remove a level
> of indention. Tidy a bit of cruft:
> 
>   - addr is always the same as start, so use start
> 
>   - Use the modern check_add_overflow() for computing end = start + len
> 
>   - nr_pinned/pages << PAGE_SHIFT needs the LHS to be unsigned long to
>     avoid shift overflow, make the variables unsigned long to avoid coding
>     casts in both places. nr_pinned was missing its cast
> 
>   - The handling of ret and nr_pinned can be streamlined a bit
> 
> No functional change.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   mm/gup.c | 99 ++++++++++++++++++++++++++++++--------------------------
>   1 file changed, 54 insertions(+), 45 deletions(-)

Everything still looks correct.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA



* Re: [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast()
  2020-10-30 14:46 ` [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast() Jason Gunthorpe
  2020-10-30 16:29   ` Jan Kara
  2020-10-30 21:31   ` John Hubbard
@ 2020-10-30 22:36   ` Peter Xu
  2 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2020-10-30 22:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, Linus Torvalds, Andrea Arcangeli, Andrew Morton,
	Christoph Hellwig, Hugh Dickins, Jan Kara, Jann Horn,
	John Hubbard, Kirill Shutemov, Kirill Tkhai, Linux-MM,
	Michal Hocko, Oleg Nesterov

On Fri, Oct 30, 2020 at 11:46:20AM -0300, Jason Gunthorpe wrote:
> The next patch in this series makes the lockless flow a little more
> complex, so move the entire block into a new function and remove a level
> of indention. Tidy a bit of cruft:
> 
>  - addr is always the same as start, so use start
> 
>  - Use the modern check_add_overflow() for computing end = start + len
> 
>  - nr_pinned/pages << PAGE_SHIFT needs the LHS to be unsigned long to
>    avoid shift overflow, make the variables unsigned long to avoid coding
>    casts in both places. nr_pinned was missing its cast
> 
>  - The handling of ret and nr_pinned can be streamlined a bit
> 
> No functional change.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Sorry for a very late reply (due to other distractions):

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu



* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
       [not found] ` <2-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
  2020-10-30 16:51   ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Jan Kara
@ 2020-10-30 22:52   ` Peter Xu
       [not found]     ` <20201030235121.GQ2620339@nvidia.com>
  2020-11-02 23:58   ` Ahmed S. Darwish
  2 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2020-10-30 22:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, Linus Torvalds, Andrea Arcangeli, Andrew Morton,
	Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins, Jan Kara,
	Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov

Hi, Jason,

I think the patch mostly looks good to me, but I have a few pure questions,
mostly not directly related to the patch itself but around the context.
Since I _feel_ like there'll be a new version to update the comments below,
maybe I can still ask aloud... Please bear with me. :)

On Fri, Oct 30, 2020 at 11:46:21AM -0300, Jason Gunthorpe wrote:
> Slow GUP is safe against this race because copy_page_range() is only
> called while holding the exclusive side of the mmap_lock on the src
> mm_struct.

Pure question: I understand that this patch requires this, but... could anyone
remind me why a read lock on mmap_sem was not enough for fork() before this one?

> 
> Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()")
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  arch/x86/kernel/tboot.c    |  1 +
>  drivers/firmware/efi/efi.c |  1 +
>  include/linux/mm_types.h   |  7 +++++++
>  kernel/fork.c              |  1 +
>  mm/gup.c                   | 19 +++++++++++++++++++
>  mm/init-mm.c               |  1 +
>  mm/memory.c                | 10 +++++++++-
>  7 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
> index 992fb1415c0f1f..6a2f542d9588a4 100644
> --- a/arch/x86/kernel/tboot.c
> +++ b/arch/x86/kernel/tboot.c
> @@ -93,6 +93,7 @@ static struct mm_struct tboot_mm = {
>  	.pgd            = swapper_pg_dir,
>  	.mm_users       = ATOMIC_INIT(2),
>  	.mm_count       = ATOMIC_INIT(1),
> +	.write_protect_seq = SEQCNT_ZERO(tboot_mm.write_protect_seq),
>  	MMAP_LOCK_INITIALIZER(init_mm)
>  	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
>  	.mmlist         = LIST_HEAD_INIT(init_mm.mmlist),
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 5e5480a0a32d7d..2520f6e05f4d44 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -57,6 +57,7 @@ struct mm_struct efi_mm = {
>  	.mm_rb			= RB_ROOT,
>  	.mm_users		= ATOMIC_INIT(2),
>  	.mm_count		= ATOMIC_INIT(1),
> +	.write_protect_seq      = SEQCNT_ZERO(efi_mm.write_protect_seq),
>  	MMAP_LOCK_INITIALIZER(efi_mm)
>  	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
>  	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),

Another pure question: I'm just curious how you found all the statically
defined mm_structs, and how to make sure all of them are covered (just in case
an un-initialized seqcount could fail strangely).

Actually I'm wondering whether we should have one place to keep all the init
vars for all the statically defined mm_structs, so we don't need to find them
every time, but only change that one place.
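
For instance, something like the following (purely hypothetical, nothing like
this exists in the tree; the macro name is made up) would keep the common
fields in one place:

	#define STATIC_MM_INITIALIZER(name)					\
		.mm_users	   = ATOMIC_INIT(2),				\
		.mm_count	   = ATOMIC_INIT(1),				\
		.write_protect_seq = SEQCNT_ZERO(name.write_protect_seq),	\
		MMAP_LOCK_INITIALIZER(name)					\
		.page_table_lock   = __SPIN_LOCK_UNLOCKED(name.page_table_lock), \
		.mmlist		   = LIST_HEAD_INIT(name.mmlist),

	struct mm_struct efi_mm = {
		.mm_rb = RB_ROOT,
		STATIC_MM_INITIALIZER(efi_mm)
		/* ... plus the efi_mm specific fields ... */
	};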

> diff --git a/mm/memory.c b/mm/memory.c
> index c48f8df6e50268..294c2c3c4fe00d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1171,6 +1171,12 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
>  		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
>  					0, src_vma, src_mm, addr, end);
>  		mmu_notifier_invalidate_range_start(&range);
> +		/*
> +		 * The read side doesn't spin, it goes to the mmap_lock, so the
> +		 * raw version is used to avoid disabling preemption here
> +		 */
> +		mmap_assert_write_locked(src_mm);
> +		raw_write_seqcount_t_begin(&src_mm->write_protect_seq);

Would raw_write_seqcount_begin() be better here?

My understanding is that we used raw_write_seqcount_t_begin() because we're
under a spin lock, so we can assume preemption is already disabled.  However I'm
thinking whether raw_write_seqcount_begin() would be even better to guarantee
that.  I have no idea how the RT kernel merging effort is going, but if the RT
kernel is merged into mainline then IIUC preemption is allowed here (since the
pgtable spin lock would be an rt_spin_lock, not a raw spin lock).

An even further pure question on __seqcount_preemptible() (feel free to ignore
this question!): I saw that __seqcount_preemptible() seems to have been
constantly defined as "return false".  Not sure what happened there..

Thanks,

-- 
Peter Xu



* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
       [not found]     ` <20201030235121.GQ2620339@nvidia.com>
@ 2020-10-31 15:26       ` Peter Xu
  2020-11-03  0:33         ` Ahmed S. Darwish
  2020-11-03  0:17       ` Ahmed S. Darwish
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Xu @ 2020-10-31 15:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ahmed S. Darwish, linux-kernel, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov

On Fri, Oct 30, 2020 at 08:51:21PM -0300, Jason Gunthorpe wrote:
> > Another pure question: I'm just curious how you found all the statically
> > defined mm_structs, and how to make sure all of them are covered (just in case
> > an un-initialized seqcount could fail strangely).
> 
> I searched for all MMAP_LOCK_INITIALIZER() places and assumed that
> Michel got them all when he added it :)

Hmm, I should have noticed that before I ask.. :)

> 
> > Actually I'm thinking whether we should have one place to keep all the init
> > vars for all the statically defined mm_structs, so we don't need to find them
> > every time, but only change that one place.
> 
> I was thinking that as well, most of the places are all the same

Yes, we can work on top.

> 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index c48f8df6e50268..294c2c3c4fe00d 100644
> > > +++ b/mm/memory.c
> > > @@ -1171,6 +1171,12 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> > >  		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
> > >  					0, src_vma, src_mm, addr, end);
> > >  		mmu_notifier_invalidate_range_start(&range);
> > > +		/*
> > > +		 * The read side doesn't spin, it goes to the mmap_lock, so the
> > > +		 * raw version is used to avoid disabling preemption here
> > > +		 */
> > > +		mmap_assert_write_locked(src_mm);
> > > +		raw_write_seqcount_t_begin(&src_mm->write_protect_seq);
> > 
> > Would raw_write_seqcount_begin() be better here?
> 
> Hum..
> 
> I felt no because it had the preempt stuff added into it, however it
> would work - __seqcount_lock_preemptible() == false for the seqcount_t
> case (see below)
> 
> Looking more closely, maybe the right API to pick is
> write_seqcount_t_begin() and write_seqcount_t_end() ??
> 
> However, no idea what the intention of the '*_seqcount_t_*' family is
> - it only seems to be used to implement the seqlock..
> 
> Let's add Ahmed, perhaps he can give some guidance (see next section)?

IMHO we shouldn't directly use these helpers since they seem to only be used by
lock-associated versions of seqcount types.  But yeah, Ahmed would be the best
one to answer...

> 
> > My understanding is that we used raw_write_seqcount_t_begin() because we're
> > under a spin lock, so we can assume preemption is already disabled.
> 
> Here we rely on the exclusive mmap_lock, not a spinlock. This ensures
> only one write side is running concurrently as required by seqcount.

So imho here we have these things to consider during one thread updating the
seqcount_t:

  0. Concurrent read is perfectly welcomed, for sure.

  1. Concurrent writes on seqcount_t: mm sem protects it.

  2. Preempted write (if possible, maybe on RT?): I think it's also protected
     by mm sem, so looks ok too to me.

  3. Preempted/interrupted read on seqcount_t.  Seems to be the one discussed
     below.  Looks safe to me now with below explanation.  However...

> 
> The concern about preemption disable is that it wasn't held for fork()
> before, and we don't need it.. I understand preemption disable regions
> must be short or the RT people will not be happy, holding one across
> all of copy_page_range() sounds bad.
> 
> Ahmed explained in commit 8117ab508f the reason the seqcount_t write
> side has preemption disabled is because it can livelock RT kernels if
> the read side is spinning after preempting the write side. eg look at
> how __read_seqcount_begin() is implemented:
> 
> 	while ((seq = __seqcount_sequence(s)) & 1)			\
> 		cpu_relax();						\
> 
> However, in this patch, we don't spin on the read side.

... Shall we document this explicitly (if this patch still needs a repost)?
It seems not straightforward, since this is not the usual way to use a seqcount;
not sure whether I'm the only one who feels this way, though.

> 
> If the read side collides with a writer it immediately goes to the
> mmap_lock, which is sleeping, and so it will sort itself out properly,
> even if it was preempted.
> 
> > An even further pure question on __seqcount_preemptible() (feel free to ignore
> > this question!): I saw that __seqcount_preemptible() seems to have been
> > constantly defined as "return false".  Not sure what happened there..
> 
> The new code has a range of seqcount_t types see
> Documentation/locking/seqlock.rst 'Sequence counters with associated
> locks'
> 
> It uses _Generic to do a bit of meta-programming and creates a compile
> time table of lock properties:
> 
> SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t,  false,    s->lock,        raw_spin, raw_spin_lock(s->lock))
> SEQCOUNT_LOCKNAME(spinlock,     spinlock_t,      __SEQ_RT, s->lock,        spin,     spin_lock(s->lock))
> SEQCOUNT_LOCKNAME(rwlock,       rwlock_t,        __SEQ_RT, s->lock,        read,     read_lock(s->lock))
> SEQCOUNT_LOCKNAME(mutex,        struct mutex,    true,     s->lock,        mutex,    mutex_lock(s->lock))
> SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mutex, ww_mutex_lock(s->lock, NULL))
> 
> As well as as default set of properties for normal seqcount_t. The
> __seqcount_preemptible() is selected by the _Generic for seqcount_t:
> 
> #define __seqprop(s, prop) _Generic(*(s),				\
> 	seqcount_t:		__seqprop_##prop((void *)(s)),		\
> 
> And it says preemption must be disabled before using the lock:
> 
> static inline void __seqprop_assert(const seqcount_t *s)
> {
> 	lockdep_assert_preemption_disabled();
> }
> 
> And thus no need to have an automatic disable preemption:
> 
> static inline bool __seqprop_preemptible(const seqcount_t *s)
> {
> 	return false;
> }
> 
> Other lock subtypes are different, eg the codegen for mutex will use
> lockdep_assert_held(s->lock) for _assert and true for _preemptible()
> 
> So if we map the 'write begin' entry points:
> 
>  write_seqcount_begin - Enforces preemption off
>  raw_write_seqcount_begin - Auto disable preemption if required (false)
>  raw_write_seqcount_t_begin - No preemption stuff
>  write_seqcount_t_begin - No preemption stuff

Thanks for listing these details.

As a summary, I think I'm convinced that we can have this work without disabling
preemption.  It's just that some more comments might be even better.

The other thing is, considering this use of seqcount seems to be quite special
as explained below, I'm just not sure whether this would confuse lockdep or
kcsan, etc., if we decide to use write_seqcount_t_begin().

Thanks,

-- 
Peter Xu



* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
       [not found]     ` <20201030170226.GF2620339@nvidia.com>
@ 2020-11-02  8:31       ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2020-11-02  8:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jan Kara, linux-kernel, Peter Xu, Linus Torvalds,
	Andrea Arcangeli, Andrew Morton, Aneesh Kumar K.V,
	Christoph Hellwig, Hugh Dickins, Jann Horn, John Hubbard,
	Kirill Shutemov, Kirill Tkhai, Leon Romanovsky, Linux-MM,
	Michal Hocko, Oleg Nesterov

On Fri 30-10-20 14:02:26, Jason Gunthorpe wrote:
> On Fri, Oct 30, 2020 at 05:51:05PM +0100, Jan Kara wrote:
> > > @@ -446,6 +447,12 @@ struct mm_struct {
> > >  		 */
> > >  		atomic_t has_pinned;
> > >  
> > > +		/**
> > > +		 * @write_protect_seq: Odd when any thread is write protecting
> > > +		 * pages in this mm, for instance during fork().
> > > +		 */
> > > +		seqcount_t write_protect_seq;
> > > +
> > 
> > So this comment isn't quite true. We can be writeprotecting pages due to
> > many other reasons and not touch write_protect_seq. E.g. for shared
> > mappings or due to explicit mprotect() calls. So the write_protect_seq
> > protection has to be about something more than pure write protection. One
> > requirement certainly is that the VMA has to be is_cow_mapping(). What
> > about mprotect(2) calls? I guess the application would have only itself to
> > blame so we don't care?
> 
> Yes, that sounds right. How about
> 
> /**
>  * @write_protect_seq: Locked when any thread is write protecting
>  * pages for COW in this mm, for instance during page table copying
           ^^^ maybe I'd write a bit more explicitly "... write protecting
pages mapped by this mm to enforce later COW, ..."

>  * for fork().
>  */
> 
> mprotect and shared mappings cause faults on write access not COW?

Correct.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 0/2] Add a seqcount between gup_fast and copy_page_range()
       [not found] <0-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
  2020-10-30 14:46 ` [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast() Jason Gunthorpe
@ 2020-11-02 22:19 ` Ahmed S. Darwish
  2020-11-02 22:39   ` Linus Torvalds
       [not found] ` <2-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
  2 siblings, 1 reply; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-02 22:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, Peter Xu, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Sebastian Siewior

Hello Jason,

Thanks for keeping me in the loop on this.

I've also added the locking maintainers in Cc. IMHO there are some
seqlock.h API violations in this series, and they should have the final
say on this.

On Fri, Oct 30, 2020 at 11:46:19AM -0300, Jason Gunthorpe wrote:
>
> As discussed and suggested by Linus use a seqcount to close the small race
> between gup_fast and copy_page_range().
>
> Unfortunately the good suggestion to just use write_seqcount_begin() blows
> up lockdep immediately due to the (new?) requirement that the write side
> of seqcount be in a preempt disabled region.

Disabling preemption for seqcount_t write-side critical sections was
never a new requirement. It has always been this way, for the reasons
explained at Documentation/locking/seqlock.rst, "Introduction" section.

The recent seqcount_t changes did not mandate any new rules. This was
done explicitly, and on-purpose, not to break any of the *large* set of
existing seqcount_t call sites. It added multiple lockdep asserts
though, to catch a number of (already) buggy users, and they were fixed
beforehand.

It seems you have a special case here, so I'll continue discussing this
at patch #2 where the code resides. Just wanted to answer the "(new?)"
part above.

>                                               For this application it does
> not seem like a good idea, nor is it necessary as we don't spin on retry.
> This is solved by being the first place to use raw_write_seqcount_t_begin()
>

Regardless of this series' write-side preemptibility requirements, the
"_write_seqcount_*t*_()" interfaces are internal to seqlock.h and should
_never_ be used outside of it.

For plain seqcount_t, raw_write_seqcount_begin() is equivalent to
raw_write_seqcount_*t*_begin() anyway, and should already satisfy your
needs.
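
Roughly (a simplified expansion sketch, not the verbatim seqlock.h source),
for a plain seqcount_t:

	#define raw_write_seqcount_begin(s)				\
	do {								\
		if (__seqcount_lock_preemptible(s))	/* false for seqcount_t */ \
			preempt_disable();				\
		raw_write_seqcount_t_begin(s);	/* sequence++; smp_wmb() */ \
	} while (0)

The non-raw write_seqcount_begin() additionally asserts, via lockdep, that
preemption is disabled, which is what blows up in the fork() path here.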

/me jumps to patch #2 now...

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v2 0/2] Add a seqcount between gup_fast and copy_page_range()
  2020-11-02 22:19 ` [PATCH v2 0/2] Add a seqcount between gup_fast and copy_page_range() Ahmed S. Darwish
@ 2020-11-02 22:39   ` Linus Torvalds
  2020-11-02 23:18     ` Ahmed S. Darwish
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2020-11-02 22:39 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Jason Gunthorpe, Linux Kernel Mailing List, Peter Xu,
	Andrea Arcangeli, Andrew Morton, Aneesh Kumar K.V,
	Christoph Hellwig, Hugh Dickins, Jan Kara, Jann Horn,
	John Hubbard, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
	Linux-MM, Michal Hocko, Oleg Nesterov, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian Siewior

On Mon, Nov 2, 2020 at 2:19 PM Ahmed S. Darwish <a.darwish@linutronix.de> wrote:
>
> Disabling preemption for seqcount_t write-side critical sections was
> never a new requirement. It has always been this way, for the reasons
> explained at Documentation/locking/seqlock.rst, "Introduction" section.

Note that that is only true if you spin on the reading side (either of
the two kinds of spinning: (a) spinning to wait for it to become even,
or (b) repeating if they don't match)

Which this code doesn't do, it just fails.
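
Concretely (an illustrative fragment, not a quote of any call site), the two
spinning forms look like:

	do {
		seq = read_seqcount_begin(&s);	/* (a) waits here while the count is odd */
		copy = data;
	} while (read_seqcount_retry(&s, seq));	/* (b) retries if a writer ran meanwhile */

whereas gup_fast samples the count once and, if it is odd or has changed,
gives up and takes the mmap_lock path instead of looping.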

I'm not sure how to perhaps document that.

             Linus


* Re: [PATCH v2 0/2] Add a seqcount between gup_fast and copy_page_range()
  2020-11-02 22:39   ` Linus Torvalds
@ 2020-11-02 23:18     ` Ahmed S. Darwish
  0 siblings, 0 replies; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-02 23:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, Linux Kernel Mailing List, Peter Xu,
	Andrea Arcangeli, Andrew Morton, Aneesh Kumar K.V,
	Christoph Hellwig, Hugh Dickins, Jan Kara, Jann Horn,
	John Hubbard, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
	Linux-MM, Michal Hocko, Oleg Nesterov, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian Siewior

On Mon, Nov 02, 2020 at 02:39:49PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2020 at 2:19 PM Ahmed S. Darwish <a.darwish@linutronix.de> wrote:
> >
> > Disabling preemption for seqcount_t write-side critical sections was
> > never a new requirement. It has always been this way, for the reasons
> > explained at Documentation/locking/seqlock.rst, "Introduction" section.
>
> Note that that is only true if you spin on the reading side (either of
> the two kinds of spinning: (a) spinning to wait for it to become even,
> or (b) repeating if they don't match)
>
> Which this code doesn't do, it just fails.
>
> I'm not sure how to perhaps document that.
>

Sure, and this is one of the reasons the lockdep non-preemptibility
check is only added to the non-raw variants of the seqcount write APIs.

Presumably, users of the raw_*() part of the API know what they're
doing, and they don't need to read seqlock.rst :)

(I'm in progress of replying to patch #2, which touches a bit on this
 and other points)..

>              Linus

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
       [not found] ` <2-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
  2020-10-30 16:51   ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Jan Kara
  2020-10-30 22:52   ` Peter Xu
@ 2020-11-02 23:58   ` Ahmed S. Darwish
  2 siblings, 0 replies; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-02 23:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, Peter Xu, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Sebastian Siewior

On Fri, Oct 30, 2020 at 11:46:21AM -0300, Jason Gunthorpe wrote:

...

> diff --git a/mm/memory.c b/mm/memory.c
> index c48f8df6e50268..294c2c3c4fe00d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1171,6 +1171,12 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
>  		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
>  					0, src_vma, src_mm, addr, end);
>  		mmu_notifier_invalidate_range_start(&range);

> +		/*
> +		 * The read side doesn't spin, it goes to the mmap_lock, so the
> +		 * raw version is used to avoid disabling preemption here
> +		 */
> +		mmap_assert_write_locked(src_mm);
> +		raw_write_seqcount_t_begin(&src_mm->write_protect_seq);
>  	}
>

Please, s/raw_write_seqcount_t_begin()/raw_write_seqcount_begin()/g. For
plain seqcount_t, it's the same, while still respecting the seqlock.h
API boundaries.

Let's make the comment also a bit more clear (IMHO, "lockdep" needs to
be mentioned somewhere):

		/*
		 * Disabling preemption is not needed for the write side, as
		 * the read side doesn't spin, but goes to the mmap_lock.
		 *
		 * Use the raw variant of the seqcount_t write API to avoid
		 * lockdep complaining about preemptibility.
		 */
		mmap_assert_write_locked(src_mm);
		raw_write_seqcount_t_begin(&src_mm->write_protect_seq);

>  	ret = 0;
> @@ -1187,8 +1193,10 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
>  		}
>  	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
>
> -	if (is_cow)
> +	if (is_cow) {
> +		raw_write_seqcount_t_end(&src_mm->write_protect_seq);

ditto.

s/raw_write_seqcount_t_end()/raw_write_seqcount_end()/g

>  		mmu_notifier_invalidate_range_end(&range);
> +	}
>  	return ret;
>  }
>

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
       [not found]     ` <20201030235121.GQ2620339@nvidia.com>
  2020-10-31 15:26       ` Peter Xu
@ 2020-11-03  0:17       ` Ahmed S. Darwish
       [not found]         ` <20201103002532.GL2620339@nvidia.com>
  2020-11-03 17:03         ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Peter Xu
  1 sibling, 2 replies; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-03  0:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, linux-kernel, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Sebastian Siewior

On Fri, Oct 30, 2020 at 08:51:21PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 30, 2020 at 06:52:50PM -0400, Peter Xu wrote:

...

>
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index c48f8df6e50268..294c2c3c4fe00d 100644
> > > +++ b/mm/memory.c
> > > @@ -1171,6 +1171,12 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> > >  		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
> > >  					0, src_vma, src_mm, addr, end);
> > >  		mmu_notifier_invalidate_range_start(&range);
> > > +		/*
> > > +		 * The read side doesn't spin, it goes to the mmap_lock, so the
> > > +		 * raw version is used to avoid disabling preemption here
> > > +		 */
> > > +		mmap_assert_write_locked(src_mm);
> > > +		raw_write_seqcount_t_begin(&src_mm->write_protect_seq);
> >
> > Would raw_write_seqcount_begin() be better here?
>
> Hum..
>
> I felt no because it had the preempt stuff added into it, however it
> would work - __seqcount_lock_preemptible() == false for the seqcount_t
> case (see below)
>
> Looking more closely, maybe the right API to pick is
> write_seqcount_t_begin() and write_seqcount_t_end() ??
>

No, that's not the right API: it is also internal to seqlock.h.

Please stick with the official exported API: raw_write_seqcount_begin().

It should satisfy your needs, and the raw_*() variant is created exactly
for contexts wishing to avoid the lockdep checks (e.g. NMI handlers
cannot invoke lockdep, etc.)

> However, no idea what the intention of the '*_seqcount_t_*' family is
> - it only seems to be used to implement the seqlock..
>

Exactly. '*_seqcount_t_*' is a seqlock.h implementation detail, and it
has _zero_ relevance to what is discussed in this thread actually.

...

> Ahmed explained in commit 8117ab508f the reason the seqcount_t write
> side has preemption disabled is because it can livelock RT kernels if
> the read side is spinning after preempting the write side. eg look at
> how __read_seqcount_begin() is implemented:
>
> 	while ((seq = __seqcount_sequence(s)) & 1)			\
> 		cpu_relax();						\
>
> However, in this patch, we don't spin on the read side.
>
> If the read side collides with a writer it immediately goes to the
> mmap_lock, which is sleeping, and so it will sort itself out properly,
> even if it was preempted.
>

Correct.

Thanks,

--
Ahmed Darwish
Linutronix GmbH


* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-10-31 15:26       ` Peter Xu
@ 2020-11-03  0:33         ` Ahmed S. Darwish
  0 siblings, 0 replies; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-03  0:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, linux-kernel, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Sebastian Siewior

On Sat, Oct 31, 2020 at 11:26:05AM -0400, Peter Xu wrote:

...

> Shall we document this explicitly (if this patch still needs a repost)?

Yes, this patch series needs a v3 :)

> It seems not straightforward, since this is not the usual way to use a seqcount,
> not sure whether I'm the only one who feels this way, though.

Yes, this usage is correct but not common. I've proposed a more explicit
comment above the write section code, in my reply to patch #2.

...

> The other thing is, considering this use of seqcount seems to be quite special
> as explained below, I'm just not sure whether this would confuse lockdep or
> kcsan, etc., if we decide to use write_seqcount_t_begin().
>

Lockdep won't be confused as it's not used in the raw_*() variant of the
seqcount APIs.

AFAIK KCSAN also has some margin to protect itself from this: see
seqlock.h KCSAN_SEQLOCK_REGION_MAX.

Thanks,

> Peter Xu

Ahmed Darwish
Linutronix GmbH


* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
       [not found]         ` <20201103002532.GL2620339@nvidia.com>
@ 2020-11-03  0:41           ` Ahmed S. Darwish
  2020-11-03  2:20             ` John Hubbard
  0 siblings, 1 reply; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-03  0:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, linux-kernel, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Sebastian Siewior

On Mon, Nov 02, 2020 at 08:25:32PM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 03, 2020 at 01:17:12AM +0100, Ahmed S. Darwish wrote:
>
> > Please stick with the official exported API: raw_write_seqcount_begin().
>
> How did you know this was 'official exported API' ??
>

All the official exported seqlock.h APIs are marked with verbose
kernel-doc annotations on top. The rest are internal...

> > Exactly. '*_seqcount_t_*' is a seqlock.h implementation detail, and it
> > has _zero_ relevance to what is discussed in this thread actually.
>
> Add some leading __'s to them?
>

It's a bit more complicated than just adding some "__" prefixes, due to
the massive compile-time polymorphism done through _Generic().

The '*_seqcount_t_*' format was the best we could come up with to
distinguish (again, for internal seqlock.h code) between macros taking
all seqcount_LOCKNAME_t types, and macros/functions taking only plain
seqcount_t.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-03  0:41           ` Ahmed S. Darwish
@ 2020-11-03  2:20             ` John Hubbard
  2020-11-03  6:52               ` Ahmed S. Darwish
  0 siblings, 1 reply; 28+ messages in thread
From: John Hubbard @ 2020-11-03  2:20 UTC (permalink / raw)
  To: Ahmed S. Darwish, Jason Gunthorpe
  Cc: Peter Xu, linux-kernel, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Sebastian Siewior

On 11/2/20 4:41 PM, Ahmed S. Darwish wrote:
> On Mon, Nov 02, 2020 at 08:25:32PM -0400, Jason Gunthorpe wrote:
>> On Tue, Nov 03, 2020 at 01:17:12AM +0100, Ahmed S. Darwish wrote:
>>
>>> Please stick with the official exported API: raw_write_seqcount_begin().
>>
>> How did you know this was 'official exported API' ??
>>
> 
> All the official exported seqlock.h APIs are marked with verbose
> kernel-doc annotations on top. The rest are internal...
> 

OK, but no one here was able to deduce that, probably because there is not
enough consistency throughout the kernel to be able to assume such things--even
though your seqlock project is internally consistent. It's just not *quite*
enough communication.

I think if we added the following it would be very nice:

a) Short comments to the "unofficial and internal" routines, identifying them as
such, and

b) Short comments to the "official API for general use", also identifying
those as such.

c) A comment about what makes "raw" actually raw, for seqlock.


Since I'm proposing new work, I'll also offer to help, perhaps by putting together
a small patch to get it kicked off, if you approve of the idea.
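
For (a), it could be as small as a one-line marker above each internal
helper, e.g. (illustrative wording only):

	/*
	 * Internal to the seqlock.h implementation, do not use outside this
	 * header. General users: see the kernel-doc'd API above.
	 */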

thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-03  2:20             ` John Hubbard
@ 2020-11-03  6:52               ` Ahmed S. Darwish
  2020-11-03 17:40                 ` Linus Torvalds
  0 siblings, 1 reply; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-03  6:52 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jason Gunthorpe, Peter Xu, linux-kernel, Linus Torvalds,
	Andrea Arcangeli, Andrew Morton, Aneesh Kumar K.V,
	Christoph Hellwig, Hugh Dickins, Jan Kara, Jann Horn,
	Kirill Shutemov, Kirill Tkhai, Leon Romanovsky, Linux-MM,
	Michal Hocko, Oleg Nesterov, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Thomas Gleixner, Sebastian Siewior

On Mon, Nov 02, 2020 at 06:20:45PM -0800, John Hubbard wrote:
> On 11/2/20 4:41 PM, Ahmed S. Darwish wrote:
> > On Mon, Nov 02, 2020 at 08:25:32PM -0400, Jason Gunthorpe wrote:
> > > On Tue, Nov 03, 2020 at 01:17:12AM +0100, Ahmed S. Darwish wrote:
> > >
> > > > Please stick with the official exported API: raw_write_seqcount_begin().
> > >
> > > How did you know this was 'official exported API' ??
> > >
> >
> > All the official exported seqlock.h APIs are marked with verbose
> > kernel-doc annotations on top. The rest are internal...
> >
>
> OK, but no one here was able to deduce that, probably because there is not
> enough consistency throughout the kernel to be able to assume such things--even
> though your seqlock project is internally consistent. It's just not *quite*
> enough communication.
>
> I think if we added the following it would be very nice:
>

The problem is, I've already documented seqlock.h to death.... There are
more comments than code in there, and there is "seqlock.rst" under
Documentation/ to further describe the big picture.

There comes a point where you decide what level of documentation to add,
and what level to skip.

Because in the end, you don't want to confuse "Joe, the general driver
developer" with too many details that are not relevant to their task at
hand.  (I work in the Embedded domain, and I've seen so much ugly code
from embedded drivers/SoC developers already, sorry)

See for example my reply to Linus, where any talk about the lockdep-free
and barrier-free parts of the API was explicitly not mentioned in
seqlock.rst. This was done on purpose: 1) you want to keep the generic
case simple, but the special case do-able, 2) you want to encourage
people to use the standard entry/exit points as much as possible.

> a) Short comments to the "unofficial and internal" routines, identifying them as
> such, and
>
> b) Short comments to the "official API for general use", also identifying
> those as such.
>

I really think the already added kernel-doc is sufficient...

See for example __read_seqcount_begin() and __read_seqcount_retry().
They begin with "__", but they are semi-external seqlock.h API that are
used by VFS to avoid barriers. And these APIs are then polymorphised
according to the write serialization lock type, and so on.
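
For anyone reading along, here is a minimal read-side sketch (the structure
and its fields are made up purely for illustration; the "__" variants follow
the same loop shape but leave the barrier/lockdep parts to the caller):

#include <linux/seqlock.h>
#include <linux/types.h>

struct obj {
	seqcount_t seq;
	u64 a, b;
};

static u64 obj_read(struct obj *o)
{
	unsigned int start;
	u64 val;

	do {
		/* includes the lockdep annotation and smp_rmb() */
		start = read_seqcount_begin(&o->seq);
		val = o->a + o->b;
	} while (read_seqcount_retry(&o->seq, start)); /* smp_rmb() + recheck */

	return val;
}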

So the most consistent way for seqlock.h was to use kernel-doc as *the*
marker for exported functions.

This is not unique to seqlock.h by the way. The same pattern is heavily
used by the DRM folks.

Yes, of course, we can add even more comments to seqlock.h, but then I
honestly think it would be so much that people might just skip reading
the whole thing altogether...

> c) A comment about what makes "raw" actually raw, for seqlock.
>

That's already documented.

What more can really be written than what's in seqlock.h below??

  /**
   * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep

  /**
   * raw_seqcount_begin() - begin a seqcount_t read critical section w/o
   *                        lockdep and w/o counter stabilization

  /**
   * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep

  /**
   * raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep

>
> Since I'm proposing new work, I'll also offer to help, perhaps by putting together
> a small patch to get it kicked off, if you approve of the idea.
>

Patches are always welcome of course, and please put me in Cc. I don't
approve or deny anything though; that's the locking maintainers' job :)

Kind regards,

> John Hubbard
> NVIDIA

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-03  0:17       ` Ahmed S. Darwish
       [not found]         ` <20201103002532.GL2620339@nvidia.com>
@ 2020-11-03 17:03         ` Peter Xu
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Xu @ 2020-11-03 17:03 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Jason Gunthorpe, linux-kernel, Linus Torvalds, Andrea Arcangeli,
	Andrew Morton, Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins,
	Jan Kara, Jann Horn, John Hubbard, Kirill Shutemov, Kirill Tkhai,
	Leon Romanovsky, Linux-MM, Michal Hocko, Oleg Nesterov,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Sebastian Siewior

On Tue, Nov 03, 2020 at 01:17:12AM +0100, Ahmed S. Darwish wrote:
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index c48f8df6e50268..294c2c3c4fe00d 100644
> > > > +++ b/mm/memory.c
> > > > @@ -1171,6 +1171,12 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> > > >  		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
> > > >  					0, src_vma, src_mm, addr, end);
> > > >  		mmu_notifier_invalidate_range_start(&range);
> > > > +		/*
> > > > +		 * The read side doesn't spin, it goes to the mmap_lock, so the
> > > > +		 * raw version is used to avoid disabling preemption here
> > > > +		 */
> > > > +		mmap_assert_write_locked(src_mm);
> > > > +		raw_write_seqcount_t_begin(&src_mm->write_protect_seq);
> > >
> > > Would raw_write_seqcount_begin() be better here?
> >
> > Hum..
> >
> > I felt no because it had the preempt stuff added into it, however it
> > would work - __seqcount_lock_preemptible() == false for the seqcount_t
> > case (see below)
> >
> > Looking more closely, maybe the right API to pick is
> > write_seqcount_t_begin() and write_seqcount_t_end() ??
> >
> 
> No, that's not the right API: it is also internal to seqlock.h.
> 
> Please stick with the official exported API: raw_write_seqcount_begin().
> 
> It should satisfy your needs, and the raw_*() variant is created exactly
> for contexts wishing to avoid the lockdep checks (e.g. NMI handlers
> cannot invoke lockdep, etc.)

Ahmed, Jason, feel free to correct me - but I feel like what Jason wanted here
is indeed the version that does not require disabling of preemption, a.k.a.
write_seqcount_t_begin() and write_seqcount_t_end(), since it's preempt-safe if
the read side does not retry.  Not sure whether there's a non-"*_t_*" version of those.

Another idea is that maybe we could use the raw_write_seqcount_begin()
version, but move it from copy_page_range() into copy_pte_range().  That
should not affect normal Linux preemption-wise, I think, since by the time
we reach the pte level we have already disabled preemption anyway (by taking
the pgtable spin lock).  But there could be extra overhead, since we would
then take the write seqcount very often (rather than once per fork(), so
maybe there's some perf impact), and it would also mean an extra/real
preempt_disable() for the future RT code if that lands some day (since,
iiuc, the rt_spin_lock used there should not need to disable preemption).
So it still seems better to do it in copy_page_range() as Jason proposed.
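
To make the comparison concrete, here is a very rough sketch of the two
placements being weighed (heavily elided pseudo-code, not the real fork-path
locking; "src_ptl" is just a stand-in name, and the _end() calls simply
mirror the _begin() calls from Jason's patch):

/* (a) once per fork(), in copy_page_range(), as proposed */
mmap_assert_write_locked(src_mm);
raw_write_seqcount_t_begin(&src_mm->write_protect_seq);
/* ... walk and copy the whole source range ... */
raw_write_seqcount_t_end(&src_mm->write_protect_seq);

/* (b) once per pte table, in copy_pte_range() */
spin_lock(src_ptl);	/* pgtable lock; preemption already disabled here */
raw_write_seqcount_begin(&src_mm->write_protect_seq);
/* ... copy a single pte table ... */
raw_write_seqcount_end(&src_mm->write_protect_seq);
spin_unlock(src_ptl);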

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-03  6:52               ` Ahmed S. Darwish
@ 2020-11-03 17:40                 ` Linus Torvalds
  2020-11-04  1:32                   ` Ahmed S. Darwish
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2020-11-03 17:40 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: John Hubbard, Jason Gunthorpe, Peter Xu,
	Linux Kernel Mailing List, Andrea Arcangeli, Andrew Morton,
	Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins, Jan Kara,
	Jann Horn, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
	Linux-MM, Michal Hocko, Oleg Nesterov, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian Siewior

On Mon, Nov 2, 2020 at 10:52 PM Ahmed S. Darwish
<a.darwish@linutronix.de> wrote:
>
> The problem is, I've already documented seqlock.h to death.... There are
> more comments than code in there, and there is "seqlock.rst" under
> Documentation/ to further describe the big picture.

Well, honestly, I think the correct thing to do is to get rid of the
*_seqcount_t_*() functions entirely.

They add nothing but confusion, and they are entirely misnamed. That's
not the pattern we use for "internal use only" functions, and they are
*very* confusing.

They have other issues too: like raw_write_seqcount_end() not being
usable on its own when preemptibility isn't an issue like here. You
basically _have_ to use raw_write_seqcount_t_end(), because otherwise
it tries to re-enable preemption that was never there.

                   Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-03 17:40                 ` Linus Torvalds
@ 2020-11-04  1:32                   ` Ahmed S. Darwish
  2020-11-04  2:01                     ` John Hubbard
  0 siblings, 1 reply; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-04  1:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Hubbard, Jason Gunthorpe, Peter Xu,
	Linux Kernel Mailing List, Andrea Arcangeli, Andrew Morton,
	Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins, Jan Kara,
	Jann Horn, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
	Linux-MM, Michal Hocko, Oleg Nesterov, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian Siewior

On Tue, Nov 03, 2020 at 09:40:22AM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2020 at 10:52 PM Ahmed S. Darwish
> <a.darwish@linutronix.de> wrote:
> >
> > The problem is, I've already documented seqlock.h to death.... There are
> > more comments than code in there, and there is "seqlock.rst" under
> > Documentation/ to further describe the big picture.
>
> Well, honestly, I think the correct thing to do is to get rid of the
> *_seqcount_t_*() functions entirely.
>
> They add nothing but confusion, and they are entirely misnamed. That's
> not the pattern we use for "internal use only" functions, and they are
> *very* confusing.
>

I see. Would the enclosed patch #1 be OK? It basically uses the "__do_"
prefix instead, with some rationale.

>
> They have other issues too: like raw_write_seqcount_end() not being
> usable on its own when preemptibility isn't an issue like here. You
> basically _have_ to use raw_write_seqcount_t_end(), because otherwise
> it tries to re-enable preemption that was never there.
>

Hmmm, raw_write_seqcount_{begin,end}() *never* disable/enable preemption
for plain seqcount_t. This is why I kept recommending those for this
patch series instead of internal raw_write_seqcount_*t*_{begin,end}().

But..... given that multiple people made the exact same remark by now, I
guess that's due to:

#define raw_write_seqcount_begin(s)		\
do {						\
	if (__seqcount_lock_preemptible(s))	\
		preempt_disable();		\
						\
	...					\
} while (0);

#define raw_write_seqcount_end(s)		\
do {						\
	...					\
						\
	if (__seqcount_lock_preemptible(s))	\
		preempt_enable();		\
} while (0);

The tricky part is that __seqcount_lock_preemptible() is always false
for plain "seqcount_t".  With that data type, the _Generic() selection
makes it resolve to __seqprop_preemptible(), which just returns false.

Originally, __seqcount_lock_preemptible() was called:

  __seqcount_associated_lock_exists_and_is_preemptible()

but it got transformed to its current short form during some pre-mainline
refactorings. Looking at it now after all the dust has settled, maybe the
verbose form was much clearer.
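
As a boiled-down illustration of what ends up happening (only two seqcount
types shown here, and the real header goes through the __seqprop_case()
token pasting instead of a flat _Generic() list):

static inline bool __seqprop_preemptible(const seqcount_t *s)
{
	return false;	/* plain seqcount_t: no associated lock */
}

static inline bool __seqprop_mutex_preemptible(const seqcount_mutex_t *s)
{
	return true;	/* associated mutex is a preemptible (sleeping) lock */
}

#define __seqcount_lock_preemptible(s)					\
	_Generic(*(s),							\
		 seqcount_t:		__seqprop_preemptible,		\
		 seqcount_mutex_t:	__seqprop_mutex_preemptible)(s)

So for the plain seqcount_t used in this patch series, the
"if (__seqcount_lock_preemptible(s))" branches above compile down to
"if (false)", and the preempt_disable()/preempt_enable() pair vanishes.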

Please see the enclosed patch #2... Would that be OK too?

(I will submit these two patches in their own thread after some common
 ground is reached.)

Patches
-------

====>
====> patch #1:
====>

Subject: [RFC][PATCH 1/2] seqlock: Use __do_ prefix instead of non-standard
 _seqcount_t_ marker

The use of "*_seqcount_t_*" as a marker to denote internal seqlock.h
functions taking only plain seqcount_t instead of the whole
seqcount_LOCKNAME_t family is confusing users, as it's also not the
standard kernel pattern for denoting header file internal functions.

Use the __do_ prefix instead.

Note, a plain "__" prefix is not used since seqlock.h already uses it
for some of its exported functions; e.g. __read_seqcount_begin() and
__read_seqcount_retry().

Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reported-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 62 ++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index cbfc78b92b65..5de043841d33 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -425,9 +425,9 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  * Return: true if a read section retry is required, else false
  */
 #define __read_seqcount_retry(s, start)					\
-	__read_seqcount_t_retry(__seqcount_ptr(s), start)
+	__do___read_seqcount_retry(__seqcount_ptr(s), start)

-static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
+static inline int __do___read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	kcsan_atomic_next(0);
 	return unlikely(READ_ONCE(s->sequence) != start);
@@ -445,12 +445,12 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
  * Return: true if a read section retry is required, else false
  */
 #define read_seqcount_retry(s, start)					\
-	read_seqcount_t_retry(__seqcount_ptr(s), start)
+	__do_read_seqcount_retry(__seqcount_ptr(s), start)

-static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
+static inline int __do_read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_t_retry(s, start);
+	return __do___read_seqcount_retry(s, start);
 }

 /**
@@ -462,10 +462,10 @@ do {									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_disable();					\
 									\
-	raw_write_seqcount_t_begin(__seqcount_ptr(s));			\
+	__do_raw_write_seqcount_begin(__seqcount_ptr(s));		\
 } while (0)

-static inline void raw_write_seqcount_t_begin(seqcount_t *s)
+static inline void __do_raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -478,13 +478,13 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
  */
 #define raw_write_seqcount_end(s)					\
 do {									\
-	raw_write_seqcount_t_end(__seqcount_ptr(s));			\
+	__do_raw_write_seqcount_end(__seqcount_ptr(s));			\
 									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_enable();					\
 } while (0)

-static inline void raw_write_seqcount_t_end(seqcount_t *s)
+static inline void __do_raw_write_seqcount_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
@@ -506,12 +506,12 @@ do {									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin_nested(__seqcount_ptr(s), subclass);	\
+	__do_write_seqcount_begin_nested(__seqcount_ptr(s), subclass);	\
 } while (0)

-static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
+static inline void __do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	raw_write_seqcount_t_begin(s);
+	__do_raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }

@@ -533,12 +533,12 @@ do {									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin(__seqcount_ptr(s));			\
+	__do_write_seqcount_begin(__seqcount_ptr(s));			\
 } while (0)

-static inline void write_seqcount_t_begin(seqcount_t *s)
+static inline void __do_write_seqcount_begin(seqcount_t *s)
 {
-	write_seqcount_t_begin_nested(s, 0);
+	__do_write_seqcount_begin_nested(s, 0);
 }

 /**
@@ -549,16 +549,16 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
  */
 #define write_seqcount_end(s)						\
 do {									\
-	write_seqcount_t_end(__seqcount_ptr(s));			\
+	__do_write_seqcount_end(__seqcount_ptr(s));			\
 									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_enable();					\
 } while (0)

-static inline void write_seqcount_t_end(seqcount_t *s)
+static inline void __do_write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_t_end(s);
+	__do_raw_write_seqcount_end(s);
 }

 /**
@@ -603,9 +603,9 @@ static inline void write_seqcount_t_end(seqcount_t *s)
  *      }
  */
 #define raw_write_seqcount_barrier(s)					\
-	raw_write_seqcount_t_barrier(__seqcount_ptr(s))
+	__do_raw_write_seqcount_barrier(__seqcount_ptr(s))

-static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
+static inline void __do_raw_write_seqcount_barrier(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -623,9 +623,9 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
  * will complete successfully and see data older than this.
  */
 #define write_seqcount_invalidate(s)					\
-	write_seqcount_t_invalidate(__seqcount_ptr(s))
+	__do_write_seqcount_invalidate(__seqcount_ptr(s))

-static inline void write_seqcount_t_invalidate(seqcount_t *s)
+static inline void __do_write_seqcount_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	kcsan_nestable_atomic_begin();
@@ -865,7 +865,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 }

 /*
- * For all seqlock_t write side functions, use write_seqcount_*t*_begin()
+ * For all seqlock_t write side functions, use __do_write_seqcount_begin()
  * instead of the generic write_seqcount_begin(). This way, no redundant
  * lockdep_assert_held() checks are added.
  */
@@ -886,7 +886,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	__do_write_seqcount_begin(&sl->seqcount.seqcount);
 }

 /**
@@ -898,7 +898,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	__do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock(&sl->lock);
 }

@@ -912,7 +912,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	__do_write_seqcount_begin(&sl->seqcount.seqcount);
 }

 /**
@@ -925,7 +925,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	__do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_bh(&sl->lock);
 }

@@ -939,7 +939,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	__do_write_seqcount_begin(&sl->seqcount.seqcount);
 }

 /**
@@ -951,7 +951,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	__do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_irq(&sl->lock);
 }

@@ -960,7 +960,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;

 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	__do_write_seqcount_begin(&sl->seqcount.seqcount);
 	return flags;
 }

@@ -989,7 +989,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	__do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }

====>
====> patch #2:
====>

Subject: [PATCH 2/2] seqlock: seqcount_LOCKNAME_t: Use more verbose macro names

As evidenced by multiple discussions over LKML so far, it's not clear
that __seqcount_lock_preemptible() is always false for plain seqcount_t.
For that data type, the _Generic() selection resolves to
__seqprop_preemptible(), which just returns false.

Use __seqcount_associated_lock_exists_and_is_preemptible() instead,
which hints that "preemptibility" is for the associated write
serialization lock (if any), not for the seqcount itself.

Similarly, rename __seqcount_assert_lock_held() to
__seqcount_assert_associated_lock_held().

Link: https://lkml.kernel.org/r/CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20201030235121.GQ2620339@nvidia.com
Link: https://lkml.kernel.org/r/20201103170327.GJ20600@xz-x1
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5de043841d33..eb1e5a822e44 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -307,10 +307,10 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
 	__seqprop_case((s),	mutex,		prop),			\
 	__seqprop_case((s),	ww_mutex,	prop))

-#define __seqcount_ptr(s)		__seqprop(s, ptr)
-#define __seqcount_sequence(s)		__seqprop(s, sequence)
-#define __seqcount_lock_preemptible(s)	__seqprop(s, preemptible)
-#define __seqcount_assert_lock_held(s)	__seqprop(s, assert)
+#define __seqcount_ptr(s)					__seqprop(s, ptr)
+#define __seqcount_sequence(s)					__seqprop(s, sequence)
+#define __seqcount_associated_lock_exists_and_is_preemptible(s)	__seqprop(s, preemptible)
+#define __seqcount_assert_associated_lock_held(s)		__seqprop(s, assert)

 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
@@ -459,7 +459,7 @@ static inline int __do_read_seqcount_retry(const seqcount_t *s, unsigned start)
  */
 #define raw_write_seqcount_begin(s)					\
 do {									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_disable();					\
 									\
 	__do_raw_write_seqcount_begin(__seqcount_ptr(s));		\
@@ -480,7 +480,7 @@ static inline void __do_raw_write_seqcount_begin(seqcount_t *s)
 do {									\
 	__do_raw_write_seqcount_end(__seqcount_ptr(s));			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_enable();					\
 } while (0)

@@ -501,9 +501,9 @@ static inline void __do_raw_write_seqcount_end(seqcount_t *s)
  */
 #define write_seqcount_begin_nested(s, subclass)			\
 do {									\
-	__seqcount_assert_lock_held(s);					\
+	__seqcount_assert_associated_lock_held(s);			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_disable();					\
 									\
 	__do_write_seqcount_begin_nested(__seqcount_ptr(s), subclass);	\
@@ -528,9 +528,9 @@ static inline void __do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
  */
 #define write_seqcount_begin(s)						\
 do {									\
-	__seqcount_assert_lock_held(s);					\
+	__seqcount_assert_associated_lock_held(s);			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_disable();					\
 									\
 	__do_write_seqcount_begin(__seqcount_ptr(s));			\
@@ -551,7 +551,7 @@ static inline void __do_write_seqcount_begin(seqcount_t *s)
 do {									\
 	__do_write_seqcount_end(__seqcount_ptr(s));			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_enable();					\
 } while (0)

>                    Linus

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-04  1:32                   ` Ahmed S. Darwish
@ 2020-11-04  2:01                     ` John Hubbard
  2020-11-04  3:17                       ` Ahmed S. Darwish
  0 siblings, 1 reply; 28+ messages in thread
From: John Hubbard @ 2020-11-04  2:01 UTC (permalink / raw)
  To: Ahmed S. Darwish, Linus Torvalds
  Cc: Jason Gunthorpe, Peter Xu, Linux Kernel Mailing List,
	Andrea Arcangeli, Andrew Morton, Aneesh Kumar K.V,
	Christoph Hellwig, Hugh Dickins, Jan Kara, Jann Horn,
	Kirill Shutemov, Kirill Tkhai, Leon Romanovsky, Linux-MM,
	Michal Hocko, Oleg Nesterov, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Thomas Gleixner, Sebastian Siewior

On 11/3/20 5:32 PM, Ahmed S. Darwish wrote:
> On Tue, Nov 03, 2020 at 09:40:22AM -0800, Linus Torvalds wrote:
>> On Mon, Nov 2, 2020 at 10:52 PM Ahmed S. Darwish
>> <a.darwish@linutronix.de> wrote:
...
> 
> ====>
> ====> patch #1:
> ====>
> 
> Subject: [RFC][PATCH 1/2] seqlock: Use __do_ prefix instead of non-standard
>   _seqcount_t_ marker
> 
> The use of "*_seqcount_t_*" as a marker to denote internal seqlock.h
> functions taking only plain seqcount_t instead of the whole
> seqcount_LOCKNAME_t family is confusing users, as it's also not the
> standard kernel pattern for denoting header file internal functions.
> 
> Use the __do_ prefix instead.
> 
> Note, a plain "__" prefix is not used since seqlock.h already uses it
> for some of its exported functions; e.g. __read_seqcount_begin() and
> __read_seqcount_retry().
> 
> Reported-by: Jason Gunthorpe <jgg@nvidia.com>
> Reported-by: John Hubbard <jhubbard@nvidia.com>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Link: https://lkml.kernel.org/r/CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> ---
>   include/linux/seqlock.h | 62 ++++++++++++++++++++---------------------
>   1 file changed, 31 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index cbfc78b92b65..5de043841d33 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -425,9 +425,9 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
>    * Return: true if a read section retry is required, else false
>    */
>   #define __read_seqcount_retry(s, start)					\
> -	__read_seqcount_t_retry(__seqcount_ptr(s), start)
> +	__do___read_seqcount_retry(__seqcount_ptr(s), start)

Looking better. "do_" is clearly an internal function name prefix, so that's good.

A nit: while various numbers of leading underscores are sometimes used, it's a lot
less common to use, say, 3 consecutive underscores (as above) *within* the name. And
I don't think you need it for uniqueness, at least from a quick look around here.

In fact, if you actually need 3 vs 2 vs 1 underscore within the name, I'm going to
claim that that's too far afield, and the naming should be re-revisited. :)

So why not just:

     __do_read_seqcount_retry()

?

...or, if you're feeling bold:

	do_read_seqcount_retry()

...thus taking further advantage of the "do" convention, in order to get rid of some
more underscores.

And similarly for other __do___*() functions.

But again, either way, I think "do" is helping a *lot* here (as is getting rid
of the _t_ idea).


thanks,
-- 
John Hubbard
NVIDIA

> 
> -static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
> +static inline int __do___read_seqcount_retry(const seqcount_t *s, unsigned start)
>   {
>   	kcsan_atomic_next(0);
>   	return unlikely(READ_ONCE(s->sequence) != start);
> @@ -445,12 +445,12 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
>    * Return: true if a read section retry is required, else false
>    */
>   #define read_seqcount_retry(s, start)					\
> -	read_seqcount_t_retry(__seqcount_ptr(s), start)
> +	__do_read_seqcount_retry(__seqcount_ptr(s), start)
> 
> -static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
> +static inline int __do_read_seqcount_retry(const seqcount_t *s, unsigned start)
>   {
>   	smp_rmb();
> -	return __read_seqcount_t_retry(s, start);
> +	return __do___read_seqcount_retry(s, start);
>   }
> 
>   /**
> @@ -462,10 +462,10 @@ do {									\
>   	if (__seqcount_lock_preemptible(s))				\
>   		preempt_disable();					\
>   									\
> -	raw_write_seqcount_t_begin(__seqcount_ptr(s));			\
> +	__do_raw_write_seqcount_begin(__seqcount_ptr(s));		\
>   } while (0)
> 
> -static inline void raw_write_seqcount_t_begin(seqcount_t *s)
> +static inline void __do_raw_write_seqcount_begin(seqcount_t *s)
>   {
>   	kcsan_nestable_atomic_begin();
>   	s->sequence++;
> @@ -478,13 +478,13 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
>    */
>   #define raw_write_seqcount_end(s)					\
>   do {									\
> -	raw_write_seqcount_t_end(__seqcount_ptr(s));			\
> +	__do_raw_write_seqcount_end(__seqcount_ptr(s));			\
>   									\
>   	if (__seqcount_lock_preemptible(s))				\
>   		preempt_enable();					\
>   } while (0)
> 
> -static inline void raw_write_seqcount_t_end(seqcount_t *s)
> +static inline void __do_raw_write_seqcount_end(seqcount_t *s)
>   {
>   	smp_wmb();
>   	s->sequence++;
> @@ -506,12 +506,12 @@ do {									\
>   	if (__seqcount_lock_preemptible(s))				\
>   		preempt_disable();					\
>   									\
> -	write_seqcount_t_begin_nested(__seqcount_ptr(s), subclass);	\
> +	__do_write_seqcount_begin_nested(__seqcount_ptr(s), subclass);	\
>   } while (0)
> 
> -static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
> +static inline void __do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
>   {
> -	raw_write_seqcount_t_begin(s);
> +	__do_raw_write_seqcount_begin(s);
>   	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
>   }
> 
> @@ -533,12 +533,12 @@ do {									\
>   	if (__seqcount_lock_preemptible(s))				\
>   		preempt_disable();					\
>   									\
> -	write_seqcount_t_begin(__seqcount_ptr(s));			\
> +	__do_write_seqcount_begin(__seqcount_ptr(s));			\
>   } while (0)
> 
> -static inline void write_seqcount_t_begin(seqcount_t *s)
> +static inline void __do_write_seqcount_begin(seqcount_t *s)
>   {
> -	write_seqcount_t_begin_nested(s, 0);
> +	__do_write_seqcount_begin_nested(s, 0);
>   }
> 
>   /**
> @@ -549,16 +549,16 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
>    */
>   #define write_seqcount_end(s)						\
>   do {									\
> -	write_seqcount_t_end(__seqcount_ptr(s));			\
> +	__do_write_seqcount_end(__seqcount_ptr(s));			\
>   									\
>   	if (__seqcount_lock_preemptible(s))				\
>   		preempt_enable();					\
>   } while (0)
> 
> -static inline void write_seqcount_t_end(seqcount_t *s)
> +static inline void __do_write_seqcount_end(seqcount_t *s)
>   {
>   	seqcount_release(&s->dep_map, _RET_IP_);
> -	raw_write_seqcount_t_end(s);
> +	__do_raw_write_seqcount_end(s);
>   }
> 
>   /**
> @@ -603,9 +603,9 @@ static inline void write_seqcount_t_end(seqcount_t *s)
>    *      }
>    */
>   #define raw_write_seqcount_barrier(s)					\
> -	raw_write_seqcount_t_barrier(__seqcount_ptr(s))
> +	__do_raw_write_seqcount_barrier(__seqcount_ptr(s))
> 
> -static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
> +static inline void __do_raw_write_seqcount_barrier(seqcount_t *s)
>   {
>   	kcsan_nestable_atomic_begin();
>   	s->sequence++;
> @@ -623,9 +623,9 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
>    * will complete successfully and see data older than this.
>    */
>   #define write_seqcount_invalidate(s)					\
> -	write_seqcount_t_invalidate(__seqcount_ptr(s))
> +	__do_write_seqcount_invalidate(__seqcount_ptr(s))
> 
> -static inline void write_seqcount_t_invalidate(seqcount_t *s)
> +static inline void __do_write_seqcount_invalidate(seqcount_t *s)
>   {
>   	smp_wmb();
>   	kcsan_nestable_atomic_begin();
> @@ -865,7 +865,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
>   }
> 
>   /*
> - * For all seqlock_t write side functions, use write_seqcount_*t*_begin()
> + * For all seqlock_t write side functions, use __do_write_seqcount_begin()
>    * instead of the generic write_seqcount_begin(). This way, no redundant
>    * lockdep_assert_held() checks are added.
>    */
> @@ -886,7 +886,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
>   static inline void write_seqlock(seqlock_t *sl)
>   {
>   	spin_lock(&sl->lock);
> -	write_seqcount_t_begin(&sl->seqcount.seqcount);
> +	__do_write_seqcount_begin(&sl->seqcount.seqcount);
>   }
> 
>   /**
> @@ -898,7 +898,7 @@ static inline void write_seqlock(seqlock_t *sl)
>    */
>   static inline void write_sequnlock(seqlock_t *sl)
>   {
> -	write_seqcount_t_end(&sl->seqcount.seqcount);
> +	__do_write_seqcount_end(&sl->seqcount.seqcount);
>   	spin_unlock(&sl->lock);
>   }
> 
> @@ -912,7 +912,7 @@ static inline void write_sequnlock(seqlock_t *sl)
>   static inline void write_seqlock_bh(seqlock_t *sl)
>   {
>   	spin_lock_bh(&sl->lock);
> -	write_seqcount_t_begin(&sl->seqcount.seqcount);
> +	__do_write_seqcount_begin(&sl->seqcount.seqcount);
>   }
> 
>   /**
> @@ -925,7 +925,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
>    */
>   static inline void write_sequnlock_bh(seqlock_t *sl)
>   {
> -	write_seqcount_t_end(&sl->seqcount.seqcount);
> +	__do_write_seqcount_end(&sl->seqcount.seqcount);
>   	spin_unlock_bh(&sl->lock);
>   }
> 
> @@ -939,7 +939,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
>   static inline void write_seqlock_irq(seqlock_t *sl)
>   {
>   	spin_lock_irq(&sl->lock);
> -	write_seqcount_t_begin(&sl->seqcount.seqcount);
> +	__do_write_seqcount_begin(&sl->seqcount.seqcount);
>   }
> 
>   /**
> @@ -951,7 +951,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
>    */
>   static inline void write_sequnlock_irq(seqlock_t *sl)
>   {
> -	write_seqcount_t_end(&sl->seqcount.seqcount);
> +	__do_write_seqcount_end(&sl->seqcount.seqcount);
>   	spin_unlock_irq(&sl->lock);
>   }
> 
> @@ -960,7 +960,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
>   	unsigned long flags;
> 
>   	spin_lock_irqsave(&sl->lock, flags);
> -	write_seqcount_t_begin(&sl->seqcount.seqcount);
> +	__do_write_seqcount_begin(&sl->seqcount.seqcount);
>   	return flags;
>   }
> 
> @@ -989,7 +989,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
>   static inline void
>   write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
>   {
> -	write_seqcount_t_end(&sl->seqcount.seqcount);
> +	__do_write_seqcount_end(&sl->seqcount.seqcount);
>   	spin_unlock_irqrestore(&sl->lock, flags);
>   }
> 
> ====>
> ====> patch #2:
> ====>
> 
> Subject: [PATCH 2/2] seqlock: seqcount_LOCKNAME_t: Use more verbose macro names
> 
> As evidenced by multiple discussions over LKML so far, it's not clear
> that __seqcount_lock_preemptible() is always false for plain seqcount_t.
> For that data type, the _Generic() selection resolves to
> __seqprop_preemptible(), which just returns false.
> 
> Use __seqcount_associated_lock_exists_and_is_preemptible() instead,
> which hints that "preemptibility" is for the associated write
> serialization lock (if any), not for the seqcount itself.
> 
> Similarly, rename __seqcount_assert_lock_held() to
> __seqcount_assert_associated_lock_held().
> 
> Link: https://lkml.kernel.org/r/CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com
> Link: https://lkml.kernel.org/r/20201030235121.GQ2620339@nvidia.com
> Link: https://lkml.kernel.org/r/20201103170327.GJ20600@xz-x1
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> ---
>   include/linux/seqlock.h | 22 +++++++++++-----------
>   1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index 5de043841d33..eb1e5a822e44 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -307,10 +307,10 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
>   	__seqprop_case((s),	mutex,		prop),			\
>   	__seqprop_case((s),	ww_mutex,	prop))
> 
> -#define __seqcount_ptr(s)		__seqprop(s, ptr)
> -#define __seqcount_sequence(s)		__seqprop(s, sequence)
> -#define __seqcount_lock_preemptible(s)	__seqprop(s, preemptible)
> -#define __seqcount_assert_lock_held(s)	__seqprop(s, assert)
> +#define __seqcount_ptr(s)					__seqprop(s, ptr)
> +#define __seqcount_sequence(s)					__seqprop(s, sequence)
> +#define __seqcount_associated_lock_exists_and_is_preemptible(s)	__seqprop(s, preemptible)
> +#define __seqcount_assert_associated_lock_held(s)		__seqprop(s, assert)
> 
>   /**
>    * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
> @@ -459,7 +459,7 @@ static inline int __do_read_seqcount_retry(const seqcount_t *s, unsigned start)
>    */
>   #define raw_write_seqcount_begin(s)					\
>   do {									\
> -	if (__seqcount_lock_preemptible(s))				\
> +	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
>   		preempt_disable();					\
>   									\
>   	__do_raw_write_seqcount_begin(__seqcount_ptr(s));		\
> @@ -480,7 +480,7 @@ static inline void __do_raw_write_seqcount_begin(seqcount_t *s)
>   do {									\
>   	__do_raw_write_seqcount_end(__seqcount_ptr(s));			\
>   									\
> -	if (__seqcount_lock_preemptible(s))				\
> +	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
>   		preempt_enable();					\
>   } while (0)
> 
> @@ -501,9 +501,9 @@ static inline void __do_raw_write_seqcount_end(seqcount_t *s)
>    */
>   #define write_seqcount_begin_nested(s, subclass)			\
>   do {									\
> -	__seqcount_assert_lock_held(s);					\
> +	__seqcount_assert_associated_lock_held(s);			\
>   									\
> -	if (__seqcount_lock_preemptible(s))				\
> +	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
>   		preempt_disable();					\
>   									\
>   	__do_write_seqcount_begin_nested(__seqcount_ptr(s), subclass);	\
> @@ -528,9 +528,9 @@ static inline void __do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
>    */
>   #define write_seqcount_begin(s)						\
>   do {									\
> -	__seqcount_assert_lock_held(s);					\
> +	__seqcount_assert_associated_lock_held(s);			\
>   									\
> -	if (__seqcount_lock_preemptible(s))				\
> +	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
>   		preempt_disable();					\
>   									\
>   	__do_write_seqcount_begin(__seqcount_ptr(s));			\
> @@ -551,7 +551,7 @@ static inline void __do_write_seqcount_begin(seqcount_t *s)
>   do {									\
>   	__do_write_seqcount_end(__seqcount_ptr(s));			\
>   									\
> -	if (__seqcount_lock_preemptible(s))				\
> +	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
>   		preempt_enable();					\
>   } while (0)
> 
>>                     Linus
> 
> Thanks,
> 
> --
> Ahmed S. Darwish
> Linutronix GmbH
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-04  2:01                     ` John Hubbard
@ 2020-11-04  3:17                       ` Ahmed S. Darwish
  2020-11-04 18:38                         ` Linus Torvalds
  2020-11-10 11:53                         ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Peter Zijlstra
  0 siblings, 2 replies; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-04  3:17 UTC (permalink / raw)
  To: John Hubbard
  Cc: Linus Torvalds, Jason Gunthorpe, Peter Xu,
	Linux Kernel Mailing List, Andrea Arcangeli, Andrew Morton,
	Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins, Jan Kara,
	Jann Horn, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
	Linux-MM, Michal Hocko, Oleg Nesterov, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian Siewior

On Tue, Nov 03, 2020 at 06:01:30PM -0800, John Hubbard wrote:
> On 11/3/20 5:32 PM, Ahmed S. Darwish wrote:
...
> >   #define __read_seqcount_retry(s, start)					\
> > -	__read_seqcount_t_retry(__seqcount_ptr(s), start)
> > +	__do___read_seqcount_retry(__seqcount_ptr(s), start)
>
...
> A nit: while various numbers of leading underscores are sometimes used, it's a lot
> less common to use, say, 3 consecutive underscores (as above) *within* the name. And
> I don't think you need it for uniqueness, at least from a quick look around here.
>
...
> But again, either way, I think "do" is helping a *lot* here (as is getting rid
> of the _t_ idea).

The three underscores are needed because there's a do_ version for
read_seqcount_retry(), and another for __read_seqcount_retry().

Similarly for {__,}read_seqcount_begin(). You want to be very careful
with this, and never mistakenly mix the two, because it affects some VFS
hot paths.

Nonetheless, as you mentioned in the later (dropped) part of your
message, I think do_ is better than __do_, so the final result will be:

  do___read_seqcount_retry()
  do_read_seqcount_retry()
  do_raw_write_seqcount_begin()
  do_raw_write_seqcount_end()
  do_write_seqcount_begin()
  ...

and so on.
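
In other words, the internal name is always just "do_" + the external name,
which is where the three consecutive underscores come from for the "__"
variants.  A sketch of the resulting wiring (following patch #1 above):

#define read_seqcount_retry(s, start)					\
	do_read_seqcount_retry(__seqcount_ptr(s), start)

#define __read_seqcount_retry(s, start)				\
	do___read_seqcount_retry(__seqcount_ptr(s), start)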

I'll wait for some further feedback on the two patches (possibly from
Linus or PeterZ), then send a mini patch series.

(This shouldn't block a v3 of Jason's mm patch series though, as it will
 be using the external seqlock.h APIs anyway...).

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-04  3:17                       ` Ahmed S. Darwish
@ 2020-11-04 18:38                         ` Linus Torvalds
  2020-11-04 19:54                           ` Ahmed S. Darwish
                                             ` (2 more replies)
  2020-11-10 11:53                         ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Peter Zijlstra
  1 sibling, 3 replies; 28+ messages in thread
From: Linus Torvalds @ 2020-11-04 18:38 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: John Hubbard, Jason Gunthorpe, Peter Xu,
	Linux Kernel Mailing List, Andrea Arcangeli, Andrew Morton,
	Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins, Jan Kara,
	Jann Horn, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
	Linux-MM, Michal Hocko, Oleg Nesterov, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian Siewior

On Tue, Nov 3, 2020 at 7:17 PM Ahmed S. Darwish <a.darwish@linutronix.de> wrote:
>
> Nonetheless, as you mentioned in the later (dropped) part of your
> message, I think do_ is better than __do_, so the final result will be:
>
>   do___read_seqcount_retry()
>   do_read_seqcount_retry()
>   do_raw_write_seqcount_begin()
>   do_raw_write_seqcount_end()
>   do_write_seqcount_begin()
>   ...
>
> and so on.

Looks reasonable to me.

And can you add a few comments to the magic type macros, so that it's
a lot more obvious what the end result was. I clearly wasn't able to
follow all the _Generic() cases from the seqcount_t to the final end
result. It's a really odd combination of subtle _GENERIC() macro and
token pasting to get from zeqcount_t to "false" in
__seqcount_lock_preemptible().

I can see it when I really look, but when looking at the actual use,
it's very non-obvious indeed.

                 Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-04 18:38                         ` Linus Torvalds
@ 2020-11-04 19:54                           ` Ahmed S. Darwish
  2020-12-09 18:38                           ` [tip: locking/core] seqlock: Prefix internal seqcount_t-only macros with a "do_" tip-bot2 for Ahmed S. Darwish
  2020-12-09 18:38                           ` [tip: locking/core] seqlock: kernel-doc: Specify when preemption is automatically altered tip-bot2 for Ahmed S. Darwish
  2 siblings, 0 replies; 28+ messages in thread
From: Ahmed S. Darwish @ 2020-11-04 19:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Hubbard, Jason Gunthorpe, Peter Xu,
	Linux Kernel Mailing List, Andrea Arcangeli, Andrew Morton,
	Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins, Jan Kara,
	Jann Horn, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
	Linux-MM, Michal Hocko, Oleg Nesterov, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian Siewior

On Wed, Nov 04, 2020 at 10:38:57AM -0800, Linus Torvalds wrote:
...
>
> Looks reasonable to me.
>
> And can you add a few comments to the magic type macros, so that it's
> a lot more obvious what the end result was.
...
>
> I can see it when I really look, but when looking at the actual use,
> it's very non-obvious indeed.
>

ACK, will do.

>                  Linus

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
  2020-11-04  3:17                       ` Ahmed S. Darwish
  2020-11-04 18:38                         ` Linus Torvalds
@ 2020-11-10 11:53                         ` Peter Zijlstra
  2020-12-03 10:35                           ` [tip: locking/core] seqlock: Rename __seqprop() users tip-bot2 for Peter Zijlstra
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2020-11-10 11:53 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: John Hubbard, Linus Torvalds, Jason Gunthorpe, Peter Xu,
	Linux Kernel Mailing List, Andrea Arcangeli, Andrew Morton,
	Aneesh Kumar K.V, Christoph Hellwig, Hugh Dickins, Jan Kara,
	Jann Horn, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
	Linux-MM, Michal Hocko, Oleg Nesterov, Ingo Molnar, Will Deacon,
	Thomas Gleixner, Sebastian Siewior

On Wed, Nov 04, 2020 at 04:17:11AM +0100, Ahmed S. Darwish wrote:
> On Tue, Nov 03, 2020 at 06:01:30PM -0800, John Hubbard wrote:
> > On 11/3/20 5:32 PM, Ahmed S. Darwish wrote:
> ...
> > >   #define __read_seqcount_retry(s, start)					\
> > > -	__read_seqcount_t_retry(__seqcount_ptr(s), start)
> > > +	__do___read_seqcount_retry(__seqcount_ptr(s), start)
> >
> ...
> > A nit: while various numbers of leading underscores are sometimes used, it's a lot
> > less common to use, say, 3 consecutive underscores (as above) *within* the name. And
> > I don't think you need it for uniqueness, at least from a quick look around here.
> >
> ...
> > But again, either way, I think "do" is helping a *lot* here (as is getting rid
> > of the _t_ idea).
> 
> The three underscores are needed because there's a do_ version for
> read_seqcount_retry(), and another for __read_seqcount_retry().
> 
> Similarly for {__,}read_seqcount_begin(). You want to be very careful
> > with this, and never mistakenly mix the two, because it affects some VFS
> hot paths.
> 
> Nonetheless, as you mentioned in the later (dropped) part of your
> message, I think do_ is better than __do_, so the final result will be:
> 
>   do___read_seqcount_retry()
>   do_read_seqcount_retry()
>   do_raw_write_seqcount_begin()
>   do_raw_write_seqcount_end()
>   do_write_seqcount_begin()
>   ...
> 
> and so on.
> 
> I'll wait for some further feedback on the two patches (possibly from
> Linus or PeterZ), then send a mini patch series.

I'm Ok with that. The change I have issues with is:

-#define __seqcount_lock_preemptible(s) __seqprop(s, preemptible)
+#define __seqcount_associated_lock_exists_and_is_preemptible(s)        __seqprop(s, preemptible)

That's just _realllllllly_ long.

Would something like the below make it easier to follow?

seqprop_preemptible() is 'obviously' related to __seqprop_preemptible()
without having to follow through the _Generic token pasting maze.

---
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8d8552474c64..576594add921 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -307,10 +307,10 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
 	__seqprop_case((s),	mutex,		prop),			\
 	__seqprop_case((s),	ww_mutex,	prop))
 
-#define __seqcount_ptr(s)		__seqprop(s, ptr)
-#define __seqcount_sequence(s)		__seqprop(s, sequence)
-#define __seqcount_lock_preemptible(s)	__seqprop(s, preemptible)
-#define __seqcount_assert_lock_held(s)	__seqprop(s, assert)
+#define seqprop_ptr(s)			__seqprop(s, ptr)
+#define seqprop_sequence(s)		__seqprop(s, sequence)
+#define seqprop_preemptible(s)		__seqprop(s, preemptible)
+#define seqprop_assert(s)	__seqprop(s, assert)
 
 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
@@ -330,7 +330,7 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
 ({									\
 	unsigned __seq;							\
 									\
-	while ((__seq = __seqcount_sequence(s)) & 1)			\
+	while ((__seq = seqprop_sequence(s)) & 1)			\
 		cpu_relax();						\
 									\
 	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
@@ -359,7 +359,7 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  */
 #define read_seqcount_begin(s)						\
 ({									\
-	seqcount_lockdep_reader_access(__seqcount_ptr(s));		\
+	seqcount_lockdep_reader_access(seqprop_ptr(s));			\
 	raw_read_seqcount_begin(s);					\
 })
 
@@ -376,7 +376,7 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  */
 #define raw_read_seqcount(s)						\
 ({									\
-	unsigned __seq = __seqcount_sequence(s);			\
+	unsigned __seq = seqprop_sequence(s);				\
 									\
 	smp_rmb();							\
 	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
@@ -425,7 +425,7 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  * Return: true if a read section retry is required, else false
  */
 #define __read_seqcount_retry(s, start)					\
-	__read_seqcount_t_retry(__seqcount_ptr(s), start)
+	__read_seqcount_t_retry(seqprop_ptr(s), start)
 
 static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
@@ -445,7 +445,7 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
  * Return: true if a read section retry is required, else false
  */
 #define read_seqcount_retry(s, start)					\
-	read_seqcount_t_retry(__seqcount_ptr(s), start)
+	read_seqcount_t_retry(seqprop_ptr(s), start)
 
 static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
@@ -459,10 +459,10 @@ static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
  */
 #define raw_write_seqcount_begin(s)					\
 do {									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	raw_write_seqcount_t_begin(__seqcount_ptr(s));			\
+	raw_write_seqcount_t_begin(seqprop_ptr(s));			\
 } while (0)
 
 static inline void raw_write_seqcount_t_begin(seqcount_t *s)
@@ -478,9 +478,9 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
  */
 #define raw_write_seqcount_end(s)					\
 do {									\
-	raw_write_seqcount_t_end(__seqcount_ptr(s));			\
+	raw_write_seqcount_t_end(seqprop_ptr(s));			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_enable();					\
 } while (0)
 
@@ -501,12 +501,12 @@ static inline void raw_write_seqcount_t_end(seqcount_t *s)
  */
 #define write_seqcount_begin_nested(s, subclass)			\
 do {									\
-	__seqcount_assert_lock_held(s);					\
+	seqprop_assert(s);						\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin_nested(__seqcount_ptr(s), subclass);	\
+	write_seqcount_t_begin_nested(seqprop_ptr(s), subclass);	\
 } while (0)
 
 static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
@@ -528,12 +528,12 @@ static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
  */
 #define write_seqcount_begin(s)						\
 do {									\
-	__seqcount_assert_lock_held(s);					\
+	seqprop_assert(s);						\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin(__seqcount_ptr(s));			\
+	write_seqcount_t_begin(seqprop_ptr(s));				\
 } while (0)
 
 static inline void write_seqcount_t_begin(seqcount_t *s)
@@ -549,9 +549,9 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
  */
 #define write_seqcount_end(s)						\
 do {									\
-	write_seqcount_t_end(__seqcount_ptr(s));			\
+	write_seqcount_t_end(seqprop_ptr(s));				\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_enable();					\
 } while (0)
 
@@ -603,7 +603,7 @@ static inline void write_seqcount_t_end(seqcount_t *s)
  *      }
  */
 #define raw_write_seqcount_barrier(s)					\
-	raw_write_seqcount_t_barrier(__seqcount_ptr(s))
+	raw_write_seqcount_t_barrier(seqprop_ptr(s))
 
 static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
 {
@@ -623,7 +623,7 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
  * will complete successfully and see data older than this.
  */
 #define write_seqcount_invalidate(s)					\
-	write_seqcount_t_invalidate(__seqcount_ptr(s))
+	write_seqcount_t_invalidate(seqprop_ptr(s))
 
 static inline void write_seqcount_t_invalidate(seqcount_t *s)
 {


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip: locking/core] seqlock: Rename __seqprop() users
  2020-11-10 11:53                         ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Peter Zijlstra
@ 2020-12-03 10:35                           ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-12-03 10:35 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     ab440b2c604b60fe90885270fcfeb5c3dd5d6fae
Gitweb:        https://git.kernel.org/tip/ab440b2c604b60fe90885270fcfeb5c3dd5d6fae
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 10 Nov 2020 13:44:17 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 03 Dec 2020 11:20:51 +01:00

seqlock: Rename __seqprop() users

More consistent naming should make it easier to untangle the _Generic
token pasting maze called __seqprop().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201110115358.GE2594@hirez.programming.kicks-ass.net
---
 include/linux/seqlock.h | 46 ++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8d85524..d89134c 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -307,10 +307,10 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
 	__seqprop_case((s),	mutex,		prop),			\
 	__seqprop_case((s),	ww_mutex,	prop))
 
-#define __seqcount_ptr(s)		__seqprop(s, ptr)
-#define __seqcount_sequence(s)		__seqprop(s, sequence)
-#define __seqcount_lock_preemptible(s)	__seqprop(s, preemptible)
-#define __seqcount_assert_lock_held(s)	__seqprop(s, assert)
+#define seqprop_ptr(s)			__seqprop(s, ptr)
+#define seqprop_sequence(s)		__seqprop(s, sequence)
+#define seqprop_preemptible(s)		__seqprop(s, preemptible)
+#define seqprop_assert(s)		__seqprop(s, assert)
 
 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
@@ -330,7 +330,7 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
 ({									\
 	unsigned __seq;							\
 									\
-	while ((__seq = __seqcount_sequence(s)) & 1)			\
+	while ((__seq = seqprop_sequence(s)) & 1)			\
 		cpu_relax();						\
 									\
 	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
@@ -359,7 +359,7 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  */
 #define read_seqcount_begin(s)						\
 ({									\
-	seqcount_lockdep_reader_access(__seqcount_ptr(s));		\
+	seqcount_lockdep_reader_access(seqprop_ptr(s));			\
 	raw_read_seqcount_begin(s);					\
 })
 
@@ -376,7 +376,7 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  */
 #define raw_read_seqcount(s)						\
 ({									\
-	unsigned __seq = __seqcount_sequence(s);			\
+	unsigned __seq = seqprop_sequence(s);				\
 									\
 	smp_rmb();							\
 	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
@@ -425,7 +425,7 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  * Return: true if a read section retry is required, else false
  */
 #define __read_seqcount_retry(s, start)					\
-	__read_seqcount_t_retry(__seqcount_ptr(s), start)
+	__read_seqcount_t_retry(seqprop_ptr(s), start)
 
 static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
@@ -445,7 +445,7 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
  * Return: true if a read section retry is required, else false
  */
 #define read_seqcount_retry(s, start)					\
-	read_seqcount_t_retry(__seqcount_ptr(s), start)
+	read_seqcount_t_retry(seqprop_ptr(s), start)
 
 static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
@@ -459,10 +459,10 @@ static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
  */
 #define raw_write_seqcount_begin(s)					\
 do {									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	raw_write_seqcount_t_begin(__seqcount_ptr(s));			\
+	raw_write_seqcount_t_begin(seqprop_ptr(s));			\
 } while (0)
 
 static inline void raw_write_seqcount_t_begin(seqcount_t *s)
@@ -478,9 +478,9 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
  */
 #define raw_write_seqcount_end(s)					\
 do {									\
-	raw_write_seqcount_t_end(__seqcount_ptr(s));			\
+	raw_write_seqcount_t_end(seqprop_ptr(s));			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_enable();					\
 } while (0)
 
@@ -501,12 +501,12 @@ static inline void raw_write_seqcount_t_end(seqcount_t *s)
  */
 #define write_seqcount_begin_nested(s, subclass)			\
 do {									\
-	__seqcount_assert_lock_held(s);					\
+	seqprop_assert(s);						\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin_nested(__seqcount_ptr(s), subclass);	\
+	write_seqcount_t_begin_nested(seqprop_ptr(s), subclass);	\
 } while (0)
 
 static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
@@ -528,12 +528,12 @@ static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
  */
 #define write_seqcount_begin(s)						\
 do {									\
-	__seqcount_assert_lock_held(s);					\
+	seqprop_assert(s);						\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin(__seqcount_ptr(s));			\
+	write_seqcount_t_begin(seqprop_ptr(s));				\
 } while (0)
 
 static inline void write_seqcount_t_begin(seqcount_t *s)
@@ -549,9 +549,9 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
  */
 #define write_seqcount_end(s)						\
 do {									\
-	write_seqcount_t_end(__seqcount_ptr(s));			\
+	write_seqcount_t_end(seqprop_ptr(s));				\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (seqprop_preemptible(s))					\
 		preempt_enable();					\
 } while (0)
 
@@ -603,7 +603,7 @@ static inline void write_seqcount_t_end(seqcount_t *s)
  *      }
  */
 #define raw_write_seqcount_barrier(s)					\
-	raw_write_seqcount_t_barrier(__seqcount_ptr(s))
+	raw_write_seqcount_t_barrier(seqprop_ptr(s))
 
 static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
 {
@@ -623,7 +623,7 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
  * will complete successfully and see data older than this.
  */
 #define write_seqcount_invalidate(s)					\
-	write_seqcount_t_invalidate(__seqcount_ptr(s))
+	write_seqcount_t_invalidate(seqprop_ptr(s))
 
 static inline void write_seqcount_t_invalidate(seqcount_t *s)
 {

^ permalink raw reply related	[flat|nested] 28+ messages in thread
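
The seqprop_*() helpers adopted in the hunks above are property accessors: they let a
single public macro accept either a plain seqcount_t or any of the
seqcount_LOCKNAME_t variants. Below is a heavily simplified, standalone sketch of
that dispatch pattern, for illustration only: the real seqlock.h generates its
accessors via SEQCOUNT_LOCKNAME() and __seqprop(), and covers more properties
(sequence, assert) than shown here.

/* Simplified model of the seqprop_*() dispatch (illustration only). */
#include <stdbool.h>

typedef struct { unsigned sequence; } seqcount_t;

typedef struct {
	seqcount_t seqcount;
	void *lock;		/* stand-in for struct mutex *lock */
} seqcount_mutex_t;

/* Per-flavour accessors. */
static inline seqcount_t *seqprop_ptr_plain(seqcount_t *s)
{ return s; }
static inline seqcount_t *seqprop_ptr_mutex(seqcount_mutex_t *s)
{ return &s->seqcount; }

static inline bool seqprop_preemptible_plain(seqcount_t *s)
{ return false; }
static inline bool seqprop_preemptible_mutex(seqcount_mutex_t *s)
{ return true; }	/* a mutex is a preemptible lock */

/* One public name per property, dispatching on the argument's static type. */
#define seqprop_ptr(s)						\
	_Generic(*(s),						\
		 seqcount_t:	   seqprop_ptr_plain,		\
		 seqcount_mutex_t: seqprop_ptr_mutex)(s)

#define seqprop_preemptible(s)					\
	_Generic(*(s),						\
		 seqcount_t:	   seqprop_preemptible_plain,	\
		 seqcount_mutex_t: seqprop_preemptible_mutex)(s)

The rename above only switches users from the old __seqcount_*() spellings to
these seqprop_*() names; the dispatch mechanism itself is unchanged.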

* [tip: locking/core] seqlock: Prefix internal seqcount_t-only macros with a "do_"
  2020-11-04 18:38                         ` Linus Torvalds
  2020-11-04 19:54                           ` Ahmed S. Darwish
@ 2020-12-09 18:38                           ` tip-bot2 for Ahmed S. Darwish
  2020-12-09 18:38                           ` [tip: locking/core] seqlock: kernel-doc: Specify when preemption is automatically altered tip-bot2 for Ahmed S. Darwish
  2 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-12-09 18:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     66bcfcdf89d00f2409f4b5da0f8c20c08318dc72
Gitweb:        https://git.kernel.org/tip/66bcfcdf89d00f2409f4b5da0f8c20c08318dc72
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Sun, 06 Dec 2020 17:21:42 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Dec 2020 17:08:49 +01:00

seqlock: Prefix internal seqcount_t-only macros with a "do_"

When the seqcount_LOCKNAME_t group of data types were introduced, two
classes of seqlock.h sequence counter macros were added:

  - An external public API which can either take a plain seqcount_t or
    any of the seqcount_LOCKNAME_t variants.

  - An internal API which takes only a plain seqcount_t.

To distinguish between the two groups, the "*_seqcount_t_*" pattern was
used for the latter. This confused a number of mm/ call-site developers,
and Linus also commented that it was not a standard practice for marking
seqlock.h internal APIs.

Distinguish the latter group of macros by prefixing a "do_".
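
As a hypothetical call-site sketch of the distinction (the structure and
functions below are invented for illustration and are not part of this patch):
only the public names are meant to appear outside seqlock.h, whatever the
seqcount flavour; the renamed do_*() helpers stay internal.

#include <linux/seqlock.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/*
 * f->lock and f->seq are assumed initialized elsewhere with
 * spin_lock_init() and seqcount_spinlock_init(&f->seq, &f->lock).
 */
struct foo {
	spinlock_t lock;
	seqcount_spinlock_t seq;	/* write-serialized by foo::lock */
	u64 value;
};

static void foo_set_value(struct foo *f, u64 v)
{
	spin_lock(&f->lock);
	write_seqcount_begin(&f->seq);	/* public API, not do_write_seqcount_begin() */
	f->value = v;
	write_seqcount_end(&f->seq);
	spin_unlock(&f->lock);
}

static u64 foo_get_value(struct foo *f)
{
	unsigned int seq;
	u64 v;

	do {
		seq = read_seqcount_begin(&f->seq);
		v = f->value;
	} while (read_seqcount_retry(&f->seq, seq));

	return v;
}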

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/CAHk-=wikhGExmprXgaW+MVXG1zsGpztBbVwOb23vetk41EtTBQ@mail.gmail.com
---
 include/linux/seqlock.h | 66 ++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index d89134c..235cbc6 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -425,9 +425,9 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  * Return: true if a read section retry is required, else false
  */
 #define __read_seqcount_retry(s, start)					\
-	__read_seqcount_t_retry(seqprop_ptr(s), start)
+	do___read_seqcount_retry(seqprop_ptr(s), start)
 
-static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
+static inline int do___read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	kcsan_atomic_next(0);
 	return unlikely(READ_ONCE(s->sequence) != start);
@@ -445,12 +445,12 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
  * Return: true if a read section retry is required, else false
  */
 #define read_seqcount_retry(s, start)					\
-	read_seqcount_t_retry(seqprop_ptr(s), start)
+	do_read_seqcount_retry(seqprop_ptr(s), start)
 
-static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
+static inline int do_read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_t_retry(s, start);
+	return do___read_seqcount_retry(s, start);
 }
 
 /**
@@ -462,10 +462,10 @@ do {									\
 	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	raw_write_seqcount_t_begin(seqprop_ptr(s));			\
+	do_raw_write_seqcount_begin(seqprop_ptr(s));			\
 } while (0)
 
-static inline void raw_write_seqcount_t_begin(seqcount_t *s)
+static inline void do_raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -478,13 +478,13 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
  */
 #define raw_write_seqcount_end(s)					\
 do {									\
-	raw_write_seqcount_t_end(seqprop_ptr(s));			\
+	do_raw_write_seqcount_end(seqprop_ptr(s));			\
 									\
 	if (seqprop_preemptible(s))					\
 		preempt_enable();					\
 } while (0)
 
-static inline void raw_write_seqcount_t_end(seqcount_t *s)
+static inline void do_raw_write_seqcount_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
@@ -506,12 +506,12 @@ do {									\
 	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin_nested(seqprop_ptr(s), subclass);	\
+	do_write_seqcount_begin_nested(seqprop_ptr(s), subclass);	\
 } while (0)
 
-static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
+static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	raw_write_seqcount_t_begin(s);
+	do_raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
@@ -533,12 +533,12 @@ do {									\
 	if (seqprop_preemptible(s))					\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin(seqprop_ptr(s));				\
+	do_write_seqcount_begin(seqprop_ptr(s));			\
 } while (0)
 
-static inline void write_seqcount_t_begin(seqcount_t *s)
+static inline void do_write_seqcount_begin(seqcount_t *s)
 {
-	write_seqcount_t_begin_nested(s, 0);
+	do_write_seqcount_begin_nested(s, 0);
 }
 
 /**
@@ -549,16 +549,16 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
  */
 #define write_seqcount_end(s)						\
 do {									\
-	write_seqcount_t_end(seqprop_ptr(s));				\
+	do_write_seqcount_end(seqprop_ptr(s));				\
 									\
 	if (seqprop_preemptible(s))					\
 		preempt_enable();					\
 } while (0)
 
-static inline void write_seqcount_t_end(seqcount_t *s)
+static inline void do_write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_t_end(s);
+	do_raw_write_seqcount_end(s);
 }
 
 /**
@@ -603,9 +603,9 @@ static inline void write_seqcount_t_end(seqcount_t *s)
  *      }
  */
 #define raw_write_seqcount_barrier(s)					\
-	raw_write_seqcount_t_barrier(seqprop_ptr(s))
+	do_raw_write_seqcount_barrier(seqprop_ptr(s))
 
-static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
+static inline void do_raw_write_seqcount_barrier(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -623,9 +623,9 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
  * will complete successfully and see data older than this.
  */
 #define write_seqcount_invalidate(s)					\
-	write_seqcount_t_invalidate(seqprop_ptr(s))
+	do_write_seqcount_invalidate(seqprop_ptr(s))
 
-static inline void write_seqcount_t_invalidate(seqcount_t *s)
+static inline void do_write_seqcount_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	kcsan_nestable_atomic_begin();
@@ -865,9 +865,9 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 }
 
 /*
- * For all seqlock_t write side functions, use write_seqcount_*t*_begin()
- * instead of the generic write_seqcount_begin(). This way, no redundant
- * lockdep_assert_held() checks are added.
+ * For all seqlock_t write side functions, use the internal
+ * do_write_seqcount_begin() instead of the generic write_seqcount_begin().
+ * This way, no redundant lockdep_assert_held() checks are added.
  */
 
 /**
@@ -886,7 +886,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	do_write_seqcount_begin(&sl->seqcount.seqcount);
 }
 
 /**
@@ -898,7 +898,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock(&sl->lock);
 }
 
@@ -912,7 +912,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	do_write_seqcount_begin(&sl->seqcount.seqcount);
 }
 
 /**
@@ -925,7 +925,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
@@ -939,7 +939,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	do_write_seqcount_begin(&sl->seqcount.seqcount);
 }
 
 /**
@@ -951,7 +951,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -960,7 +960,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	do_write_seqcount_begin(&sl->seqcount.seqcount);
 	return flags;
 }
 
@@ -989,7 +989,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip: locking/core] seqlock: kernel-doc: Specify when preemption is automatically altered
  2020-11-04 18:38                         ` Linus Torvalds
  2020-11-04 19:54                           ` Ahmed S. Darwish
  2020-12-09 18:38                           ` [tip: locking/core] seqlock: Prefix internal seqcount_t-only macros with a "do_" tip-bot2 for Ahmed S. Darwish
@ 2020-12-09 18:38                           ` tip-bot2 for Ahmed S. Darwish
  2 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-12-09 18:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     cb262935a166bdef0ccfe6e2adffa00c0f2d038a
Gitweb:        https://git.kernel.org/tip/cb262935a166bdef0ccfe6e2adffa00c0f2d038a
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Sun, 06 Dec 2020 17:21:43 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Dec 2020 17:08:49 +01:00

seqlock: kernel-doc: Specify when preemption is automatically altered

The kernel-doc annotations for sequence counter write side functions
are incomplete: they do not specify when preemption is automatically
disabled and re-enabled.

This has confused a number of call-site developers. Fix it.
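
For illustration (a hypothetical example, not part of this patch): with a
seqcount whose associated write-serialization lock is preemptible, such as a
seqcount_mutex_t, the writer only takes the lock; write_seqcount_begin()
disables preemption on its behalf and write_seqcount_end() re-enables it.

#include <linux/mutex.h>
#include <linux/seqlock.h>
#include <linux/types.h>

static DEFINE_MUTEX(stats_lock);
static seqcount_mutex_t stats_seq = SEQCNT_MUTEX_ZERO(stats_seq, &stats_lock);
static u64 stats_value;

static void stats_update(u64 v)
{
	mutex_lock(&stats_lock);
	/*
	 * The associated lock is a mutex, i.e. preemptible, so
	 * write_seqcount_begin() disables preemption here ...
	 */
	write_seqcount_begin(&stats_seq);
	stats_value = v;
	/* ... and write_seqcount_end() re-enables it. */
	write_seqcount_end(&stats_seq);
	mutex_unlock(&stats_lock);
}

With a spinlock-associated seqcount the lock acquisition itself already
disables preemption, so the write_seqcount_*() macros leave the preemption
state alone; that is the "if and only if ... preemptible" wording added below.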

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/CAHk-=wikhGExmprXgaW+MVXG1zsGpztBbVwOb23vetk41EtTBQ@mail.gmail.com
---
 include/linux/seqlock.h | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 235cbc6..2f7bb92 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -456,6 +456,8 @@ static inline int do_read_seqcount_retry(const seqcount_t *s, unsigned start)
 /**
  * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
  * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
+ *
+ * Context: check write_seqcount_begin()
  */
 #define raw_write_seqcount_begin(s)					\
 do {									\
@@ -475,6 +477,8 @@ static inline void do_raw_write_seqcount_begin(seqcount_t *s)
 /**
  * raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
  * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
+ *
+ * Context: check write_seqcount_end()
  */
 #define raw_write_seqcount_end(s)					\
 do {									\
@@ -498,6 +502,7 @@ static inline void do_raw_write_seqcount_end(seqcount_t *s)
  * @subclass: lockdep nesting level
  *
  * See Documentation/locking/lockdep-design.rst
+ * Context: check write_seqcount_begin()
  */
 #define write_seqcount_begin_nested(s, subclass)			\
 do {									\
@@ -519,11 +524,10 @@ static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
  * write_seqcount_begin() - start a seqcount_t write side critical section
  * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
  *
- * write_seqcount_begin opens a write side critical section of the given
- * seqcount_t.
- *
- * Context: seqcount_t write side critical sections must be serialized and
- * non-preemptible. If readers can be invoked from hardirq or softirq
+ * Context: sequence counter write side sections must be serialized and
+ * non-preemptible. Preemption will be automatically disabled if and
+ * only if the seqcount write serialization lock is associated, and
+ * preemptible.  If readers can be invoked from hardirq or softirq
  * context, interrupts or bottom halves must be respectively disabled.
  */
 #define write_seqcount_begin(s)						\
@@ -545,7 +549,8 @@ static inline void do_write_seqcount_begin(seqcount_t *s)
  * write_seqcount_end() - end a seqcount_t write side critical section
  * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
  *
- * The write section must've been opened with write_seqcount_begin().
+ * Context: Preemption will be automatically re-enabled if and only if
+ * the seqcount write serialization lock is associated, and preemptible.
  */
 #define write_seqcount_end(s)						\
 do {									\

^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-12-09 18:42 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
2020-10-30 14:46 ` [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast() Jason Gunthorpe
2020-10-30 16:29   ` Jan Kara
2020-10-30 21:31   ` John Hubbard
2020-10-30 22:36   ` Peter Xu
2020-11-02 22:19 ` [PATCH v2 0/2] Add a seqcount between gup_fast and copy_page_range() Ahmed S. Darwish
2020-11-02 22:39   ` Linus Torvalds
2020-11-02 23:18     ` Ahmed S. Darwish
     [not found] ` <2-v2-dfe9ecdb6c74+2066-gup_fork_jgg@nvidia.com>
2020-10-30 16:51   ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Jan Kara
     [not found]     ` <20201030170226.GF2620339@nvidia.com>
2020-11-02  8:31       ` Jan Kara
2020-10-30 22:52   ` Peter Xu
     [not found]     ` <20201030235121.GQ2620339@nvidia.com>
2020-10-31 15:26       ` Peter Xu
2020-11-03  0:33         ` Ahmed S. Darwish
2020-11-03  0:17       ` Ahmed S. Darwish
     [not found]         ` <20201103002532.GL2620339@nvidia.com>
2020-11-03  0:41           ` Ahmed S. Darwish
2020-11-03  2:20             ` John Hubbard
2020-11-03  6:52               ` Ahmed S. Darwish
2020-11-03 17:40                 ` Linus Torvalds
2020-11-04  1:32                   ` Ahmed S. Darwish
2020-11-04  2:01                     ` John Hubbard
2020-11-04  3:17                       ` Ahmed S. Darwish
2020-11-04 18:38                         ` Linus Torvalds
2020-11-04 19:54                           ` Ahmed S. Darwish
2020-12-09 18:38                           ` [tip: locking/core] seqlock: Prefix internal seqcount_t-only macros with a "do_" tip-bot2 for Ahmed S. Darwish
2020-12-09 18:38                           ` [tip: locking/core] seqlock: kernel-doc: Specify when preemption is automatically altered tip-bot2 for Ahmed S. Darwish
2020-11-10 11:53                         ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Peter Zijlstra
2020-12-03 10:35                           ` [tip: locking/core] seqlock: Rename __seqprop() users tip-bot2 for Peter Zijlstra
2020-11-03 17:03         ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Peter Xu
2020-11-02 23:58   ` Ahmed S. Darwish

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).