[-next,v2] mm: annotate a data race in page_zonenum()

Message ID 1581619089-14472-1-git-send-email-cai@lca.pw
State In Next
Commit 8336b603b001e66e325341c1ff368ad01da921b6
Series
  • [-next,v2] mm: annotate a data race in page_zonenum()

Commit Message

Qian Cai Feb. 13, 2020, 6:38 p.m. UTC
BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page

 write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
  page_cpupid_xchg_last+0x51/0x80
  page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
  wp_page_reuse+0x3e/0xc0
  wp_page_reuse at mm/memory.c:2453
  do_wp_page+0x472/0x7b0
  do_wp_page at mm/memory.c:2798
  __handle_mm_fault+0xcb0/0xd00
  handle_pte_fault at mm/memory.c:4049
  (inlined by) __handle_mm_fault at mm/memory.c:4163
  handle_mm_fault+0xfc/0x2f0
  handle_mm_fault at mm/memory.c:4200
  do_page_fault+0x263/0x6f9
  do_user_addr_fault at arch/x86/mm/fault.c:1465
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
  page_fault+0x34/0x40

 read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
  put_page+0x15a/0x1f0
  page_zonenum at include/linux/mm.h:923
  (inlined by) is_zone_device_page at include/linux/mm.h:929
  (inlined by) page_is_devmap_managed at include/linux/mm.h:948
  (inlined by) put_page at include/linux/mm.h:1023
  wp_page_copy+0x571/0x930
  wp_page_copy at mm/memory.c:2615
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xcb0/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

A page never changes its zone number. The zone number happens to be
stored in the same word as other bits which are modified, but the zone
number bits will never be modified by any other write, so it can accept
a reload of the zone bits after an intervening write and does not need
to use READ_ONCE(). Thus, annotate this data race using
ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
writes to those bits.

Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: use ASSERT_EXCLUSIVE_BITS().

BTW, not sure if it is easier for Andrew or for Paul to pick this up (with
Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in the -rcu tree only for now
(or likely tomorrow's -next tree).

 include/linux/mm.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Paul E. McKenney Feb. 13, 2020, 7:26 p.m. UTC | #1
On Thu, Feb 13, 2020 at 01:38:09PM -0500, Qian Cai wrote:
>  BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> 
>  write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
>   page_cpupid_xchg_last+0x51/0x80
>   page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
>   wp_page_reuse+0x3e/0xc0
>   wp_page_reuse at mm/memory.c:2453
>   do_wp_page+0x472/0x7b0
>   do_wp_page at mm/memory.c:2798
>   __handle_mm_fault+0xcb0/0xd00
>   handle_pte_fault at mm/memory.c:4049
>   (inlined by) __handle_mm_fault at mm/memory.c:4163
>   handle_mm_fault+0xfc/0x2f0
>   handle_mm_fault at mm/memory.c:4200
>   do_page_fault+0x263/0x6f9
>   do_user_addr_fault at arch/x86/mm/fault.c:1465
>   (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
>   page_fault+0x34/0x40
> 
>  read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
>   put_page+0x15a/0x1f0
>   page_zonenum at include/linux/mm.h:923
>   (inlined by) is_zone_device_page at include/linux/mm.h:929
>   (inlined by) page_is_devmap_managed at include/linux/mm.h:948
>   (inlined by) put_page at include/linux/mm.h:1023
>   wp_page_copy+0x571/0x930
>   wp_page_copy at mm/memory.c:2615
>   do_wp_page+0x107/0x7b0
>   __handle_mm_fault+0xcb0/0xd00
>   handle_mm_fault+0xfc/0x2f0
>   do_page_fault+0x263/0x6f9
>   page_fault+0x34/0x40
> 
>  Reported by Kernel Concurrency Sanitizer on:
>  CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
>  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> 
> A page never changes its zone number. The zone number happens to be
> stored in the same word as other bits which are modified, but the zone
> number bits will never be modified by any other write, so it can accept
> a reload of the zone bits after an intervening write and it don't need
> to use READ_ONCE(). Thus, annotate this data race using
> ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> writes to it.
> 
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
> 
> v2: use ASSERT_EXCLUSIVE_BITS().
> 
> BTW, not sure if it is easier for Andrew with Paul to pick this up (with
> Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in -rcu tree only (or likely
> tomorrow's -next tree).

Here are the options I know of, any of which work for me:

1.	I take the patch given appropriate acks/reviews.

2.	Someone hangs onto the patch until the KCSAN infrastructure
	hits mainline and then sends it up via whatever path.

3.	One way to do #2 is to merge the -rcu tree's "kcsan" branch and
	then queue this patch on top of that, again sending this patch
	along once KCSAN hits mainline.  Unusually for the -rcu tree,
	the "kcsan" branch is not supposed to be rebased.

Either way, just let me know!

							Thanx, Paul

>  include/linux/mm.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52269e56c514..0d70fafd055c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>  
>  static inline enum zone_type page_zonenum(const struct page *page)
>  {
> +	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
>  	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
>  }
>  
> -- 
> 1.8.3.1
>
John Hubbard Feb. 13, 2020, 9:20 p.m. UTC | #2
On 2/13/20 10:38 AM, Qian Cai wrote:
>  BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> 
>  write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
>   page_cpupid_xchg_last+0x51/0x80
>   page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
>   wp_page_reuse+0x3e/0xc0
>   wp_page_reuse at mm/memory.c:2453
>   do_wp_page+0x472/0x7b0
>   do_wp_page at mm/memory.c:2798
>   __handle_mm_fault+0xcb0/0xd00
>   handle_pte_fault at mm/memory.c:4049
>   (inlined by) __handle_mm_fault at mm/memory.c:4163
>   handle_mm_fault+0xfc/0x2f0
>   handle_mm_fault at mm/memory.c:4200
>   do_page_fault+0x263/0x6f9
>   do_user_addr_fault at arch/x86/mm/fault.c:1465
>   (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
>   page_fault+0x34/0x40
> 
>  read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
>   put_page+0x15a/0x1f0
>   page_zonenum at include/linux/mm.h:923
>   (inlined by) is_zone_device_page at include/linux/mm.h:929
>   (inlined by) page_is_devmap_managed at include/linux/mm.h:948
>   (inlined by) put_page at include/linux/mm.h:1023
>   wp_page_copy+0x571/0x930
>   wp_page_copy at mm/memory.c:2615
>   do_wp_page+0x107/0x7b0
>   __handle_mm_fault+0xcb0/0xd00
>   handle_mm_fault+0xfc/0x2f0
>   do_page_fault+0x263/0x6f9
>   page_fault+0x34/0x40
> 
>  Reported by Kernel Concurrency Sanitizer on:
>  CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
>  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> 
> A page never changes its zone number. The zone number happens to be
> stored in the same word as other bits which are modified, but the zone
> number bits will never be modified by any other write, so it can accept
> a reload of the zone bits after an intervening write and it don't need
> to use READ_ONCE(). Thus, annotate this data race using
> ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> writes to it.
> 
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
> 
> v2: use ASSERT_EXCLUSIVE_BITS().


Much cleaner, thanks to this new macro. You can add:


    Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
Qian Cai Feb. 14, 2020, 2:30 p.m. UTC | #3
On Thu, 2020-02-13 at 11:26 -0800, Paul E. McKenney wrote:
> On Thu, Feb 13, 2020 at 01:38:09PM -0500, Qian Cai wrote:
> >  BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > 
> >  write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> >   page_cpupid_xchg_last+0x51/0x80
> >   page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> >   wp_page_reuse+0x3e/0xc0
> >   wp_page_reuse at mm/memory.c:2453
> >   do_wp_page+0x472/0x7b0
> >   do_wp_page at mm/memory.c:2798
> >   __handle_mm_fault+0xcb0/0xd00
> >   handle_pte_fault at mm/memory.c:4049
> >   (inlined by) __handle_mm_fault at mm/memory.c:4163
> >   handle_mm_fault+0xfc/0x2f0
> >   handle_mm_fault at mm/memory.c:4200
> >   do_page_fault+0x263/0x6f9
> >   do_user_addr_fault at arch/x86/mm/fault.c:1465
> >   (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> >   page_fault+0x34/0x40
> > 
> >  read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> >   put_page+0x15a/0x1f0
> >   page_zonenum at include/linux/mm.h:923
> >   (inlined by) is_zone_device_page at include/linux/mm.h:929
> >   (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> >   (inlined by) put_page at include/linux/mm.h:1023
> >   wp_page_copy+0x571/0x930
> >   wp_page_copy at mm/memory.c:2615
> >   do_wp_page+0x107/0x7b0
> >   __handle_mm_fault+0xcb0/0xd00
> >   handle_mm_fault+0xfc/0x2f0
> >   do_page_fault+0x263/0x6f9
> >   page_fault+0x34/0x40
> > 
> >  Reported by Kernel Concurrency Sanitizer on:
> >  CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
> >  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > 
> > A page never changes its zone number. The zone number happens to be
> > stored in the same word as other bits which are modified, but the zone
> > number bits will never be modified by any other write, so it can accept
> > a reload of the zone bits after an intervening write and it don't need
> > to use READ_ONCE(). Thus, annotate this data race using
> > ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> > writes to it.
> > 
> > Suggested-by: Marco Elver <elver@google.com>
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> > 
> > v2: use ASSERT_EXCLUSIVE_BITS().
> > 
> > BTW, not sure if it is easier for Andrew with Paul to pick this up (with
> > Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in -rcu tree only (or likely
> > tomorrow's -next tree).
> 
> Here are the options I know of, any of which work for me:
> 
> 1.	I take the patch given appropriate acks/reviews.
> 
> 2.	Someone hangs onto the patch until the KCSAN infrastructure
> 	hits mainline and then sends it up via whatever path.
> 
> 3.	One way to do #2 is to merge the -rcu tree's "kcsan" branch and
> 	then queue this patch on top of that, again sending this patch
> 	along once KCSAN hits mainline.  Unusually for the -rcu tree,
> 	the "kcsan" branch is not supposed to be rebased.
> 
> Either way, just let me know!

I suppose it is up to Andrew what he prefers. I may have another pile of
patches coming that depends on some of those KCSAN things (apart from the
data_race() macro), so I'd like to find a way to get them into -next first
rather than waiting until April (once KCSAN hits mainline) to send them, if
possible.

> 
> 							Thanx, Paul
> 
> >  include/linux/mm.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 52269e56c514..0d70fafd055c 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> >  
> >  static inline enum zone_type page_zonenum(const struct page *page)
> >  {
> > +	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> >  	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> >  }
> >  
> > -- 
> > 1.8.3.1
> >
Paul E. McKenney Feb. 14, 2020, 4:16 p.m. UTC | #4
On Fri, Feb 14, 2020 at 09:30:26AM -0500, Qian Cai wrote:
> On Thu, 2020-02-13 at 11:26 -0800, Paul E. McKenney wrote:
> > On Thu, Feb 13, 2020 at 01:38:09PM -0500, Qian Cai wrote:
> > >  BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > > 
> > >  write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> > >   page_cpupid_xchg_last+0x51/0x80
> > >   page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> > >   wp_page_reuse+0x3e/0xc0
> > >   wp_page_reuse at mm/memory.c:2453
> > >   do_wp_page+0x472/0x7b0
> > >   do_wp_page at mm/memory.c:2798
> > >   __handle_mm_fault+0xcb0/0xd00
> > >   handle_pte_fault at mm/memory.c:4049
> > >   (inlined by) __handle_mm_fault at mm/memory.c:4163
> > >   handle_mm_fault+0xfc/0x2f0
> > >   handle_mm_fault at mm/memory.c:4200
> > >   do_page_fault+0x263/0x6f9
> > >   do_user_addr_fault at arch/x86/mm/fault.c:1465
> > >   (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> > >   page_fault+0x34/0x40
> > > 
> > >  read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> > >   put_page+0x15a/0x1f0
> > >   page_zonenum at include/linux/mm.h:923
> > >   (inlined by) is_zone_device_page at include/linux/mm.h:929
> > >   (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> > >   (inlined by) put_page at include/linux/mm.h:1023
> > >   wp_page_copy+0x571/0x930
> > >   wp_page_copy at mm/memory.c:2615
> > >   do_wp_page+0x107/0x7b0
> > >   __handle_mm_fault+0xcb0/0xd00
> > >   handle_mm_fault+0xfc/0x2f0
> > >   do_page_fault+0x263/0x6f9
> > >   page_fault+0x34/0x40
> > > 
> > >  Reported by Kernel Concurrency Sanitizer on:
> > >  CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
> > >  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > > 
> > > A page never changes its zone number. The zone number happens to be
> > > stored in the same word as other bits which are modified, but the zone
> > > number bits will never be modified by any other write, so it can accept
> > > a reload of the zone bits after an intervening write and it don't need
> > > to use READ_ONCE(). Thus, annotate this data race using
> > > ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> > > writes to it.
> > > 
> > > Suggested-by: Marco Elver <elver@google.com>
> > > Signed-off-by: Qian Cai <cai@lca.pw>
> > > ---
> > > 
> > > v2: use ASSERT_EXCLUSIVE_BITS().
> > > 
> > > BTW, not sure if it is easier for Andrew with Paul to pick this up (with
> > > Andrew's ACK), since ASSERT_EXCLUSIVE_BITS() is in -rcu tree only (or likely
> > > tomorrow's -next tree).
> > 
> > Here are the options I know of, any of which work for me:
> > 
> > 1.	I take the patch given appropriate acks/reviews.
> > 
> > 2.	Someone hangs onto the patch until the KCSAN infrastructure
> > 	hits mainline and then sends it up via whatever path.
> > 
> > 3.	One way to do #2 is to merge the -rcu tree's "kcsan" branch and
> > 	then queue this patch on top of that, again sending this patch
> > 	along once KCSAN hits mainline.  Unusually for the -rcu tree,
> > 	the "kcsan" branch is not supposed to be rebased.
> > 
> > Either way, just let me know!
> 
> I suppose it is up to Andrew what he prefers. I may have another pile of patches
> depends on some of those KCSAN things (except the data_race() macro) are coming,
> so I'd like to find a way to get into -next first rather than waiting to send
> them in April (once KCSAN hit the mainline) if possible.

Any of the three options above would get you into -next quickly.

So just let me know how you would like to proceed.

							Thanx, Paul

> > >  include/linux/mm.h | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 52269e56c514..0d70fafd055c 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> > >  
> > >  static inline enum zone_type page_zonenum(const struct page *page)
> > >  {
> > > +	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> > >  	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> > >  }
> > >  
> > > -- 
> > > 1.8.3.1
> > >
Qian Cai Feb. 18, 2020, 12:42 p.m. UTC | #5
> On Feb 14, 2020, at 11:16 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> Any of the three options above would get you into -next quickly.
> 
> So just let me know how you would like to proceed.

Unless Andrew has any objection, I think it makes sense for you to pick this up, as it is a pretty low-risk patch.
Qian Cai Feb. 19, 2020, 9:55 p.m. UTC | #6
Andrew, since you had picked the similar one which also depends
on ASSERT_EXCLUSIVE_*, can you pick up this patch as well?

On Thu, 2020-02-13 at 13:20 -0800, John Hubbard wrote:
> On 2/13/20 10:38 AM, Qian Cai wrote:
> >  BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > 
> >  write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> >   page_cpupid_xchg_last+0x51/0x80
> >   page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> >   wp_page_reuse+0x3e/0xc0
> >   wp_page_reuse at mm/memory.c:2453
> >   do_wp_page+0x472/0x7b0
> >   do_wp_page at mm/memory.c:2798
> >   __handle_mm_fault+0xcb0/0xd00
> >   handle_pte_fault at mm/memory.c:4049
> >   (inlined by) __handle_mm_fault at mm/memory.c:4163
> >   handle_mm_fault+0xfc/0x2f0
> >   handle_mm_fault at mm/memory.c:4200
> >   do_page_fault+0x263/0x6f9
> >   do_user_addr_fault at arch/x86/mm/fault.c:1465
> >   (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> >   page_fault+0x34/0x40
> > 
> >  read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> >   put_page+0x15a/0x1f0
> >   page_zonenum at include/linux/mm.h:923
> >   (inlined by) is_zone_device_page at include/linux/mm.h:929
> >   (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> >   (inlined by) put_page at include/linux/mm.h:1023
> >   wp_page_copy+0x571/0x930
> >   wp_page_copy at mm/memory.c:2615
> >   do_wp_page+0x107/0x7b0
> >   __handle_mm_fault+0xcb0/0xd00
> >   handle_mm_fault+0xfc/0x2f0
> >   do_page_fault+0x263/0x6f9
> >   page_fault+0x34/0x40
> > 
> >  Reported by Kernel Concurrency Sanitizer on:
> >  CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
> >  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > 
> > A page never changes its zone number. The zone number happens to be
> > stored in the same word as other bits which are modified, but the zone
> > number bits will never be modified by any other write, so it can accept
> > a reload of the zone bits after an intervening write and it don't need
> > to use READ_ONCE(). Thus, annotate this data race using
> > ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> > writes to it.
> > 
> > Suggested-by: Marco Elver <elver@google.com>
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> > 
> > v2: use ASSERT_EXCLUSIVE_BITS().
> 
> 
> Much cleaner, thanks to this new macro. You can add:
> 
> 
>     Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> 
> 
> thanks,
Paul E. McKenney Feb. 19, 2020, 11:47 p.m. UTC | #7
On Wed, Feb 19, 2020 at 04:55:40PM -0500, Qian Cai wrote:
> Andrew, since you had picked the similar one which also depends
> on ASSERT_EXCLUSIVE_*, can you pick up this patch as well?
> 
> On Thu, 2020-02-13 at 13:20 -0800, John Hubbard wrote:
> > On 2/13/20 10:38 AM, Qian Cai wrote:
> > >  BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > > 
> > >  write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> > >   page_cpupid_xchg_last+0x51/0x80
> > >   page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> > >   wp_page_reuse+0x3e/0xc0
> > >   wp_page_reuse at mm/memory.c:2453
> > >   do_wp_page+0x472/0x7b0
> > >   do_wp_page at mm/memory.c:2798
> > >   __handle_mm_fault+0xcb0/0xd00
> > >   handle_pte_fault at mm/memory.c:4049
> > >   (inlined by) __handle_mm_fault at mm/memory.c:4163
> > >   handle_mm_fault+0xfc/0x2f0
> > >   handle_mm_fault at mm/memory.c:4200
> > >   do_page_fault+0x263/0x6f9
> > >   do_user_addr_fault at arch/x86/mm/fault.c:1465
> > >   (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> > >   page_fault+0x34/0x40
> > > 
> > >  read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> > >   put_page+0x15a/0x1f0
> > >   page_zonenum at include/linux/mm.h:923
> > >   (inlined by) is_zone_device_page at include/linux/mm.h:929
> > >   (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> > >   (inlined by) put_page at include/linux/mm.h:1023
> > >   wp_page_copy+0x571/0x930
> > >   wp_page_copy at mm/memory.c:2615
> > >   do_wp_page+0x107/0x7b0
> > >   __handle_mm_fault+0xcb0/0xd00
> > >   handle_mm_fault+0xfc/0x2f0
> > >   do_page_fault+0x263/0x6f9
> > >   page_fault+0x34/0x40
> > > 
> > >  Reported by Kernel Concurrency Sanitizer on:
> > >  CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
> > >  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > > 
> > > A page never changes its zone number. The zone number happens to be
> > > stored in the same word as other bits which are modified, but the zone
> > > number bits will never be modified by any other write, so it can accept
> > > a reload of the zone bits after an intervening write and it don't need
> > > to use READ_ONCE(). Thus, annotate this data race using
> > > ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
> > > writes to it.
> > > 
> > > Suggested-by: Marco Elver <elver@google.com>
> > > Signed-off-by: Qian Cai <cai@lca.pw>
> > > ---
> > > 
> > > v2: use ASSERT_EXCLUSIVE_BITS().
> > 
> > 
> > Much cleaner, thanks to this new macro. You can add:
> > 
> > 
> >     Reviewed-by: John Hubbard <jhubbard@nvidia.com>

It looks like Andrew already pulled this one in, but let me know if
he drops it for whatever reason.

							Thanx, Paul

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..0d70fafd055c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -920,6 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
+	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
 	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
 }