* [PATCH -next] mm: mark an intentional data race in page_zonenum()
@ 2020-02-06  3:52 Qian Cai
  2020-02-06  4:50 ` John Hubbard
  0 siblings, 1 reply; 11+ messages in thread
From: Qian Cai @ 2020-02-06  3:52 UTC (permalink / raw)
  To: akpm
  Cc: jhubbard, ira.weiny, dan.j.williams, jack, linux-mm,
	linux-kernel, Qian Cai

The commit 07d802699528 ("mm: devmap: refactor 1-based refcounting for
ZONE_DEVICE pages") introduced a data race, as page->flags could be
accessed concurrently, as noticed by KCSAN:

 BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page

 write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
  page_cpupid_xchg_last+0x51/0x80
  page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
  wp_page_reuse+0x3e/0xc0
  wp_page_reuse at mm/memory.c:2453
  do_wp_page+0x472/0x7b0
  do_wp_page at mm/memory.c:2798
  __handle_mm_fault+0xcb0/0xd00
  handle_pte_fault at mm/memory.c:4049
  (inlined by) __handle_mm_fault at mm/memory.c:4163
  handle_mm_fault+0xfc/0x2f0
  handle_mm_fault at mm/memory.c:4200
  do_page_fault+0x263/0x6f9
  do_user_addr_fault at arch/x86/mm/fault.c:1465
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
  page_fault+0x34/0x40

 read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
  put_page+0x15a/0x1f0
  page_zonenum at include/linux/mm.h:923
  (inlined by) is_zone_device_page at include/linux/mm.h:929
  (inlined by) page_is_devmap_managed at include/linux/mm.h:948
  (inlined by) put_page at include/linux/mm.h:1023
  wp_page_copy+0x571/0x930
  wp_page_copy at mm/memory.c:2615
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xcb0/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

Both the read and the write are done with only the non-exclusive mmap_sem
held. Since the read only checks a few specific bits in the flags word,
it is harmless even if load tearing happens, so just mark it as an
intentional data race using the data_race() macro.

Signed-off-by: Qian Cai <cai@lca.pw>
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..cafccad584c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
-	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
+	return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 11+ messages in thread
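
For context, data_race(expr) is one of the KCSAN annotations: it evaluates
expr as usual, but tells KCSAN that any data race on the plain accesses
inside it is intentional and should not be reported. A minimal illustration
of the idiom, with made-up variable names that are not from this patch:

	/* Two CPUs may update these counters without locking; the sum is
	 * only diagnostic, so an occasionally stale value is acceptable. */
	total = data_race(stats_alloc + stats_free);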

* Re: [PATCH -next] mm: mark an intentional data race in page_zonenum()
  2020-02-06  3:52 [PATCH -next] mm: mark an intentional data race in page_zonenum() Qian Cai
@ 2020-02-06  4:50 ` John Hubbard
  2020-02-06  9:04   ` Jan Kara
  2020-02-06 14:01     ` Qian Cai
  0 siblings, 2 replies; 11+ messages in thread
From: John Hubbard @ 2020-02-06  4:50 UTC (permalink / raw)
  To: Qian Cai, akpm; +Cc: ira.weiny, dan.j.williams, jack, linux-mm, linux-kernel

On 2/5/20 7:52 PM, Qian Cai wrote:
> The commit 07d802699528 ("mm: devmap: refactor 1-based refcounting for
> ZONE_DEVICE pages") introduced a data race as page->flags could be

Hi,

I really don't think so. This "race" was there long before that commit.
Anyway, more below:

> accessed concurrently as noticied by KCSAN,
> 
>   BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> 
>   write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
>    page_cpupid_xchg_last+0x51/0x80
>    page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
>    wp_page_reuse+0x3e/0xc0
>    wp_page_reuse at mm/memory.c:2453
>    do_wp_page+0x472/0x7b0
>    do_wp_page at mm/memory.c:2798
>    __handle_mm_fault+0xcb0/0xd00
>    handle_pte_fault at mm/memory.c:4049
>    (inlined by) __handle_mm_fault at mm/memory.c:4163
>    handle_mm_fault+0xfc/0x2f0
>    handle_mm_fault at mm/memory.c:4200
>    do_page_fault+0x263/0x6f9
>    do_user_addr_fault at arch/x86/mm/fault.c:1465
>    (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
>    page_fault+0x34/0x40
> 
>   read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
>    put_page+0x15a/0x1f0
>    page_zonenum at include/linux/mm.h:923
>    (inlined by) is_zone_device_page at include/linux/mm.h:929
>    (inlined by) page_is_devmap_managed at include/linux/mm.h:948
>    (inlined by) put_page at include/linux/mm.h:1023
>    wp_page_copy+0x571/0x930
>    wp_page_copy at mm/memory.c:2615
>    do_wp_page+0x107/0x7b0
>    __handle_mm_fault+0xcb0/0xd00
>    handle_mm_fault+0xfc/0x2f0
>    do_page_fault+0x263/0x6f9
>    page_fault+0x34/0x40
> 
>   Reported by Kernel Concurrency Sanitizer on:
>   CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
>   Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> 
> Both the read and write are done only with the non-exclusive mmap_sem
> held. Since the read only check for a specific bit in the flag, even if


Perhaps a clearer explanation is that the read of the page flags is always
looking at a bit range (zone number: up to 3 bits) that is not being written to by
the writer.
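
Roughly, and glossing over config-dependent field widths and ordering, that
single page->flags word can be pictured like this, with the zone number in a
few of the upper bits and the individual flag bits at the bottom:

	/*
	 * page->flags (one unsigned long), approximately:
	 *
	 *   [ section | node | zone | last_cpupid | ... | per-page flag bits ]
	 *
	 * page_zonenum() only shifts and masks out the zone field:
	 */
	zone = (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;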


> load tearing happens, it will be harmless, so just mark it as an
> intentional data races using the data_race() macro.
> 
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>   include/linux/mm.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52269e56c514..cafccad584c2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
>   
>   static inline enum zone_type page_zonenum(const struct page *page)
>   {
> -	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> +	return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);


I don't know about this. Lots of the kernel is written to do this sort
of thing, and adding a load of "data_race()" everywhere is...well, I'm not
sure if it's really the best way.  I wonder: could we maybe teach this
kcsan thing to understand a few of the key idioms, particularly about page
flags, instead of annotating all over the place?



thanks,
-- 
John Hubbard
NVIDIA


>   }
>   
>   #ifdef CONFIG_ZONE_DEVICE
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next] mm: mark an intentional data race in page_zonenum()
  2020-02-06  4:50 ` John Hubbard
@ 2020-02-06  9:04   ` Jan Kara
  2020-02-06 11:14     ` Qian Cai
  2020-02-06 14:01     ` Qian Cai
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Kara @ 2020-02-06  9:04 UTC (permalink / raw)
  To: John Hubbard
  Cc: Qian Cai, akpm, ira.weiny, dan.j.williams, jack, linux-mm, linux-kernel

On Wed 05-02-20 20:50:34, John Hubbard wrote:
> On 2/5/20 7:52 PM, Qian Cai wrote:
> > The commit 07d802699528 ("mm: devmap: refactor 1-based refcounting for
> > ZONE_DEVICE pages") introduced a data race as page->flags could be
> 
> Hi,
> 
> I really don't think so. This "race" was there long before that commit.
> Anyway, more below:
> 
> > accessed concurrently as noticied by KCSAN,
> > 
> >   BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > 
> >   write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> >    page_cpupid_xchg_last+0x51/0x80
> >    page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> >    wp_page_reuse+0x3e/0xc0
> >    wp_page_reuse at mm/memory.c:2453
> >    do_wp_page+0x472/0x7b0
> >    do_wp_page at mm/memory.c:2798
> >    __handle_mm_fault+0xcb0/0xd00
> >    handle_pte_fault at mm/memory.c:4049
> >    (inlined by) __handle_mm_fault at mm/memory.c:4163
> >    handle_mm_fault+0xfc/0x2f0
> >    handle_mm_fault at mm/memory.c:4200
> >    do_page_fault+0x263/0x6f9
> >    do_user_addr_fault at arch/x86/mm/fault.c:1465
> >    (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> >    page_fault+0x34/0x40
> > 
> >   read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> >    put_page+0x15a/0x1f0
> >    page_zonenum at include/linux/mm.h:923
> >    (inlined by) is_zone_device_page at include/linux/mm.h:929
> >    (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> >    (inlined by) put_page at include/linux/mm.h:1023
> >    wp_page_copy+0x571/0x930
> >    wp_page_copy at mm/memory.c:2615
> >    do_wp_page+0x107/0x7b0
> >    __handle_mm_fault+0xcb0/0xd00
> >    handle_mm_fault+0xfc/0x2f0
> >    do_page_fault+0x263/0x6f9
> >    page_fault+0x34/0x40
> > 
> >   Reported by Kernel Concurrency Sanitizer on:
> >   CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
> >   Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > 
> > Both the read and write are done only with the non-exclusive mmap_sem
> > held. Since the read only check for a specific bit in the flag, even if
> 
> Perhaps a clearer explanation is that the read of the page flags is
> always looking at a bit range (zone number: up to 3 bits) that is not
> being written to by the writer.

Well, that's not quite true because we do store the full long there. You're
right that we do that with cmpxchg() and modify only some bits in that word,
but that may be too difficult for the sanitizer to figure out.
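
For reference, that marked write is roughly the following read-modify-cmpxchg
loop (paraphrased from page_cpupid_xchg_last() in mm/mmzone.c; details vary
with kernel version and configuration):

	do {
		old_flags = flags = page->flags;
		last_cpupid = (flags >> LAST_CPUPID_PGSHIFT) & LAST_CPUPID_MASK;

		/* Rewrite only the last_cpupid field ... */
		flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
		flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
		/* ... but publish the whole word atomically. */
	} while (unlikely(cmpxchg(&page->flags, old_flags, flags) != old_flags));

So the store always writes the full unsigned long, while the zone bits that
page_zonenum() reads are carried over unchanged.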

> > load tearing happens, it will be harmless, so just mark it as an
> > intentional data races using the data_race() macro.
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> >   include/linux/mm.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 52269e56c514..cafccad584c2 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
> >   static inline enum zone_type page_zonenum(const struct page *page)
> >   {
> > -	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> > +	return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);
> 
> 
> I don't know about this. Lots of the kernel is written to do this sort
> of thing, and adding a load of "data_race()" everywhere is...well, I'm not
> sure if it's really the best way.  I wonder: could we maybe teach this
> kcsan thing to understand a few of the key idioms, particularly about page
> flags, instead of annotating all over the place?

So in this particular case, store tearing is a non-issue because we use an
atomic operation to store the value in page_cpupid_xchg_last(). I think it
would make some sense to use READ_ONCE(page->flags) here to prevent the
compiler from loading page->flags several times - I have a hard time finding
a reason why a compiler would want to do that, but conceptually that
protection makes sense, it is free performance-wise, and it will still allow
KCSAN to find a race in case we ever grow a place that modifies a page's
zone non-atomically (which might be a real problem). And it should also
silence the KCSAN warning AFAIU.
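
That is, something along these lines (a sketch of the suggestion only, not a
tested patch):

	static inline enum zone_type page_zonenum(const struct page *page)
	{
		return (READ_ONCE(page->flags) >> ZONES_PGSHIFT) & ZONES_MASK;
	}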

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next] mm: mark an intentional data race in page_zonenum()
  2020-02-06  9:04   ` Jan Kara
@ 2020-02-06 11:14     ` Qian Cai
  0 siblings, 0 replies; 11+ messages in thread
From: Qian Cai @ 2020-02-06 11:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: John Hubbard, akpm, ira.weiny, dan.j.williams, linux-mm, linux-kernel



> On Feb 6, 2020, at 4:04 AM, Jan Kara <jack@suse.cz> wrote:
> 
> So in this particular case, store tearing is non-issue because we use
> atomic operation to store the value in page_cpupid_xchg_last(). I think it
> would make some sense to use READ_ONCE(page->flags) here to prevent
> compiler from loading page->flags several times - I have hard time finding
> a reason why a compiler would want to do that but conceptually that
> protection makes sense, it is for free performance wise, and will still
> allow KCSAN to find a race in case we ever grow a place that modifies
> page's zone non-atomically (which might be a real problem). And it should
> also silence the KCSAN warning AFAIU.

Ah, reading up to 3 bits might be an issue then. I’ll post an alternative
version which uses READ_ONCE() just for the old page (because the new page
has not been published yet) in wp_page_copy().

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next] mm: mark an intentional data race in page_zonenum()
  2020-02-06  4:50 ` John Hubbard
@ 2020-02-06 14:01     ` Qian Cai
  2020-02-06 14:01     ` Qian Cai
  1 sibling, 0 replies; 11+ messages in thread
From: Qian Cai @ 2020-02-06 14:01 UTC (permalink / raw)
  To: John Hubbard, akpm
  Cc: ira.weiny, dan.j.williams, jack, linux-mm, linux-kernel, elver

On Wed, 2020-02-05 at 20:50 -0800, John Hubbard wrote:
> On 2/5/20 7:52 PM, Qian Cai wrote:
> > The commit 07d802699528 ("mm: devmap: refactor 1-based refcounting for
> > ZONE_DEVICE pages") introduced a data race as page->flags could be
> 
> Hi,
> 
> I really don't think so. This "race" was there long before that commit.
> Anyway, more below:
> 
> > accessed concurrently as noticied by KCSAN,
> > 
> >   BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > 
> >   write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> >    page_cpupid_xchg_last+0x51/0x80
> >    page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> >    wp_page_reuse+0x3e/0xc0
> >    wp_page_reuse at mm/memory.c:2453
> >    do_wp_page+0x472/0x7b0
> >    do_wp_page at mm/memory.c:2798
> >    __handle_mm_fault+0xcb0/0xd00
> >    handle_pte_fault at mm/memory.c:4049
> >    (inlined by) __handle_mm_fault at mm/memory.c:4163
> >    handle_mm_fault+0xfc/0x2f0
> >    handle_mm_fault at mm/memory.c:4200
> >    do_page_fault+0x263/0x6f9
> >    do_user_addr_fault at arch/x86/mm/fault.c:1465
> >    (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> >    page_fault+0x34/0x40
> > 
> >   read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> >    put_page+0x15a/0x1f0
> >    page_zonenum at include/linux/mm.h:923
> >    (inlined by) is_zone_device_page at include/linux/mm.h:929
> >    (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> >    (inlined by) put_page at include/linux/mm.h:1023
> >    wp_page_copy+0x571/0x930
> >    wp_page_copy at mm/memory.c:2615
> >    do_wp_page+0x107/0x7b0
> >    __handle_mm_fault+0xcb0/0xd00
> >    handle_mm_fault+0xfc/0x2f0
> >    do_page_fault+0x263/0x6f9
> >    page_fault+0x34/0x40
> > 
> >   Reported by Kernel Concurrency Sanitizer on:
> >   CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
> >   Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > 
> > Both the read and write are done only with the non-exclusive mmap_sem
> > held. Since the read only check for a specific bit in the flag, even if
> 
> 
> Perhaps a clearer explanation is that the read of the page flags is always
> looking at a bit range (zone number: up to 3 bits) that is not being written to by
> the writer.
> 
> 
> > load tearing happens, it will be harmless, so just mark it as an
> > intentional data races using the data_race() macro.
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> >   include/linux/mm.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 52269e56c514..cafccad584c2 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
> >   
> >   static inline enum zone_type page_zonenum(const struct page *page)
> >   {
> > -	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> > +	return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);
> 
> 
> I don't know about this. Lots of the kernel is written to do this sort
> of thing, and adding a load of "data_race()" everywhere is...well, I'm not
> sure if it's really the best way.  I wonder: could we maybe teach this
> kcsan thing to understand a few of the key idioms, particularly about page
> flags, instead of annotating all over the place?

My understanding is that it is rather difficult to change the compilers, but
it is a good question, so I have Cc'd Marco, who maintains KCSAN and might
give you a definitive answer.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next] mm: mark an intentional data race in page_zonenum()
  2020-02-06 14:01     ` Qian Cai
@ 2020-02-06 14:35       ` Marco Elver
  -1 siblings, 0 replies; 11+ messages in thread
From: Marco Elver @ 2020-02-06 14:35 UTC (permalink / raw)
  To: Qian Cai
  Cc: John Hubbard, Andrew Morton, ira.weiny, dan.j.williams, jack,
	Linux Memory Management List, LKML

On Thu, 6 Feb 2020 at 15:01, Qian Cai <cai@lca.pw> wrote:
>
> On Wed, 2020-02-05 at 20:50 -0800, John Hubbard wrote:
> > On 2/5/20 7:52 PM, Qian Cai wrote:
> > > The commit 07d802699528 ("mm: devmap: refactor 1-based refcounting for
> > > ZONE_DEVICE pages") introduced a data race as page->flags could be
> >
> > Hi,
> >
> > I really don't think so. This "race" was there long before that commit.
> > Anyway, more below:
> >
> > > accessed concurrently as noticied by KCSAN,
> > >
> > >   BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> > >
> > >   write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
> > >    page_cpupid_xchg_last+0x51/0x80
> > >    page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
> > >    wp_page_reuse+0x3e/0xc0
> > >    wp_page_reuse at mm/memory.c:2453
> > >    do_wp_page+0x472/0x7b0
> > >    do_wp_page at mm/memory.c:2798
> > >    __handle_mm_fault+0xcb0/0xd00
> > >    handle_pte_fault at mm/memory.c:4049
> > >    (inlined by) __handle_mm_fault at mm/memory.c:4163
> > >    handle_mm_fault+0xfc/0x2f0
> > >    handle_mm_fault at mm/memory.c:4200
> > >    do_page_fault+0x263/0x6f9
> > >    do_user_addr_fault at arch/x86/mm/fault.c:1465
> > >    (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
> > >    page_fault+0x34/0x40
> > >
> > >   read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
> > >    put_page+0x15a/0x1f0
> > >    page_zonenum at include/linux/mm.h:923
> > >    (inlined by) is_zone_device_page at include/linux/mm.h:929
> > >    (inlined by) page_is_devmap_managed at include/linux/mm.h:948
> > >    (inlined by) put_page at include/linux/mm.h:1023
> > >    wp_page_copy+0x571/0x930
> > >    wp_page_copy at mm/memory.c:2615
> > >    do_wp_page+0x107/0x7b0
> > >    __handle_mm_fault+0xcb0/0xd00
> > >    handle_mm_fault+0xfc/0x2f0
> > >    do_page_fault+0x263/0x6f9
> > >    page_fault+0x34/0x40
> > >
> > >   Reported by Kernel Concurrency Sanitizer on:
> > >   CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
> > >   Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > >
> > > Both the read and write are done only with the non-exclusive mmap_sem
> > > held. Since the read only check for a specific bit in the flag, even if
> >
> >
> > Perhaps a clearer explanation is that the read of the page flags is always
> > looking at a bit range (zone number: up to 3 bits) that is not being written to by
> > the writer.
> >
> >
> > > load tearing happens, it will be harmless, so just mark it as an
> > > intentional data races using the data_race() macro.
> > >
> > > Signed-off-by: Qian Cai <cai@lca.pw>
> > > ---
> > >   include/linux/mm.h | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 52269e56c514..cafccad584c2 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
> > >
> > >   static inline enum zone_type page_zonenum(const struct page *page)
> > >   {
> > > -   return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> > > +   return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);
> >
> >
> > I don't know about this. Lots of the kernel is written to do this sort
> > of thing, and adding a load of "data_race()" everywhere is...well, I'm not
> > sure if it's really the best way.  I wonder: could we maybe teach this
> > kcsan thing to understand a few of the key idioms, particularly about page
> > flags, instead of annotating all over the place?
>
> My understanding is that it is rather difficult to change the compilers, but it
> is a good question and I Cc Marco who is the maintainer for KCSAN that might
> give you a definite answer.

The problem is that there is no general idiom where we could say with
confidence that a data race is safe across the whole kernel. Here it
might not matter, but somewhere else it might matter a lot.

If it turns out that the entire file would be littered with
'data_race()' and you do not want to use annotations, you can
blacklist the file. I already had to do this for other files in mm/,
because concurrent flag modification/checking is pervasive and a lot
of them seem 'benign'. We decided to revisit those files later.

Feel free to add 'KCSAN_SANITIZE_memory.o := n' or whatever other
files you think are full of these to mm/Makefile.
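
As a sketch, such an opt-out is just a per-object make variable in the
subsystem Makefile (the exact set of files to exclude here is only an
example):

	# mm/Makefile
	# Files with pervasive, not-yet-annotated page->flags races can be
	# excluded from KCSAN instrumentation like this:
	KCSAN_SANITIZE_memory.o := n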

The only problem I see with that is that it's not obvious what is
concurrently modified and what isn't. The annotations would have
helped document what is happening.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next] mm: mark an intentional data race in page_zonenum()
  2020-02-06 14:35       ` Marco Elver
  (?)
@ 2020-02-06 23:18       ` John Hubbard
  2020-02-07 13:18           ` Marco Elver
  -1 siblings, 1 reply; 11+ messages in thread
From: John Hubbard @ 2020-02-06 23:18 UTC (permalink / raw)
  To: Marco Elver, Qian Cai
  Cc: Andrew Morton, ira.weiny, dan.j.williams, jack,
	Linux Memory Management List, LKML

On 2/6/20 6:35 AM, Marco Elver wrote:
...
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 52269e56c514..cafccad584c2 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
>>>>
>>>>   static inline enum zone_type page_zonenum(const struct page *page)
>>>>   {
>>>> -   return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
>>>> +   return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);
>>>
>>>
>>> I don't know about this. Lots of the kernel is written to do this sort
>>> of thing, and adding a load of "data_race()" everywhere is...well, I'm not
>>> sure if it's really the best way.  I wonder: could we maybe teach this
>>> kcsan thing to understand a few of the key idioms, particularly about page
>>> flags, instead of annotating all over the place?
>>
>> My understanding is that it is rather difficult to change the compilers, but it
>> is a good question and I Cc Marco who is the maintainer for KCSAN that might
>> give you a definite answer.
> 
> The problem is that there is no general idiom where we could say with
> confidence that a data race is safe across the whole kernel. Here it


Yes. I'm grasping at straws now, but...what about the idiom that page_zonenum()
uses: a set of bits that are "always" (after a certain early point) read-only?
What are your thoughts on that?


> might not matter, but somewhere else it might matter a lot.
> 
> If you think that it turns out the entire file may be littered with
> 'data_race()', and you do not want to use annotations, you can
> blacklist the file. I already had to do this for other files in mm/,
> because concurrent flag modification/checking is pervasive and a lot
> of them seem 'benign'. We decided to revisit those files later.
> 
> Feel free to add 'KCSAN_SANITIZE_memory.o := n' or whatever other
> files you think are full of these to mm/Makefile.
> 
> The only problem I see with that is that it's not obvious what is
> concurrently modified and what isn't. The annotations would have
> helped document what is happening.
> 
> Thanks,
> -- Marco
> 


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next] mm: mark an intentional data race in page_zonenum()
  2020-02-06 23:18       ` John Hubbard
@ 2020-02-07 13:18           ` Marco Elver
  0 siblings, 0 replies; 11+ messages in thread
From: Marco Elver @ 2020-02-07 13:18 UTC (permalink / raw)
  To: John Hubbard
  Cc: Qian Cai, Andrew Morton, ira.weiny, dan.j.williams, jack,
	Linux Memory Management List, LKML

On Fri, 7 Feb 2020 at 00:18, John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 2/6/20 6:35 AM, Marco Elver wrote:
> ...
> >>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>>> index 52269e56c514..cafccad584c2 100644
> >>>> --- a/include/linux/mm.h
> >>>> +++ b/include/linux/mm.h
> >>>> @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
> >>>>
> >>>>   static inline enum zone_type page_zonenum(const struct page *page)
> >>>>   {
> >>>> -   return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> >>>> +   return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);
> >>>
> >>>
> >>> I don't know about this. Lots of the kernel is written to do this sort
> >>> of thing, and adding a load of "data_race()" everywhere is...well, I'm not
> >>> sure if it's really the best way.  I wonder: could we maybe teach this
> >>> kcsan thing to understand a few of the key idioms, particularly about page
> >>> flags, instead of annotating all over the place?
> >>
> >> My understanding is that it is rather difficult to change the compilers, but it
> >> is a good question and I Cc Marco who is the maintainer for KCSAN that might
> >> give you a definite answer.
> >
> > The problem is that there is no general idiom where we could say with
> > confidence that a data race is safe across the whole kernel. Here it
>
> Yes. I'm grasping at straws now, but...what about the idiom that page_zonenum()
> uses: a set of bits that are "always" (after a certain early point) read-only?
> What are your thoughts on that?

I have replied to the other thread.

Thanks,
-- Marco



> > might not matter, but somewhere else it might matter a lot.
> >
> > If you think that it turns out the entire file may be littered with
> > 'data_race()', and you do not want to use annotations, you can
> > blacklist the file. I already had to do this for other files in mm/,
> > because concurrent flag modification/checking is pervasive and a lot
> > of them seem 'benign'. We decided to revisit those files later.
> >
> > Feel free to add 'KCSAN_SANITIZE_memory.o := n' or whatever other
> > files you think are full of these to mm/Makefile.
> >
> > The only problem I see with that is that it's not obvious what is
> > concurrently modified and what isn't. The annotations would have
> > helped document what is happening.
> >
> > Thanks,
> > -- Marco
> >
>
>
> thanks,
> --
> John Hubbard
> NVIDIA

^ permalink raw reply	[flat|nested] 11+ messages in thread
