All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Qian Cai <cai@lca.pw>, <akpm@linux-foundation.org>
Cc: <ira.weiny@intel.com>, <dan.j.williams@intel.com>, <jack@suse.cz>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -next] mm: mark a intentional data race in page_zonenum()
Date: Wed, 5 Feb 2020 20:50:34 -0800	[thread overview]
Message-ID: <480a7dde-f678-c07b-2231-4da8e0a38753@nvidia.com> (raw)
In-Reply-To: <20200206035235.2537-1-cai@lca.pw>

On 2/5/20 7:52 PM, Qian Cai wrote:
> The commit 07d802699528 ("mm: devmap: refactor 1-based refcounting for
> ZONE_DEVICE pages") introduced a data race as page->flags could be

Hi,

I really don't think so. This "race" was there long before that commit.
Anyway, more below:

> accessed concurrently as noticied by KCSAN,
> 
>   BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
> 
>   write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
>    page_cpupid_xchg_last+0x51/0x80
>    page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
>    wp_page_reuse+0x3e/0xc0
>    wp_page_reuse at mm/memory.c:2453
>    do_wp_page+0x472/0x7b0
>    do_wp_page at mm/memory.c:2798
>    __handle_mm_fault+0xcb0/0xd00
>    handle_pte_fault at mm/memory.c:4049
>    (inlined by) __handle_mm_fault at mm/memory.c:4163
>    handle_mm_fault+0xfc/0x2f0
>    handle_mm_fault at mm/memory.c:4200
>    do_page_fault+0x263/0x6f9
>    do_user_addr_fault at arch/x86/mm/fault.c:1465
>    (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
>    page_fault+0x34/0x40
> 
>   read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
>    put_page+0x15a/0x1f0
>    page_zonenum at include/linux/mm.h:923
>    (inlined by) is_zone_device_page at include/linux/mm.h:929
>    (inlined by) page_is_devmap_managed at include/linux/mm.h:948
>    (inlined by) put_page at include/linux/mm.h:1023
>    wp_page_copy+0x571/0x930
>    wp_page_copy at mm/memory.c:2615
>    do_wp_page+0x107/0x7b0
>    __handle_mm_fault+0xcb0/0xd00
>    handle_mm_fault+0xfc/0x2f0
>    do_page_fault+0x263/0x6f9
>    page_fault+0x34/0x40
> 
>   Reported by Kernel Concurrency Sanitizer on:
>   CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
>   Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> 
> Both the read and write are done only with the non-exclusive mmap_sem
> held. Since the read only check for a specific bit in the flag, even if


Perhaps a clearer explanation is that the read of the page flags is always
looking at a bit range (zone number: up to 3 bits) that is not being written to by
the writer.


> load tearing happens, it will be harmless, so just mark it as an
> intentional data races using the data_race() macro.
> 
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>   include/linux/mm.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52269e56c514..cafccad584c2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
>   
>   static inline enum zone_type page_zonenum(const struct page *page)
>   {
> -	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> +	return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);


I don't know about this. Lots of the kernel is written to do this sort
of thing, and adding a load of "data_race()" everywhere is...well, I'm not
sure if it's really the best way.  I wonder: could we maybe teach this
kcsan thing to understand a few of the key idioms, particularly about page
flags, instead of annotating all over the place?



thanks,
-- 
John Hubbard
NVIDIA


>   }
>   
>   #ifdef CONFIG_ZONE_DEVICE
> 



  reply	other threads:[~2020-02-06  4:53 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-06  3:52 [PATCH -next] mm: mark a intentional data race in page_zonenum() Qian Cai
2020-02-06  4:50 ` John Hubbard [this message]
2020-02-06  9:04   ` Jan Kara
2020-02-06 11:14     ` Qian Cai
2020-02-06 14:01   ` Qian Cai
2020-02-06 14:01     ` Qian Cai
2020-02-06 14:35     ` Marco Elver
2020-02-06 14:35       ` Marco Elver
2020-02-06 23:18       ` John Hubbard
2020-02-07 13:18         ` Marco Elver
2020-02-07 13:18           ` Marco Elver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=480a7dde-f678-c07b-2231-4da8e0a38753@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pw \
    --cc=dan.j.williams@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.