* BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
@ 2023-08-28 21:14 Mirsad Todorovac
  2023-08-29 19:13 ` Matthew Wilcox
  2023-08-31 14:52 ` Matthew Wilcox
  0 siblings, 2 replies; 11+ messages in thread
From: Mirsad Todorovac @ 2023-08-28 21:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

[-- Attachment #1: Type: text/plain, Size: 6376 bytes --]

Hi,

In the vanilla torvalds tree 6.5 kernel on the Ubuntu 22.04 system, KCSAN found another data race:

[   34.082749] ==================================================================
[   34.089209] BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

[   34.102069] write (marked) to 0xffffef9a44978bc0 of 8 bytes by interrupt on cpu 28:
[   34.108569] mpage_read_end_io (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/bitops.h:55 /home/marvin/linux/kernel/linux_torvalds/./include/asm-generic/bitops/instrumented-atomic.h:29 /home/marvin/linux/kernel/linux_torvalds/./include/linux/page-flags.h:739 /home/marvin/linux/kernel/linux_torvalds/fs/mpage.c:55)
[   34.108581] bio_endio (/home/marvin/linux/kernel/linux_torvalds/block/bio.c:1617)
[   34.108590] blk_mq_end_request_batch (/home/marvin/linux/kernel/linux_torvalds/block/blk-mq.c:850 /home/marvin/linux/kernel/linux_torvalds/block/blk-mq.c:1088)
[   34.108601] nvme_pci_complete_batch (/home/marvin/linux/kernel/linux_torvalds/drivers/nvme/host/pci.c:986) nvme
[   34.108644] nvme_irq (/home/marvin/linux/kernel/linux_torvalds/drivers/nvme/host/pci.c:1086) nvme
[   34.108686] __handle_irq_event_percpu (/home/marvin/linux/kernel/linux_torvalds/kernel/irq/handle.c:158)
[   34.108698] handle_irq_event (/home/marvin/linux/kernel/linux_torvalds/kernel/irq/handle.c:195 /home/marvin/linux/kernel/linux_torvalds/kernel/irq/handle.c:210)
[   34.108710] handle_edge_irq (/home/marvin/linux/kernel/linux_torvalds/kernel/irq/chip.c:836)
[   34.108722] __common_interrupt (/home/marvin/linux/kernel/linux_torvalds/./include/linux/irqdesc.h:161 /home/marvin/linux/kernel/linux_torvalds/arch/x86/kernel/irq.c:238 /home/marvin/linux/kernel/linux_torvalds/arch/x86/kernel/irq.c:257)
[   34.108731] common_interrupt (/home/marvin/linux/kernel/linux_torvalds/arch/x86/kernel/irq.c:247 (discriminator 14))
[   34.108743] asm_common_interrupt (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/idtentry.h:636)
[   34.108754] cpuidle_enter_state (/home/marvin/linux/kernel/linux_torvalds/drivers/cpuidle/cpuidle.c:291)
[   34.108766] cpuidle_enter (/home/marvin/linux/kernel/linux_torvalds/drivers/cpuidle/cpuidle.c:390)
[   34.108776] call_cpuidle (/home/marvin/linux/kernel/linux_torvalds/kernel/sched/idle.c:135)
[   34.108787] do_idle (/home/marvin/linux/kernel/linux_torvalds/kernel/sched/idle.c:219 /home/marvin/linux/kernel/linux_torvalds/kernel/sched/idle.c:282)
[   34.108795] cpu_startup_entry (/home/marvin/linux/kernel/linux_torvalds/kernel/sched/idle.c:378 (discriminator 1))
[   34.108803] start_secondary (/home/marvin/linux/kernel/linux_torvalds/arch/x86/kernel/smpboot.c:210 /home/marvin/linux/kernel/linux_torvalds/arch/x86/kernel/smpboot.c:294)
[   34.108814] secondary_startup_64_no_verify (/home/marvin/linux/kernel/linux_torvalds/arch/x86/kernel/head_64.S:441)

[   34.115221] read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
[   34.121702] folio_batch_move_lru (/home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1814 /home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1824 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1636 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1659 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:216)
[   34.121713] folio_batch_add_and_move (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:235)
[   34.121724] folio_add_lru (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/preempt.h:95 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:518)
[   34.121735] folio_add_lru_vma (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:538)
[   34.121746] do_anonymous_page (/home/marvin/linux/kernel/linux_torvalds/mm/memory.c:4146)
[   34.121757] __handle_mm_fault (/home/marvin/linux/kernel/linux_torvalds/mm/memory.c:3662 /home/marvin/linux/kernel/linux_torvalds/mm/memory.c:4939 /home/marvin/linux/kernel/linux_torvalds/mm/memory.c:5079)
[   34.121770] handle_mm_fault (/home/marvin/linux/kernel/linux_torvalds/mm/memory.c:5233)
[   34.121782] do_user_addr_fault (/home/marvin/linux/kernel/linux_torvalds/arch/x86/mm/fault.c:1392)
[   34.121794] exc_page_fault (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/paravirt.h:695 /home/marvin/linux/kernel/linux_torvalds/arch/x86/mm/fault.c:1494 /home/marvin/linux/kernel/linux_torvalds/arch/x86/mm/fault.c:1542)
[   34.121804] asm_exc_page_fault (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/idtentry.h:570)
[   34.121815] copyout (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/uaccess_64.h:112 /home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/uaccess_64.h:133 /home/marvin/linux/kernel/linux_torvalds/lib/iov_iter.c:168)
[   34.121827] _copy_to_iter (/home/marvin/linux/kernel/linux_torvalds/lib/iov_iter.c:316 (discriminator 5))
[   34.121835] copy_page_to_iter (/home/marvin/linux/kernel/linux_torvalds/lib/iov_iter.c:483 /home/marvin/linux/kernel/linux_torvalds/lib/iov_iter.c:468)
[   34.121843] filemap_read (/home/marvin/linux/kernel/linux_torvalds/mm/filemap.c:2712)
[   34.121854] blkdev_read_iter (/home/marvin/linux/kernel/linux_torvalds/block/fops.c:620)
[   34.121866] vfs_read (/home/marvin/linux/kernel/linux_torvalds/./include/linux/fs.h:1871 /home/marvin/linux/kernel/linux_torvalds/fs/read_write.c:389 /home/marvin/linux/kernel/linux_torvalds/fs/read_write.c:470)
[   34.121877] ksys_read (/home/marvin/linux/kernel/linux_torvalds/fs/read_write.c:613)
[   34.121887] __x64_sys_read (/home/marvin/linux/kernel/linux_torvalds/fs/read_write.c:621)
[   34.121898] do_syscall_64 (/home/marvin/linux/kernel/linux_torvalds/arch/x86/entry/common.c:50 /home/marvin/linux/kernel/linux_torvalds/arch/x86/entry/common.c:80)
[   34.121907] entry_SYSCALL_64_after_hwframe (/home/marvin/linux/kernel/linux_torvalds/arch/x86/entry/entry_64.S:120)

[   34.128249] value changed: 0x0017ffffc0020001 -> 0x0017ffffc0020004

[   34.141197] Reported by Kernel Concurrency Sanitizer on:
[   34.147749] CPU: 12 PID: 348 Comm: systemd-udevd Not tainted 6.5.0-kcsan-00001-g7b800ecbe71c #5
[   34.147760] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[   34.147766] ==================================================================

Please find attached config.

Best regards,
Mirsad Todorovac

[-- Attachment #2: config-6.5.0-kcsan-00001-g7b800ecbe71c.xz --]
[-- Type: application/x-xz, Size: 57816 bytes --]


* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-08-28 21:14 BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io Mirsad Todorovac
@ 2023-08-29 19:13 ` Matthew Wilcox
  2023-08-30 11:43   ` Mirsad Todorovac
  2023-08-31 14:52 ` Matthew Wilcox
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2023-08-29 19:13 UTC (permalink / raw)
  To: Mirsad Todorovac
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Mon, Aug 28, 2023 at 11:14:23PM +0200, Mirsad Todorovac wrote:
> In the vanilla torvalds tree 6.5 kernel on the Ubuntu 22.04 system, KCSAN found another data race:

KCSAN is wrong.

> [   34.102069] write (marked) to 0xffffef9a44978bc0 of 8 bytes by interrupt on cpu 28:
> [   34.108569] mpage_read_end_io (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/bitops.h:55 /home/marvin/linux/kernel/linux_torvalds/./include/asm-generic/bitops/instrumented-atomic.h:29 /home/marvin/linux/kernel/linux_torvalds/./include/linux/page-flags.h:739 /home/marvin/linux/kernel/linux_torvalds/fs/mpage.c:55)

        bio_for_each_folio_all(fi, bio) {
                if (err)
                        folio_set_error(fi.folio);
                else
                        folio_mark_uptodate(fi.folio);
                folio_unlock(fi.folio);
        }

It's noting the write to folio->flags in folio_mark_uptodate().  You can
see it's locked.  Also, the folio is under I/O.

> [   34.115221] read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
> [   34.121702] folio_batch_move_lru (/home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1814 /home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1824 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1636 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1659 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:216)

Here, it's noting the read to folio->flags that's part of page_to_nid().

> [   34.121713] folio_batch_add_and_move (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:235)
> [   34.121724] folio_add_lru (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/preempt.h:95 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:518)
> [   34.121735] folio_add_lru_vma (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:538)
> [   34.121746] do_anonymous_page (/home/marvin/linux/kernel/linux_torvalds/mm/memory.c:4146)

Here we can see the page is freshly allocated.

So KCSAN has three things wrong here.  One is that the write in
folio_mark_uptodate() sets a bit that is nowhere near the bits
that are used for the node ID.  KCSAN can't know that; it doesn't track
writes at that granularity.

The second thing is that the node bits in folio->flags are immutable.
They're set at boot (or memory hotplug).  There is never a race risk when
reading them.  Presumably there needs to be some kind of annotation to
tell KCSAN that this is always safe.

The third thing is that these two accesses cannot race.  The write is
to a folio which is under I/O, so cannot be freed.  The read is to a
folio which has just been allocated, so cannot be under I/O.  This is
some kind of failure of KCSAN.
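
To make the first two points concrete, here is a minimal userspace sketch
(not kernel code).  The field width, shift and bit position below are
assumptions chosen for illustration; the real layout of folio->flags
depends on the kernel configuration.  It shows that setting a low flag bit
leaves the value extracted through a node-ID mask unchanged.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout for illustration only: node ID in the top 10 bits
 * of a 64-bit flags word. Real widths are configuration dependent.
 */
#define NODES_WIDTH     10
#define NODES_PGSHIFT   (64 - NODES_WIDTH)
#define NODES_MASK      ((UINT64_C(1) << NODES_WIDTH) - 1)

/* Assumed position of the uptodate flag bit. */
#define PG_UPTODATE_BIT 2

/* Extract the node ID field from a flags word. */
uint64_t flags_to_nid(uint64_t flags)
{
	return (flags >> NODES_PGSHIFT) & NODES_MASK;
}

/* Return the flags word with the (assumed) uptodate bit set. */
uint64_t mark_uptodate(uint64_t flags)
{
	assert(PG_UPTODATE_BIT < NODES_PGSHIFT); /* flag bit lies below the node field */
	return flags | (UINT64_C(1) << PG_UPTODATE_BIT);
}
```

For any flags word f, flags_to_nid(mark_uptodate(f)) equals flags_to_nid(f),
so a reader that only looks at the node bits gets the same answer before
and after the bit is set.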



* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-08-29 19:13 ` Matthew Wilcox
@ 2023-08-30 11:43   ` Mirsad Todorovac
  2023-08-30 13:56     ` Mirsad Todorovac
  0 siblings, 1 reply; 11+ messages in thread
From: Mirsad Todorovac @ 2023-08-30 11:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

Hi, Mr. Matthew,

On 8/29/23 21:13, Matthew Wilcox wrote:
> On Mon, Aug 28, 2023 at 11:14:23PM +0200, Mirsad Todorovac wrote:
>> In the vanilla torvalds tree 6.5 kernel on the Ubuntu 22.04 system, KCSAN found another data race:
> 
> KCSAN is wrong.

Thank you for evaluating this bug report in such detail.

Well, I ain't giving up on KCSAN anyway, because it found some real life data races.

To express it more graphically, it is very unpleasant when the other core changes the data
from underneath you or it magically and unexpectedly changes in the course of some work ...

:-(

>> [   34.102069] write (marked) to 0xffffef9a44978bc0 of 8 bytes by interrupt on cpu 28:
>> [   34.108569] mpage_read_end_io (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/bitops.h:55 /home/marvin/linux/kernel/linux_torvalds/./include/asm-generic/bitops/instrumented-atomic.h:29 /home/marvin/linux/kernel/linux_torvalds/./include/linux/page-flags.h:739 /home/marvin/linux/kernel/linux_torvalds/fs/mpage.c:55)
> 
>          bio_for_each_folio_all(fi, bio) {
>                  if (err)
>                          folio_set_error(fi.folio);
>                  else
>                          folio_mark_uptodate(fi.folio);
>                  folio_unlock(fi.folio);
>          }

> It's noting the write to folio->flags in folio_mark_uptodate().  You can
> see it's locked.  Also, the folio is under I/O.

Yes, from folio_unlock(fi.folio), it appears that somewhere it was locked. But finding
where it was locked is beyond my understanding ATM.

I see folio_put() in other places, but it seems to only drop a reference (refcount); I did not
find where it is locked, but this is probably just me ...

>> [   34.115221] read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
>> [   34.121702] folio_batch_move_lru (/home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1814 /home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1824 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1636 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1659 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:216)
> 
> Here, it's noting the read to folio->flags that's part of page_to_nid().
> 
>> [   34.121713] folio_batch_add_and_move (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:235)
>> [   34.121724] folio_add_lru (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/preempt.h:95 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:518)
>> [   34.121735] folio_add_lru_vma (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:538)
>> [   34.121746] do_anonymous_page (/home/marvin/linux/kernel/linux_torvalds/mm/memory.c:4146)
> 
> Here we can see the page is freshly allocated.
> 
> So KCSAN has three things wrong here.  One is that the write to
> folio_mark_uptodate() is setting a bit, that is nowhere near the bits
> that are used for the node ID.  It can't know that; it doesn't track
> writes at that granularity.
> 
> The second thing is that the node bits in folio->flags are immutable.
> They're set at boot (or memory hotplug).  There is never a race risk when
> reading them.  Presumably there needs to be some kind of annotation to
> tell KCSAN that this is always safe.
> 
> The third thing is that these two accesses cannot race.  The write is
> to a folio which is under I/O, so cannot be freed.  The read is to a
> folio which has just been allocated, so cannot be under I/O.  This is
> some kind of failure of KCSAN.

Based on your insight, I will assume that the bug report is resolved.

Thank you again for your time.

Best regards,
Mirsad Todorovac

-- 
Mirsad Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia


* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-08-30 11:43   ` Mirsad Todorovac
@ 2023-08-30 13:56     ` Mirsad Todorovac
  0 siblings, 0 replies; 11+ messages in thread
From: Mirsad Todorovac @ 2023-08-30 13:56 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

Hi, Mr. Matthew,

On 8/29/23 21:13, Matthew Wilcox wrote:
 > On Mon, Aug 28, 2023 at 11:14:23PM +0200, Mirsad Todorovac wrote:
 >> In the vanilla torvalds tree 6.5 kernel on the Ubuntu 22.04 system, KCSAN found another data race:
 >
 > KCSAN is wrong.

Thank you for evaluating this bug report in such detail.

Well, I ain't giving up on KCSAN anyway, because it found some real life data races.

To express a data race more graphically, it is very unpleasant when the other core changes
the data from underneath you or it magically and unexpectedly changes in the course of some
work ...

🙁

 >> [   34.102069] write (marked) to 0xffffef9a44978bc0 of 8 bytes by interrupt on cpu 28:
 >> [   34.108569] mpage_read_end_io (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/bitops.h:55 /home/marvin/linux/kernel/linux_torvalds/./include/asm-generic/bitops/instrumented-atomic.h:29 /home/marvin/linux/kernel/linux_torvalds/./include/linux/page-flags.h:739 /home/marvin/linux/kernel/linux_torvalds/fs/mpage.c:55)
 >
 >          bio_for_each_folio_all(fi, bio) {
 >                  if (err)
 >                          folio_set_error(fi.folio);
 >                  else
 >                          folio_mark_uptodate(fi.folio);
 >                  folio_unlock(fi.folio);
 >          }

 > It's noting the write to folio->flags in folio_mark_uptodate().  You can
 > see it's locked.  Also, the folio is under I/O.

Yes, from folio_unlock(fi.folio), it appears that somewhere it was locked. But finding
where it was locked is beyond my understanding ATM.

I see folio_put() in other places, but it seems to only drop a reference (refcount); I did not
find where it is locked, but this is probably just me ...

 >> [   34.115221] read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
 >> [   34.121702] folio_batch_move_lru (/home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1814 /home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1824 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1636 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1659 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:216)
 >
 > Here, it's noting the read to folio->flags that's part of page_to_nid().
 >
 >> [   34.121713] folio_batch_add_and_move (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:235)
 >> [   34.121724] folio_add_lru (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/preempt.h:95 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:518)
 >> [   34.121735] folio_add_lru_vma (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:538)
 >> [   34.121746] do_anonymous_page (/home/marvin/linux/kernel/linux_torvalds/mm/memory.c:4146)
 >
 > Here we can see the page is freshly allocated.
 >
 > So KCSAN has three things wrong here.  One is that the write to
 > folio_mark_uptodate() is setting a bit, that is nowhere near the bits
 > that are used for the node ID.  It can't know that; it doesn't track
 > writes at that granularity.
 >
 > The second thing is that the node bits in folio->flags are immutable.
 > They're set at boot (or memory hotplug).  There is never a race risk when
 > reading them.  Presumably there needs to be some kind of annotation to
 > tell KCSAN that this is always safe.
 >
 > The third thing is that these two accesses cannot race.  The write is
 > to a folio which is under I/O, so cannot be freed.  The read is to a
 > folio which has just been allocated, so cannot be under I/O.  This is
 > some kind of failure of KCSAN.

Based on your insight, I will assume that the bug report is resolved.

Thank you again for your time.

Best regards,
Mirsad Todorovac

-- 
Mirsad Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia


* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-08-28 21:14 BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io Mirsad Todorovac
  2023-08-29 19:13 ` Matthew Wilcox
@ 2023-08-31 14:52 ` Matthew Wilcox
  2023-09-08 15:25   ` Matthew Wilcox
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2023-08-31 14:52 UTC (permalink / raw)
  To: Mirsad Todorovac
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Mon, Aug 28, 2023 at 11:14:23PM +0200, Mirsad Todorovac wrote:
>  BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

This one's still niggling at me.  I've trimmed the timestamps and some
of the other irrelevant stuff out of this to make it easier to read.

>  value changed: 0x0017ffffc0020001 -> 0x0017ffffc0020004

Notionally I understand this.  This is page->flags and the PG_locked bit
was set initially, but after a short delay PG_locked was cleared and
PG_uptodate was set.  That's _normal_.  For many, many pages, we set the
locked bit, initiate a read; the device does a DMA, sends an interrupt;
the interrupt handler sets the PG_uptodate bit and clears the PG_locked
bit to indicate the page is no longer under I/O.
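
The reported transition can be decoded with a small userspace sketch.
The bit positions used for PG_locked (0) and PG_uptodate (2) are
assumptions consistent with the description above; the authoritative
values are in enum pageflags for the kernel in question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions; check enum pageflags for the kernel being debugged. */
#define PG_LOCKED_BIT   0
#define PG_UPTODATE_BIT 2

/* Bits that are set in 'after' but were clear in 'before'. */
uint64_t bits_set(uint64_t before, uint64_t after)
{
	return after & ~before;
}

/* Bits that were set in 'before' but are clear in 'after'. */
uint64_t bits_cleared(uint64_t before, uint64_t after)
{
	return before & ~after;
}

/* Print both masks for a "value changed: before -> after" line. */
void describe_change(uint64_t before, uint64_t after)
{
	assert(before != after); /* KCSAN only prints this line when the value changed */
	printf("set:     %#018llx\n",
	       (unsigned long long)bits_set(before, after));
	printf("cleared: %#018llx\n",
	       (unsigned long long)bits_cleared(before, after));
}
```

Calling describe_change() with the two values from the "value changed"
line prints the mask of bits that were set and the mask of bits that
were cleared between the two samples.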

But what I don't understand is how we see this for _this_ page.

>  write (marked) to 0xffffef9a44978bc0 of 8 bytes by interrupt on cpu 28:
>  mpage_read_end_io (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 include/linux/page-flags.h:739 fs/mpage.c:55)
>  bio_endio (block/bio.c:1617)
>  blk_mq_end_request_batch (block/blk-mq.c:850 block/blk-mq.c:1088)
>  nvme_pci_complete_batch (drivers/nvme/host/pci.c:986) nvme
>  nvme_irq (drivers/nvme/host/pci.c:1086) nvme

This is the interrupt handler.  It's doing what it's supposed to;
marking the page uptodate and unlocking it.

>  read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
>  folio_batch_move_lru (./include/linux/mm.h:1814 ./include/linux/mm.h:1824 ./include/linux/memcontrol.h:1636 ./include/linux/memcontrol.h:1659 mm/swap.c:216)
>  folio_batch_add_and_move (mm/swap.c:235)
>  folio_add_lru (./arch/x86/include/asm/preempt.h:95 mm/swap.c:518)
>  folio_add_lru_vma (mm/swap.c:538)
>  do_anonymous_page (mm/memory.c:4146)

This is the part I don't understand.  The path to calling
folio_add_lru_vma() comes directly from vma_alloc_zeroed_movable_folio():

        folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
        if (!folio)
                goto oom;
        if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
                goto oom_free_page;
        folio_throttle_swaprate(folio, GFP_KERNEL);
        __folio_mark_uptodate(folio);
        entry = mk_pte(&folio->page, vma->vm_page_prot);
        entry = pte_sw_mkyoung(entry);
        if (vma->vm_flags & VM_WRITE)
                entry = pte_mkwrite(pte_mkdirty(entry));
        vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
                        &vmf->ptl);
        if (!vmf->pte)
                goto release;
        if (vmf_pte_changed(vmf)) {
                update_mmu_tlb(vma, vmf->address, vmf->pte);
                goto release;
        }
        ret = check_stable_address_space(vma->vm_mm);
        if (ret)
                goto release;
        /* Deliver the page fault to userland, check inside PT lock */
        if (userfaultfd_missing(vma)) {
                pte_unmap_unlock(vmf->pte, vmf->ptl);
                folio_put(folio);
                return handle_userfault(vmf, VM_UFFD_MISSING);
        }
        inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
        folio_add_new_anon_rmap(folio, vma, vmf->address);
        folio_add_lru_vma(folio, vma);

(sorry that's a lot of lines).  But there's _nowhere_ there that sets
PG_locked.  It's a freshly allocated page; all page flags (that are
actually flags; ignore the stuff up at the top) should be clear.  We
even check that with PAGE_FLAGS_CHECK_AT_PREP.  Plus, it doesn't
make sense that we'd start I/O; the page is freshly allocated, full of
zeroes; there's no backing store to read the page from.
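
As a rough userspace illustration of what such a prep-time check amounts
to: the mask below is a hypothetical stand-in, not the kernel's definition
of PAGE_FLAGS_CHECK_AT_PREP, and the 24-bit flag width is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for PAGE_FLAGS_CHECK_AT_PREP: treat the low
 * FLAG_BITS bits as real flags and everything above them as the
 * node/zone/etc. fields ("the stuff up at the top").
 */
#define FLAG_BITS           24
#define FLAGS_CHECK_AT_PREP ((UINT64_C(1) << FLAG_BITS) - 1)

/* True if no real flag is set, as expected for a freshly allocated page. */
bool flags_clean_at_prep(uint64_t flags)
{
	assert(FLAG_BITS < 64); /* keep the shift in the mask well defined */
	return (flags & FLAGS_CHECK_AT_PREP) == 0;
}
```

Under these assumptions, a flags word with only upper-field bits set passes
the check, while the same word with bit 0 (the assumed PG_locked position)
set does not.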

It really feels like this page was freed while it was still under I/O
and it's been reallocated to this victim process.

I'm going to try a few things and see if I can figure this out.

>  __handle_mm_fault (mm/memory.c:3662 mm/memory.c:4939 mm/memory.c:5079)
>  handle_mm_fault (mm/memory.c:5233)
>  do_user_addr_fault (arch/x86/mm/fault.c:1392)
>  exc_page_fault (./arch/x86/include/asm/paravirt.h:695 arch/x86/mm/fault.c:1494 arch/x86/mm/fault.c:1542)
>  asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>  copyout (./arch/x86/include/asm/uaccess_64.h:112 ./arch/x86/include/asm/uaccess_64.h:133 lib/iov_iter.c:168)
>  _copy_to_iter (lib/iov_iter.c:316 (discriminator 5))
>  copy_page_to_iter (lib/iov_iter.c:483 lib/iov_iter.c:468)
>  filemap_read (mm/filemap.c:2712)
>  blkdev_read_iter (block/fops.c:620)
>  vfs_read (./include/linux/fs.h:1871 fs/read_write.c:389 fs/read_write.c:470)
>  ksys_read (fs/read_write.c:613)
>  __x64_sys_read (fs/read_write.c:621)



* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-08-31 14:52 ` Matthew Wilcox
@ 2023-09-08 15:25   ` Matthew Wilcox
  2023-09-12 16:05     ` Mirsad Todorovac
  2023-09-18 12:15     ` Mirsad Todorovac
  0 siblings, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2023-09-08 15:25 UTC (permalink / raw)
  To: Mirsad Todorovac
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Thu, Aug 31, 2023 at 03:52:49PM +0100, Matthew Wilcox wrote:
> >  read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
> >  folio_batch_move_lru (./include/linux/mm.h:1814 ./include/linux/mm.h:1824 ./include/linux/memcontrol.h:1636 ./include/linux/memcontrol.h:1659 mm/swap.c:216)
> >  folio_batch_add_and_move (mm/swap.c:235)
> >  folio_add_lru (./arch/x86/include/asm/preempt.h:95 mm/swap.c:518)
> >  folio_add_lru_vma (mm/swap.c:538)
> >  do_anonymous_page (mm/memory.c:4146)
> 
> This is the part I don't understand.  The path to calling
> folio_add_lru_vma() comes directly from vma_alloc_zeroed_movable_folio():
> 
[snip]
> 
> (sorry that's a lot of lines).  But there's _nowhere_ there that sets
> PG_locked.  It's a freshly allocated page; all page flags (that are
> actually flags; ignore the stuff up at the top) should be clear.  We
> even check that with PAGE_FLAGS_CHECK_AT_PREP.  Plus, it doesn't
> make sense that we'd start I/O; the page is freshly allocated, full of
> zeroes; there's no backing store to read the page from.
> 
> It really feels like this page was freed while it was still under I/O
> and it's been reallocated to this victim process.
> 
> I'm going to try a few things and see if I can figure this out.

I'm having trouble reproducing this.  Can you get it to happen reliably?

This is what I'm currently running with, and it doesn't trigger.
I'd expect it to if we were going to hit the KCSAN bug.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0c5be12f9336..d22e8798c326 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
 
 out:
+	VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
 	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
 	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
 		__free_pages(page, order);


* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-09-08 15:25   ` Matthew Wilcox
@ 2023-09-12 16:05     ` Mirsad Todorovac
  2023-09-18 12:15     ` Mirsad Todorovac
  1 sibling, 0 replies; 11+ messages in thread
From: Mirsad Todorovac @ 2023-09-12 16:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme



On 9/8/23 17:25, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 03:52:49PM +0100, Matthew Wilcox wrote:
>>>   read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
>>>   folio_batch_move_lru (./include/linux/mm.h:1814 ./include/linux/mm.h:1824 ./include/linux/memcontrol.h:1636 ./include/linux/memcontrol.h:1659 mm/swap.c:216)
>>>   folio_batch_add_and_move (mm/swap.c:235)
>>>   folio_add_lru (./arch/x86/include/asm/preempt.h:95 mm/swap.c:518)
>>>   folio_add_lru_vma (mm/swap.c:538)
>>>   do_anonymous_page (mm/memory.c:4146)
>>
>> This is the part I don't understand.  The path to calling
>> folio_add_lru_vma() comes directly from vma_alloc_zeroed_movable_folio():
>>
> [snip]
>>
>> (sorry that's a lot of lines).  But there's _nowhere_ there that sets
>> PG_locked.  It's a freshly allocated page; all page flags (that are
>> actually flags; ignore the stuff up at the top) should be clear.  We
>> even check that with PAGE_FLAGS_CHECK_AT_PREP.  Plus, it doesn't
>> make sense that we'd start I/O; the page is freshly allocated, full of
>> zeroes; there's no backing store to read the page from.
>>
>> It really feels like this page was freed while it was still under I/O
>> and it's been reallocated to this victim process.
>>
>> I'm going to try a few things and see if I can figure this out.
> 
> I'm having trouble reproducing this.  Can you get it to happen reliably?
> 
> This is what I'm currently running with, and it doesn't trigger.
> I'd expect it to if we were going to hit the KCSAN bug.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0c5be12f9336..d22e8798c326 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
>   	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
>   
>   out:
> +	VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
>   	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>   	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
>   		__free_pages(page, order);

Hello, Mr. Matthew,

I have applied your patch to the vanilla 6.6-rc1 kernel from the torvalds tree, and so far the KCSAN
report on folio_batch_move_lru() has not recurred. :-/

I realise that it would be good to make this reproducible, so I will try a couple of other things ...

I am not that familiar with the memory management core, so you should probably tell me what to expect.

So far I got the best match (but probably unrelated) from this KCSAN report
(from secondary_startup_64_no_verify() to bio_endio() the stacktrace is the same):

[   72.708078] ==================================================================
[   72.708094] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked

[   72.708113] write to 0xffff888105d5dd90 of 8 bytes by interrupt on cpu 19:
[   72.708123] xas_clear_mark (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/bitops.h:178 /home/marvin/linux/kernel/torvalds2/./include/asm-generic/bitops/instrumented-non-atomic.h:115 /home/marvin/linux/kernel/torvalds2/lib/xarray.c:102 /home/marvin/linux/kernel/torvalds2/lib/xarray.c:914)
[   72.708134] __xa_clear_mark (/home/marvin/linux/kernel/torvalds2/lib/xarray.c:1929)
[   72.708143] __folio_end_writeback (/home/marvin/linux/kernel/torvalds2/mm/page-writeback.c:2960)
[   72.708155] folio_end_writeback (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:1613)
[   72.708164] end_page_writeback (/home/marvin/linux/kernel/torvalds2/mm/folio-compat.c:28)
[   72.708175] btrfs_page_clear_writeback (/home/marvin/linux/kernel/torvalds2/fs/btrfs/subpage.c:646) btrfs
[   72.708758] extent_buffer_write_end_io (/home/marvin/linux/kernel/torvalds2/./include/linux/bio.h:84 /home/marvin/linux/kernel/torvalds2/fs/btrfs/extent_io.c:1622) btrfs
[   72.709343] __btrfs_bio_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:120) btrfs
[   72.709921] btrfs_orig_bbio_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:164) btrfs
[   72.710521] btrfs_orig_write_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:420) btrfs
[   72.711118] bio_endio (/home/marvin/linux/kernel/torvalds2/block/bio.c:1603)
[   72.711128] blk_mq_end_request_batch (/home/marvin/linux/kernel/torvalds2/block/blk-mq.c:851 /home/marvin/linux/kernel/torvalds2/block/blk-mq.c:1089)
[   72.711139] nvme_pci_complete_batch (/home/marvin/linux/kernel/torvalds2/drivers/nvme/host/pci.c:986) nvme
[   72.711185] nvme_irq (/home/marvin/linux/kernel/torvalds2/drivers/nvme/host/pci.c:1086) nvme
[   72.711226] __handle_irq_event_percpu (/home/marvin/linux/kernel/torvalds2/kernel/irq/handle.c:158)
[   72.711239] handle_irq_event (/home/marvin/linux/kernel/torvalds2/kernel/irq/handle.c:195 /home/marvin/linux/kernel/torvalds2/kernel/irq/handle.c:210)
[   72.711251] handle_edge_irq (/home/marvin/linux/kernel/torvalds2/kernel/irq/chip.c:833)
[   72.711262] __common_interrupt (/home/marvin/linux/kernel/torvalds2/./include/linux/irqdesc.h:161 /home/marvin/linux/kernel/torvalds2/arch/x86/kernel/irq.c:238 /home/marvin/linux/kernel/torvalds2/arch/x86/kernel/irq.c:257)
[   72.711272] common_interrupt (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/irq.c:247 (discriminator 14))
[   72.711282] asm_common_interrupt (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:636)
[   72.711292] cpuidle_enter_state (/home/marvin/linux/kernel/torvalds2/drivers/cpuidle/cpuidle.c:291)
[   72.711301] cpuidle_enter (/home/marvin/linux/kernel/torvalds2/drivers/cpuidle/cpuidle.c:390)
[   72.711309] call_cpuidle (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:135)
[   72.711320] do_idle (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:219 /home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:282)
[   72.711328] cpu_startup_entry (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:378 (discriminator 1))
[   72.711337] start_secondary (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/smpboot.c:210 /home/marvin/linux/kernel/torvalds2/arch/x86/kernel/smpboot.c:294)
[   72.711349] secondary_startup_64_no_verify (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/head_64.S:433)

[   72.711366] read to 0xffff888105d5dd90 of 8 bytes by task 555 on cpu 17:
[   72.711377] xas_find_marked (/home/marvin/linux/kernel/torvalds2/./include/linux/xarray.h:1724 /home/marvin/linux/kernel/torvalds2/lib/xarray.c:1354)
[   72.711387] filemap_get_folios_tag (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:1978 /home/marvin/linux/kernel/torvalds2/mm/filemap.c:2266)
[   72.711396] __filemap_fdatawait_range (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:516)
[   72.711405] filemap_fdatawait_range (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:553)
[   72.711414] __btrfs_wait_marked_extents.isra.0 (/home/marvin/linux/kernel/torvalds2/fs/btrfs/transaction.c:1150) btrfs
[   72.712027] btrfs_write_and_wait_transaction (/home/marvin/linux/kernel/torvalds2/fs/btrfs/transaction.c:1169 /home/marvin/linux/kernel/torvalds2/fs/btrfs/transaction.c:1218) btrfs
[   72.712639] btrfs_commit_transaction (/home/marvin/linux/kernel/torvalds2/fs/btrfs/transaction.c:2500) btrfs
[   72.713251] transaction_kthread (/home/marvin/linux/kernel/torvalds2/fs/btrfs/disk-io.c:1537) btrfs
[   72.713856] kthread (/home/marvin/linux/kernel/torvalds2/kernel/kthread.c:388)
[   72.713865] ret_from_fork (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/process.c:147)
[   72.713876] ret_from_fork_asm (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:312)

[   72.713889] value changed: 0x0f00c00fff000000 -> 0x0000000fff000000

[   72.713903] Reported by Kernel Concurrency Sanitizer on:
[   72.713910] CPU: 17 PID: 555 Comm: btrfs-transacti Tainted: G             L     6.6.0-rc1-kcsan-dirty #2
[   72.713920] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[   72.713927] ==================================================================

However, I have upgraded to the 6.6-rc1 torvalds tree kernel in the meantime.

If you want, I could test with 6.5 + your patch.

Best regards,
Mirsad Todorovac


* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-09-08 15:25   ` Matthew Wilcox
  2023-09-12 16:05     ` Mirsad Todorovac
@ 2023-09-18 12:15     ` Mirsad Todorovac
  2023-09-18 14:53       ` Matthew Wilcox
  1 sibling, 1 reply; 11+ messages in thread
From: Mirsad Todorovac @ 2023-09-18 12:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On 9/8/23 17:25, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 03:52:49PM +0100, Matthew Wilcox wrote:
>>>   read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
>>>   folio_batch_move_lru (./include/linux/mm.h:1814 ./include/linux/mm.h:1824 ./include/linux/memcontrol.h:1636 ./include/linux/memcontrol.h:1659 mm/swap.c:216)
>>>   folio_batch_add_and_move (mm/swap.c:235)
>>>   folio_add_lru (./arch/x86/include/asm/preempt.h:95 mm/swap.c:518)
>>>   folio_add_lru_vma (mm/swap.c:538)
>>>   do_anonymous_page (mm/memory.c:4146)
>>
>> This is the part I don't understand.  The path to calling
>> folio_add_lru_vma() comes directly from vma_alloc_zeroed_movable_folio():
>>
> [snip]
>>
>> (sorry that's a lot of lines).  But there's _nowhere_ there that sets
>> PG_locked.  It's a freshly allocated page; all page flags (that are
>> actually flags; ignore the stuff up at the top) should be clear.  We
>> even check that with PAGE_FLAGS_CHECK_AT_PREP.  Plus, it doesn't
>> make sense that we'd start I/O; the page is freshly allocated, full of
>> zeroes; there's no backing store to read the page from.
>>
>> It really feels like this page was freed while it was still under I/O
>> and it's been reallocated to this victim process.
>>
>> I'm going to try a few things and see if I can figure this out.
> 
> I'm having trouble reproducing this.  Can you get it to happen reliably?
> 
> This is what I'm currently running with, and it doesn't trigger.
> I'd expect it to if we were going to hit the KCSAN bug.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0c5be12f9336..d22e8798c326 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
>   	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
>   
>   out:
> +	VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
>   	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>   	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
>   		__free_pages(page, order);

Hi,

Caught another instance of this bug involving folio_batch_move_lru. I don't think I can make it
happen reliably, given the nature of data races, if I understood them well.

I have only found the reports in dmesg; I cannot determine what exactly the system was doing at
that moment ...

Hope this will shed some more light on the issue (6.6-rc2 vanilla torvalds tree kernel):

[  114.557937] ==================================================================
[  114.558262] BUG: KCSAN: data-race in btrfs_page_set_uptodate [btrfs] / folio_batch_move_lru

[  114.558902] write (marked) to 0xffffea0006f68f00 of 8 bytes by task 2678 on cpu 19:
[  114.558912] btrfs_page_set_uptodate (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/bitops.h:55 /home/marvin/linux/kernel/torvalds2/./include/asm-generic/bitops/instrumented-atomic.h:29 /home/marvin/linux/kernel/torvalds2/./include/linux/page-flags.h:741 /home/marvin/linux/kernel/torvalds2/./include/linux/page-flags.h:751 /home/marvin/linux/kernel/torvalds2/fs/btrfs/subpage.c:642) btrfs
[  114.559539] end_page_read (/home/marvin/linux/kernel/torvalds2/fs/btrfs/extent_io.c:445) btrfs
[  114.560166] end_bio_extent_readpage (/home/marvin/linux/kernel/torvalds2/fs/btrfs/extent_io.c:660) btrfs
[  114.560796] __btrfs_bio_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:120) btrfs
[  114.561434] btrfs_orig_bbio_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:164) btrfs
[  114.562079] btrfs_check_read_bio (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:324) btrfs
[  114.562724] btrfs_end_bio_work (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:359) btrfs
[  114.563360] process_one_work (/home/marvin/linux/kernel/torvalds2/kernel/workqueue.c:2630)
[  114.563371] worker_thread (/home/marvin/linux/kernel/torvalds2/kernel/workqueue.c:2697 /home/marvin/linux/kernel/torvalds2/kernel/workqueue.c:2784)
[  114.563381] kthread (/home/marvin/linux/kernel/torvalds2/kernel/kthread.c:388)
[  114.563390] ret_from_fork (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/process.c:147)
[  114.563401] ret_from_fork_asm (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:312)

[  114.563416] read to 0xffffea0006f68f00 of 8 bytes by task 3540 on cpu 20:
[  114.563426] folio_batch_move_lru (/home/marvin/linux/kernel/torvalds2/./include/linux/mm.h:1849 /home/marvin/linux/kernel/torvalds2/./include/linux/mm.h:1859 /home/marvin/linux/kernel/torvalds2/./include/linux/memcontrol.h:1639 /home/marvin/linux/kernel/torvalds2/./include/linux/memcontrol.h:1662 /home/marvin/linux/kernel/torvalds2/mm/swap.c:216)
[  114.563436] folio_batch_add_and_move (/home/marvin/linux/kernel/torvalds2/mm/swap.c:235)
[  114.563446] folio_add_lru (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/preempt.h:95 /home/marvin/linux/kernel/torvalds2/mm/swap.c:518)
[  114.563455] filemap_add_folio (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:957)
[  114.563464] page_cache_ra_unbounded (/home/marvin/linux/kernel/torvalds2/mm/readahead.c:250)
[  114.563477] page_cache_ra_order (/home/marvin/linux/kernel/torvalds2/mm/readahead.c:547)
[  114.563490] ondemand_readahead (/home/marvin/linux/kernel/torvalds2/mm/readahead.c:669)
[  114.563499] page_cache_async_ra (/home/marvin/linux/kernel/torvalds2/mm/readahead.c:718)
[  114.563507] filemap_fault (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:3227 /home/marvin/linux/kernel/torvalds2/mm/filemap.c:3281)
[  114.563518] __do_fault (/home/marvin/linux/kernel/torvalds2/mm/memory.c:4204)
[  114.563528] do_fault (/home/marvin/linux/kernel/torvalds2/mm/memory.c:4568 /home/marvin/linux/kernel/torvalds2/mm/memory.c:4705)
[  114.563538] __handle_mm_fault (/home/marvin/linux/kernel/torvalds2/mm/memory.c:3669 /home/marvin/linux/kernel/torvalds2/mm/memory.c:4978 /home/marvin/linux/kernel/torvalds2/mm/memory.c:5119)
[  114.563549] handle_mm_fault (/home/marvin/linux/kernel/torvalds2/mm/memory.c:5284)
[  114.563560] do_user_addr_fault (/home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1413)
[  114.563572] exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/paravirt.h:695 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1513 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1561)
[  114.563582] asm_exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:570)

[  114.563597] value changed: 0x0017ffffc0008101 -> 0x0017ffffc0008108

[  114.563612] Reported by Kernel Concurrency Sanitizer on:
[  114.563619] CPU: 20 PID: 3540 Comm: chrome Not tainted 6.6.0-rc2-kcsan-00003-g16819584c239-dirty #11
[  114.563630] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[  114.563636] ==================================================================
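
One way to read the "value changed" line above is to XOR the two words and
look at which bits flipped. The following is a minimal userspace sketch, not
kernel code; changed_bits() and bit_direction() are hypothetical helper names
introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the indices of the bits that differ between two 64-bit words.
 * Writes the indices into out[] (lowest bit first) and returns how many
 * were written. Hypothetical helper, not a kernel API.
 */
static size_t changed_bits(uint64_t before, uint64_t after, unsigned int out[64])
{
	uint64_t diff = before ^ after;	/* 1 wherever the two words disagree */
	size_t n = 0;

	for (unsigned int bit = 0; bit < 64; bit++) {
		if (diff & (UINT64_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}

/* Return 1 if the bit went 0 -> 1, -1 if it went 1 -> 0, 0 if unchanged. */
static int bit_direction(uint64_t before, uint64_t after, unsigned int bit)
{
	assert(bit < 64);

	int was = (int)((before >> bit) & 1);
	int now = (int)((after >> bit) & 1);

	return now - was;
}
```

Called with 0x0017ffffc0008101 and 0x0017ffffc0008108, the XOR is 0x9: bit 0
went from 1 to 0 and bit 3 went from 0 to 1. Which page flags those bit
positions name depends on enum pageflags in the kernel being run, so the
sketch deliberately does not name them.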

Best regards,
Mirsad Todorovac


* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-09-18 12:15     ` Mirsad Todorovac
@ 2023-09-18 14:53       ` Matthew Wilcox
  2023-09-19 11:44         ` Mirsad Todorovac
  2023-10-03 20:12         ` Mirsad Todorovac
  0 siblings, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2023-09-18 14:53 UTC (permalink / raw)
  To: Mirsad Todorovac
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Mon, Sep 18, 2023 at 02:15:05PM +0200, Mirsad Todorovac wrote:
> > This is what I'm currently running with, and it doesn't trigger.
> > I'd expect it to if we were going to hit the KCSAN bug.
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0c5be12f9336..d22e8798c326 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
> >   	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
> >   out:
> > +	VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
> >   	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
> >   	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
> >   		__free_pages(page, order);
> 
> Hi,
> 
> Caught another instance of this bug involving folio_batch_move_lru: I don't seem that I can make it
> happen reliably by the nature of the data racing conditions if I understood them well.

Were you running with this patch at the time, or was this actually
vanilla?  The problem is that, if my diagnosis is correct, both of the
tasks mentioned are victims; we have a prematurely freed page.  While
btrfs is clearly a user, it may not be btrfs's fault that the
page was also allocated as an anon page.

I'm trying to gather more data, and running with this patch will give
us more -- because it'll dump the entire struct page instead of just
the page->flags, like KCSAN is currently doing.
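
The shape of the condition in that debug patch can be modelled in userspace
as below. This is only a sketch under stated assumptions: DEMO_PG_HEAD and
DEMO_CHECK_AT_PREP are hypothetical stand-ins, since the real PG_head index
and PAGE_FLAGS_CHECK_AT_PREP mask come from include/linux/page-flags.h and
vary with kernel version and configuration. stale_flags_at_alloc() is likewise
an illustrative name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for illustration only; the real values live in
 * include/linux/page-flags.h and depend on version and config.
 */
#define DEMO_PG_HEAD		6
#define DEMO_CHECK_AT_PREP	UINT64_C(0x3fffff)

/*
 * Mirror the shape of the patch's condition: true when a page handed out
 * by the allocator still has any checked flag set, ignoring the head bit.
 */
static bool stale_flags_at_alloc(uint64_t flags)
{
	uint64_t mask = DEMO_CHECK_AT_PREP & ~(UINT64_C(1) << DEMO_PG_HEAD);

	return (flags & mask) != 0;
}
```

With these stand-in values, a flags word of 0 or one with only the head bit
set passes, while any other low flag bit left set would trip the check. In the
kernel the check fires through VM_BUG_ON_PAGE(), which is what dumps the whole
struct page rather than just page->flags.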



* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-09-18 14:53       ` Matthew Wilcox
@ 2023-09-19 11:44         ` Mirsad Todorovac
  2023-10-03 20:12         ` Mirsad Todorovac
  1 sibling, 0 replies; 11+ messages in thread
From: Mirsad Todorovac @ 2023-09-19 11:44 UTC (permalink / raw)
  To: Matthew Wilcox, Mirsad Todorovac
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On 9/18/2023 4:53 PM, Matthew Wilcox wrote:

> On Mon, Sep 18, 2023 at 02:15:05PM +0200, Mirsad Todorovac wrote:
>>> This is what I'm currently running with, and it doesn't trigger.
>>> I'd expect it to if we were going to hit the KCSAN bug.
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 0c5be12f9336..d22e8798c326 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
>>>    	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
>>>    out:
>>> +	VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
>>>    	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>>>    	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
>>>    		__free_pages(page, order);
>> Hi,
>>
>> Caught another instance of this bug involving folio_batch_move_lru: I don't seem that I can make it
>> happen reliably by the nature of the data racing conditions if I understood them well.
> Were you running with this patch at the time, or was this actually
> vanilla?  The problem is that, if my diagnosis is correct, both of the
> tasks mentioned are victims; we have a prematurely freed page.  While
> btrfs is clearly a user, it may not be btrfs's fault that the
> page was also allocated as an anon page.
>
> I'm trying to gather more data, and running with this patch will give
> us more -- because it'll dump the entire struct page instead of just
> the page->flags, like KCSAN is currently doing.

Hi, Mr. Matthew,

Yes, I have been running vanilla with your VM_BUG_ON_PAGE() patch all the
time, as it seems non-disruptive, and I am hoping to catch this spurious
page allocation.

Best regards, Mirsad Todorovac



* Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io
  2023-09-18 14:53       ` Matthew Wilcox
  2023-09-19 11:44         ` Mirsad Todorovac
@ 2023-10-03 20:12         ` Mirsad Todorovac
  1 sibling, 0 replies; 11+ messages in thread
From: Mirsad Todorovac @ 2023-10-03 20:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Andrew Morton, linux-mm, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme



On 9/18/23 16:53, Matthew Wilcox wrote:
> On Mon, Sep 18, 2023 at 02:15:05PM +0200, Mirsad Todorovac wrote:
>>> This is what I'm currently running with, and it doesn't trigger.
>>> I'd expect it to if we were going to hit the KCSAN bug.
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 0c5be12f9336..d22e8798c326 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
>>>    	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
>>>    out:
>>> +	VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
>>>    	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>>>    	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
>>>    		__free_pages(page, order);
>>
>> Hi,
>>
>> Caught another instance of this bug involving folio_batch_move_lru: I don't seem that I can make it
>> happen reliably by the nature of the data racing conditions if I understood them well.
> 
> Were you running with this patch at the time, or was this actually
> vanilla?  The problem is that, if my diagnosis is correct, both of the
> tasks mentioned are victims; we have a prematurely freed page.  While
> btrfs is clearly a user, it may not be btrfs's fault that the
> page was also allocated as an anon page.
> 
> I'm trying to gather more data, and running with this patch will give
> us more -- because it'll dump the entire struct page instead of just
> the page->flags, like KCSAN is currently doing.

As I climb the learning curve, I am becoming more aware of what you are talking about.

I still have to learn to handle patches, diffs, fixes and pulls together and
consistently.

Sometimes I feel like I am in the Borg maturation chamber when I try to learn the Linux kernel,
and I wonder if this is the Author of my story trying to make up "for the years that the locust
has eaten". Or is it that I am just losing the plot.

I have learned that I was conceited and did not respect the work you guys have done in the thirty
years that I wasted for one reason or another: objective difficulties and personal weaknesses.

Forgive me this moment of truth.

I certainly feel more motivated to catch the real culprit, rather than just the symptoms.

I will rebuild with your patch again and try to reproduce the problem.

Best regards
Mirsad Todorovac




Thread overview: 11+ messages
2023-08-28 21:14 BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io Mirsad Todorovac
2023-08-29 19:13 ` Matthew Wilcox
2023-08-30 11:43   ` Mirsad Todorovac
2023-08-30 13:56     ` Mirsad Todorovac
2023-08-31 14:52 ` Matthew Wilcox
2023-09-08 15:25   ` Matthew Wilcox
2023-09-12 16:05     ` Mirsad Todorovac
2023-09-18 12:15     ` Mirsad Todorovac
2023-09-18 14:53       ` Matthew Wilcox
2023-09-19 11:44         ` Mirsad Todorovac
2023-10-03 20:12         ` Mirsad Todorovac
