linux-kernel.vger.kernel.org archive mirror
* [PATCH -next] lib: disable KCSAN for XArray
@ 2020-03-04  3:15 Qian Cai
  2020-03-04  3:33 ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Qian Cai @ 2020-03-04  3:15 UTC (permalink / raw)
  To: paulmck; +Cc: willy, elver, linux-kernel, Qian Cai

Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
can run concurrently, resulting in data races, but they operate only
on a single bit, so those races are pretty much harmless. For example,

 write to 0xffffa0020ee705c8 of 8 bytes by task 39718 on cpu 0:
  xas_set_mark+0x8e/0x190
  arch___test_and_set_bit at arch/x86/include/asm/bitops.h:152
  (inlined by) __test_and_set_bit at include/asm-generic/bitops/instrumented-non-atomic.h:72
  (inlined by) node_set_mark at lib/xarray.c:93
  (inlined by) xas_set_mark at lib/xarray.c:879
  __test_set_page_writeback+0x5de/0x8c0
  iomap_writepage_map+0x8c6/0xf90
  iomap_do_writepage+0x12b/0x450
  write_cache_pages+0x523/0xb20
  iomap_writepages+0x47/0x80
  xfs_vm_writepages+0xc7/0x100 [xfs]
  do_writepages+0x5e/0x130
  __filemap_fdatawrite_range+0x19e/0x1f0
  file_write_and_wait_range+0xc0/0x100
  xfs_file_fsync+0xeb/0x450 [xfs]
  vfs_fsync_range+0x71/0x110
  __x64_sys_msync+0x210/0x2a0
  do_syscall_64+0x91/0xb05
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 read to 0xffffa0020ee705c8 of 8 bytes by task 39717 on cpu 5:
  xas_find_marked+0xe9/0x750
  xas_find_chunk at include/linux/xarray.h:1625
  (inlined by) xas_find_marked at lib/xarray.c:1198
  find_get_pages_range_tag+0x1bf/0xa90
  pagevec_lookup_range_tag+0x46/0x70
  __filemap_fdatawait_range+0xbb/0x270
  file_write_and_wait_range+0xe0/0x100
  xfs_file_fsync+0xeb/0x450 [xfs]
  vfs_fsync_range+0x71/0x110
  __x64_sys_msync+0x210/0x2a0
  do_syscall_64+0x91/0xb05
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Qian Cai <cai@lca.pw>
---
 lib/Makefile | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/Makefile b/lib/Makefile
index 523dfe2063e2..989e702c275b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -28,6 +28,11 @@ endif
 # Used by KCSAN while enabled, avoid recursion.
 KCSAN_SANITIZE_random32.o := n
 
+# This produces frequent data race reports: most of them are due to races on
+# the same word, though each access touches only a single bit. Re-enable KCSAN
+# for this when we have more consensus on what to do about them.
+KCSAN_SANITIZE_xarray.o := n
+
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o timerqueue.o xarray.o \
 	 idr.o extable.o sha1.o irq_regs.o argv_split.o \
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04  3:15 [PATCH -next] lib: disable KCSAN for XArray Qian Cai
@ 2020-03-04  3:33 ` Matthew Wilcox
  2020-03-04  3:55   ` Qian Cai
  2020-03-04  4:05   ` Paul E. McKenney
  0 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox @ 2020-03-04  3:33 UTC (permalink / raw)
  To: Qian Cai; +Cc: paulmck, elver, linux-kernel

On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> can run concurrently, resulting in data races, but they operate only
> on a single bit, so those races are pretty much harmless. For example,

Those aren't data races.  The writes are protected by a spinlock and the
reads by the RCU read lock.  If the tool can't handle RCU protection,
it's not going to be much use.



* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04  3:33 ` Matthew Wilcox
@ 2020-03-04  3:55   ` Qian Cai
  2020-03-04  4:00     ` Matthew Wilcox
  2020-03-04  4:05   ` Paul E. McKenney
  1 sibling, 1 reply; 16+ messages in thread
From: Qian Cai @ 2020-03-04  3:55 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: paulmck, elver, linux-kernel



> On Mar 3, 2020, at 10:33 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
>> Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
>> can run concurrently, resulting in data races, but they operate only
>> on a single bit, so those races are pretty much harmless. For example,
> 
> Those aren't data races.  The writes are protected by a spinlock and the
> reads by the RCU read lock.  If the tool can't handle RCU protection,
> it's not going to be much use.
> 

Maybe the commit log is a bit confusing if you did not look at the example
closely enough. It is actually one read and one write that result in the data
race, and one spin_lock() plus one rcu_read_lock() can't prevent that from
happening. We also have,

[19522.548668][T39646] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked 
[19522.583776][T39646]  
[19522.593816][T39646] write to 0xffff9ffb56c5c238 of 8 bytes by interrupt on cpu 16: 
[19522.628560][T39646]  xas_clear_mark+0x8e/0x1a0 
[19522.648993][T39646]  __xa_clear_mark+0xe7/0x110 
[19522.669367][T39646]  test_clear_page_writeback+0x3e9/0x712
[19522.694638][T39646]  end_page_writeback+0xaa/0x2b0
[19522.716850][T39646]  iomap_finish_ioend+0x213/0x3b0 
[19522.740070][T39646]  iomap_writepage_end_bio+0x41/0x50 
[19522.763835][T39646]  bio_endio+0x297/0x560 
[19522.782608][T39646]  dec_pending+0x218/0x430 [dm_mod] 
[19522.805389][T39646]  clone_endio+0xe4/0x2c0 [dm_mod] 
[19522.828014][T39646]  bio_endio+0x297/0x560 
[19522.846681][T39646]  blk_update_request+0x201/0x920 
[19522.868929][T39646]  scsi_end_request+0x6b/0x4b0 
[19522.889924][T39646]  scsi_io_completion+0xb7/0x7e0 
[19522.911744][T39646]  scsi_finish_command+0x1ed/0x2a0 
[19522.934411][T39646]  scsi_softirq_done+0x1c9/0x1d0 
[19522.956357][T39646]  blk_done_softirq+0x181/0x1d0 
[19522.977796][T39646]  __do_softirq+0xd9/0x57c 
[19522.997300][T39646]  irq_exit+0xa2/0xc0 
[19523.015469][T39646]  do_IRQ+0x87/0x180 
[19523.032848][T39646]  ret_from_intr+0x0/0x42 
[19523.054251][T39646]  delay_tsc+0x46/0x80 
[19523.074662][T39646]  __const_udelay+0x3c/0x40 
[19523.095161][T39646]  __udelay+0x10/0x20 
[19523.113321][T39646]  kcsan_setup_watchpoint+0x1ec/0x3a0 
[19523.137486][T39646]  __tsan_read8+0xf1/0x110 
[19523.156973][T39646]  xas_find_marked+0xe9/0x750 
[19523.177267][T39646]  find_get_pages_range_tag+0x1bf/0xa90 
[19523.201945][T39646]  pagevec_lookup_range_tag+0x46/0x70 
[19523.226242][T39646]  __filemap_fdatawait_range+0xbb/0x270 
[19523.250977][T39646]  file_write_and_wait_range+0xe0/0x100 
[19523.276077][T39646]  xfs_file_fsync+0xeb/0x450 [xfs] 
[19523.298789][T39646]  vfs_fsync_range+0x71/0x110 
[19523.320550][T39646]  xfs_file_buffered_aio_write+0x6cf/0x790 [xfs] 
[19523.350421][T39646]  xfs_file_write_iter+0x232/0x2d0 [xfs] 
[19523.375536][T39646]  do_iter_readv_writev+0x321/0x400 
[19523.398811][T39646]  do_iter_write+0xdf/0x2b0 
[19523.418758][T39646]  vfs_writev+0xe6/0x170 
[19523.437516][T39646]  do_writev+0xa8/0x140 
[19523.455844][T39646]  __x64_sys_writev+0x4e/0x60 
[19523.476515][T39646]  do_syscall_64+0x91/0xb05 
[19523.496441][T39646]  entry_SYSCALL_64_after_hwframe+0x49/0xbe 
[19523.522946][T39646]  
[19523.533125][T39646] read to 0xffff9ffb56c5c238 of 8 bytes by task 39646 on cpu 16: 
[19523.570758][T39646]  xas_find_marked+0xe9/0x750 
[19523.594276][T39646]  find_get_pages_range_tag+0x1bf/0xa90 
[19523.618877][T39646]  pagevec_lookup_range_tag+0x46/0x70 
[19523.642674][T39646]  __filemap_fdatawait_range+0xbb/0x270 
[19523.667295][T39646]  file_write_and_wait_range+0xe0/0x100 
[19523.692394][T39646]  xfs_file_fsync+0xeb/0x450 [xfs] 
[19523.715149][T39646]  vfs_fsync_range+0x71/0x110 
[19523.736239][T39646]  xfs_file_buffered_aio_write+0x6cf/0x790 [xfs] 
[19523.765345][T39646]  xfs_file_write_iter+0x232/0x2d0 [xfs] 
[19523.791398][T39646]  do_iter_readv_writev+0x321/0x400 
[19523.814709][T39646]  do_iter_write+0xdf/0x2b0 
[19523.834576][T39646]  vfs_writev+0xe6/0x170 
[19523.853486][T39646]  do_writev+0xa8/0x140 
[19523.871952][T39646]  __x64_sys_writev+0x4e/0x60 
[19523.893148][T39646]  do_syscall_64+0x91/0xb05 
[19523.914075][T39646]  entry_SYSCALL_64_after_hwframe+0x49/0xbe 


[19648.209937][T39683] BUG: KCSAN: data-race in find_get_pages_range_tag / xas_set_mark 
[19648.248321][T39683]  
[19648.258683][T39683] write to 0xffffa001c3340238 of 8 bytes by task 39682 on cpu 25: 
[19648.295245][T39683]  xas_set_mark+0x8e/0x190 
[19648.315514][T39683]  __test_set_page_writeback+0x5de/0x8c0 
[19648.341697][T39683]  iomap_writepage_map+0x8c6/0xf90 
[19648.364916][T39683]  iomap_do_writepage+0x12b/0x450 
[19648.388367][T39683]  write_cache_pages+0x523/0xb20 
[19648.410232][T39683]  iomap_writepages+0x47/0x80 
[19648.431404][T39683]  xfs_vm_writepages+0xc7/0x100 [xfs] 
[19648.455333][T39683]  do_writepages+0x5e/0x130 
[19648.476105][T39683]  __filemap_fdatawrite_range+0x19e/0x1f0 
[19648.502048][T39683]  file_write_and_wait_range+0xc0/0x100 
[19648.527175][T39683]  xfs_file_fsync+0xeb/0x450 [xfs] 
[19648.549886][T39683]  vfs_fsync_range+0x71/0x110 
[19648.570508][T39683]  __x64_sys_msync+0x210/0x2a0 
[19648.591742][T39683]  do_syscall_64+0x91/0xb05 
[19648.612093][T39683]  entry_SYSCALL_64_after_hwframe+0x49/0xbe 
[19648.638410][T39683]  
[19648.648486][T39683] read to 0xffffa001c3340238 of 8 bytes by task 39683 on cpu 26: 
[19648.684048][T39683]  find_get_pages_range_tag+0x549/0xa90 
  (via xas_for_each_marked() -> xas_find_marked())
[19648.710117][T39683]  pagevec_lookup_range_tag+0x46/0x70 
[19648.737531][T39683]  __filemap_fdatawait_range+0xbb/0x270 
[19648.762541][T39683]  file_write_and_wait_range+0xe0/0x100 
[19648.787436][T39683]  xfs_file_fsync+0xeb/0x450 [xfs] 
[19648.810136][T39683]  vfs_fsync_range+0x71/0x110 
[19648.830827][T39683]  __x64_sys_msync+0x210/0x2a0 
[19648.852022][T39683]  do_syscall_64+0x91/0xb05 
[19648.871962][T39683]  entry_SYSCALL_64_after_hwframe+0x49/0xbe 
[19648.899496][T39683]  
[19648.909496][T39683] 1 lock held by doio/39683: 
[19648.930880][T39683]  #0: ffffffffaf286cc0 (rcu_read_lock){....}, at: find_get_pages_range_tag+0x10f/0xa90 
[19648.976219][T39683] irq event stamp: 2463763 
[19648.996565][T39683] hardirqs last  enabled at (2463763): [<ffffffffade03ec2>] trace_hardirqs_on_thunk+0x1a/0x1c 
[19649.043090][T39683] hardirqs last disabled at (2463761): [<ffffffffaec002e7>] __do_softirq+0x2e7/0x57c 
[19649.087414][T39683] softirqs last  enabled at (2463762): [<ffffffffaec0034c>] __do_softirq+0x34c/0x57c 
[19649.129838][T39683] softirqs last disabled at (2463635): [<ffffffffadec69f2>] irq_exit+0xa2/0xc0




* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04  3:55   ` Qian Cai
@ 2020-03-04  4:00     ` Matthew Wilcox
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2020-03-04  4:00 UTC (permalink / raw)
  To: Qian Cai; +Cc: paulmck, elver, linux-kernel

On Tue, Mar 03, 2020 at 10:55:02PM -0500, Qian Cai wrote:
> 
> 
> > On Mar 3, 2020, at 10:33 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> >> Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> >> can run concurrently, resulting in data races, but they operate only
> >> on a single bit, so those races are pretty much harmless. For example,
> > 
> > Those aren't data races.  The writes are protected by a spinlock and the
> > reads by the RCU read lock.  If the tool can't handle RCU protection,
> > it's not going to be much use.
> > 
> 
> Maybe the commit log is a bit confusing if you did not look at the example
> closely enough. It is actually one read and one write that result in the data
> race, and one spin_lock() plus one rcu_read_lock() can't prevent that from
> happening. We also have,

Yes.  All these examples have the reader protected by the RCU read lock
and the writer protected by the spinlock.  Does the tool need some kind
of extra annotation here?

> [19522.548668][T39646] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked 
> [19522.583776][T39646]  
> [19522.593816][T39646] write to 0xffff9ffb56c5c238 of 8 bytes by interrupt on cpu 16: 
> [19522.628560][T39646]  xas_clear_mark+0x8e/0x1a0 
> [19522.648993][T39646]  __xa_clear_mark+0xe7/0x110 
> [19522.669367][T39646]  test_clear_page_writeback+0x3e9/0x712
> [19522.694638][T39646]  end_page_writeback+0xaa/0x2b0

Spinlock taken here ^^^

> [19523.533125][T39646] read to 0xffff9ffb56c5c238 of 8 bytes by task 39646 on cpu 16: 
> [19523.570758][T39646]  xas_find_marked+0xe9/0x750 
> [19523.594276][T39646]  find_get_pages_range_tag+0x1bf/0xa90 

RCU read lock taken here ^^^



* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04  3:33 ` Matthew Wilcox
  2020-03-04  3:55   ` Qian Cai
@ 2020-03-04  4:05   ` Paul E. McKenney
  2020-03-04  4:33     ` Matthew Wilcox
  1 sibling, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2020-03-04  4:05 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Qian Cai, elver, linux-kernel

On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > can run concurrently, resulting in data races, but they operate only
> > on a single bit, so those races are pretty much harmless. For example,
> 
> Those aren't data races.  The writes are protected by a spinlock and the
> reads by the RCU read lock.  If the tool can't handle RCU protection,
> it's not going to be much use.

Would KCSAN's ASSERT_EXCLUSIVE_BITS() help here?

If not, you lost me on this one.  RCU readers don't exclude lock-based
writers.

RCU readers -do- exclude pre-insertion initialization on the one hand,
and those post-removal accesses that follow a grace period, but only
if that grace period starts after the removal.  In addition, the
accesses due to rcu_dereference(), rcu_assign_pointer(), and similar
are guaranteed to work even if they are concurrent.

Or am I missing something subtle here?

That said, you are permitted to define "data race" for your subsystem
by choosing KCSAN settings, up to and including keeping KCSAN out
entirely.

							Thanx, Paul


* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04  4:05   ` Paul E. McKenney
@ 2020-03-04  4:33     ` Matthew Wilcox
  2020-03-04 14:10       ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2020-03-04  4:33 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Qian Cai, elver, linux-kernel

On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > can run concurrently, resulting in data races, but they operate only
> > > on a single bit, so those races are pretty much harmless. For example,
> > 
> > Those aren't data races.  The writes are protected by a spinlock and the
> > reads by the RCU read lock.  If the tool can't handle RCU protection,
> > it's not going to be much use.
> 
> Would KCSAN's ASSERT_EXCLUSIVE_BITS() help here?

I'm quite lost in the sea of new macros that have been added to help
with KCSAN.  It doesn't help that they're in -next somewhere that I
can't find, and not in mainline yet.  Is there documentation somewhere?

> RCU readers -do- exclude pre-insertion initialization on the one hand,
> and those post-removal accesses that follow a grace period, but only
> if that grace period starts after the removal.  In addition, the
> accesses due to rcu_dereference(), rcu_assign_pointer(), and similar
> are guaranteed to work even if they are concurrent.
> 
> Or am I missing something subtle here?

I probably am.  An XArray is composed of a tree of xa_nodes:

struct xa_node {
        unsigned char   shift;          /* Bits remaining in each slot */
        unsigned char   offset;         /* Slot offset in parent */
        unsigned char   count;          /* Total entry count */
        unsigned char   nr_values;      /* Value entry count */
        struct xa_node __rcu *parent;   /* NULL at top of tree */
        struct xarray   *array;         /* The array we belong to */
        union {
                struct list_head private_list;  /* For tree user */
                struct rcu_head rcu_head;       /* Used when freeing node */
        };
        void __rcu      *slots[XA_CHUNK_SIZE];
        union {
                unsigned long   tags[XA_MAX_MARKS][XA_MARK_LONGS];
                unsigned long   marks[XA_MAX_MARKS][XA_MARK_LONGS];
        };
};

'shift' is initialised before the node is inserted into the tree.
Ditto 'offset'.  'count' and 'nr_values' should only be touched with the
xa_lock held.  'parent' might be modified with the lock held and an RCU
reader expecting to see either the previous or new value.  'array' should
not change once the node is inserted.  private_list is, I believe, only
modified with the lock held.  'slots' may be modified with the xa_lock
held, and simultaneously read by an RCU reader.  Ditto 'tags'/'marks'.

The RCU readers are prepared for what they see to be inconsistent --
a fact of life when dealing with RCU!  So in a sense, yes, there is a
race there.  But it's a known, accepted race, and that acceptance is
indicated by the fact that the RCU lock is held.  Does there need to be
more annotation here?  Or is there an un-noticed bug that the tool is
legitimately pointing out?


* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04  4:33     ` Matthew Wilcox
@ 2020-03-04 14:10       ` Paul E. McKenney
  2020-03-04 16:40         ` Marco Elver
  2020-03-05 15:18         ` Matthew Wilcox
  0 siblings, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2020-03-04 14:10 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Qian Cai, elver, linux-kernel

On Tue, Mar 03, 2020 at 08:33:56PM -0800, Matthew Wilcox wrote:
> On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> > On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > > can run concurrently, resulting in data races, but they operate only
> > > > on a single bit, so those races are pretty much harmless. For example,
> > > 
> > > Those aren't data races.  The writes are protected by a spinlock and the
> > > reads by the RCU read lock.  If the tool can't handle RCU protection,
> > > it's not going to be much use.
> > 
> > Would KCSAN's ASSERT_EXCLUSIVE_BITS() help here?
> 
> I'm quite lost in the sea of new macros that have been added to help
> with KCSAN.  It doesn't help that they're in -next somewhere that I
> can't find, and not in mainline yet.  Is there documentation somewhere?

Yes, there is documentation.  In -next somewhere.  :-/

Early days, apologies for the construction!

> > RCU readers -do- exclude pre-insertion initialization on the one hand,
> > and those post-removal accesses that follow a grace period, but only
> > if that grace period starts after the removal.  In addition, the
> > accesses due to rcu_dereference(), rcu_assign_pointer(), and similar
> > are guaranteed to work even if they are concurrent.
> > 
> > Or am I missing something subtle here?
> 
> I probably am.  An XArray is composed of a tree of xa_nodes:
> 
> struct xa_node {
>         unsigned char   shift;          /* Bits remaining in each slot */
>         unsigned char   offset;         /* Slot offset in parent */
>         unsigned char   count;          /* Total entry count */
>         unsigned char   nr_values;      /* Value entry count */
>         struct xa_node __rcu *parent;   /* NULL at top of tree */
>         struct xarray   *array;         /* The array we belong to */
>         union {
>                 struct list_head private_list;  /* For tree user */
>                 struct rcu_head rcu_head;       /* Used when freeing node */
>         };
>         void __rcu      *slots[XA_CHUNK_SIZE];
>         union {
>                 unsigned long   tags[XA_MAX_MARKS][XA_MARK_LONGS];
>                 unsigned long   marks[XA_MAX_MARKS][XA_MARK_LONGS];
>         };
> };
> 
> 'shift' is initialised before the node is inserted into the tree.
> Ditto 'offset'.

Very good, then both ->shift and ->offset can be accessed using normal
C-language loads and stores even by most strict definition of data race.

>                  'count' and 'nr_values' should only be touched with the
> xa_lock held.  'parent' might be modified with the lock held and an RCU
> reader expecting to see either the previous or new value.  'array' should
> not change once the node is inserted.  private_list is, I believe, only
> modified with the lock held.  'slots' may be modified with the xa_lock
> held, and simultaneously read by an RCU reader.  Ditto 'tags'/'marks'.

If ->count and ->nr_values are never accessed by readers, they can also
use plain C-language loads and stores.

KCSAN expects that accesses to the ->parent field should be marked.
But if ->parent is always accessed via things like rcu_dereference()
and rcu_assign_pointer() (guessing based on the __rcu), then KCSAN
won't complain.

The ->array can be accessed using plain C-language loads and stores.

If ->private_list is never accessed without holding the lock, then
plain C-language loads and stores work for it without KCSAN complaints.

The situation with ->slots is the same as that for ->parent.

KCSAN expects accesses to the ->tags[] and ->marks[] arrays to be marked.
However, the default configuration of KCSAN asks only that the reads
be marked.  (Within RCU, I instead configure KCSAN so that it asks that
both be marked, but it is of course your choice within your code.)

> The RCU readers are prepared for what they see to be inconsistent --
> a fact of life when dealing with RCU!  So in a sense, yes, there is a
> race there.  But it's a known, accepted race, and that acceptance is
> indicated by the fact that the RCU lock is held.  Does there need to be
> more annotation here?  Or is there an un-noticed bug that the tool is
> legitimately pointing out?

The answer to both questions is "maybe", depending on the code using
the values read.  Yes, it would be nice if KCSAN could figure out the
difference, but there are limits to what a tool can do.  And things
are sometimes non-obvious, for example:

	switch (foo) {
	case 1:  do_something_1();  break;
	case 3:  do_something_3();  break;
	case 7:  do_something_7();  break;
	case 19: do_something_19(); break;
	case 23: do_something_23(); break;
	}

Only one access to "foo", so all is well, right?

Sadly, wrong.  Compilers can create jump tables, and will often emit two
loads from "foo", one to check against the table size, and the other to
index the table.

Other potential traps may be found in https://lwn.net/Articles/793253/
("Who's afraid of a big bad optimizing compiler?").

One approach is to use READ_ONCE() on the reads in the RCU read-side
critical section that are subject to concurrent update.  Another is to
use the data_race() macro (as in data_race(foo) in the switch statement
above) to tell KCSAN that you have analyzed the compiler's response.
These first two can be mixed, if you like.  And yet another is the patch
proposed by Qian if you want KCSAN to get out of your code altogether.

Within RCU, I mark the accesses rather aggressively.  RCU is quite
concurrent, and the implied documentation is very worthwhile.

Your mileage may vary, of course.  For one thing, your actuarial
statistics are quite likely significantly more favorable than mine.
Not that mine are at all bad, particularly by the standards of a century
or two ago.  ;-)

							Thanx, Paul


* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04 14:10       ` Paul E. McKenney
@ 2020-03-04 16:40         ` Marco Elver
  2020-03-04 17:10           ` Qian Cai
  2020-03-05 15:18         ` Matthew Wilcox
  1 sibling, 1 reply; 16+ messages in thread
From: Marco Elver @ 2020-03-04 16:40 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Matthew Wilcox, Qian Cai, LKML

On Wed, 4 Mar 2020 at 15:10, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Mar 03, 2020 at 08:33:56PM -0800, Matthew Wilcox wrote:
> > On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> > > On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > > > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > > > can run concurrently, resulting in data races, but they operate only
> > > > > on a single bit, so those races are pretty much harmless. For example,

I currently do not see those as justification to blacklist the whole
file. Wouldn't __no_kcsan be better? That is, in case there is no
other solution that emerges from the remainder of the discussion here.

> > > > Those aren't data races.  The writes are protected by a spinlock and the
> > > > reads by the RCU read lock.  If the tool can't handle RCU protection,
> > > > it's not going to be much use.
> > >
> > > Would KCSAN's ASSERT_EXCLUSIVE_BITS() help here?
> >
> > I'm quite lost in the sea of new macros that have been added to help
> > with KCSAN.  It doesn't help that they're in -next somewhere that I
> > can't find, and not in mainline yet.  Is there documentation somewhere?
>
> Yes, there is documentation.  In -next somewhere.  :-/
>
> Early days, apologies for the construction!

I just sent a patch updating the documentation, to make sure we have
something up-to-date we can refer to in the meantime.

A preview of generated documentation for ASSERT_EXCLUSIVE_BITS (and
others) is here:
https://htmlpreview.github.io/?https://github.com/google/ktsan/blob/kcsan-kerneldoc/output/dev-tools/kcsan.html#race-detection-beyond-data-races

Hope that helps somewhat.


> > > RCU readers -do- exclude pre-insertion initialization on the one hand,
> > > and those post-removal accesses that follow a grace period, but only
> > > if that grace period starts after the removal.  In addition, the
> > > accesses due to rcu_dereference(), rcu_assign_pointer(), and similar
> > > are guaranteed to work even if they are concurrent.
> > >
> > > Or am I missing something subtle here?
> >
> > I probably am.  An XArray is composed of a tree of xa_nodes:
> >
> > struct xa_node {
> >         unsigned char   shift;          /* Bits remaining in each slot */
> >         unsigned char   offset;         /* Slot offset in parent */
> >         unsigned char   count;          /* Total entry count */
> >         unsigned char   nr_values;      /* Value entry count */
> >         struct xa_node __rcu *parent;   /* NULL at top of tree */
> >         struct xarray   *array;         /* The array we belong to */
> >         union {
> >                 struct list_head private_list;  /* For tree user */
> >                 struct rcu_head rcu_head;       /* Used when freeing node */
> >         };
> >         void __rcu      *slots[XA_CHUNK_SIZE];
> >         union {
> >                 unsigned long   tags[XA_MAX_MARKS][XA_MARK_LONGS];
> >                 unsigned long   marks[XA_MAX_MARKS][XA_MARK_LONGS];
> >         };
> > };
> >
> > 'shift' is initialised before the node is inserted into the tree.
> > Ditto 'offset'.
>
> Very good, then both ->shift and ->offset can be accessed using normal
> C-language loads and stores even by most strict definition of data race.
>
> >                  'count' and 'nr_values' should only be touched with the
> > xa_lock held.  'parent' might be modified with the lock held and an RCU
> > reader expecting to see either the previous or new value.  'array' should
> > not change once the node is inserted.  private_list is, I believe, only
> > modified with the lock held.  'slots' may be modified with the xa_lock
> > held, and simultaneously read by an RCU reader.  Ditto 'tags'/'marks'.
>
> If ->count and ->nr_values are never accessed by readers, they can also
> use plain C-language loads and stores.
>
> KCSAN expects that accesses to the ->parent field should be marked.
> But if ->parent is always accessed via things like rcu_dereference()
> and rcu_assign_pointer() (guessing based on the __rcu), then KCSAN
> won't complain.
>
> The ->array can be accessed using plain C-language loads and stores.
>
> If ->private_list is never accessed without holding the lock, then
> plain C-language loads and stores work for it without KCSAN complaints.
>
> The situation with ->slots is the same as that for ->parent.
>
> KCSAN expects accesses to the ->tags[] and ->marks[] arrays to be marked.
> However, the default configuration of KCSAN asks only that the reads
> be marked.  (Within RCU, I instead configure KCSAN so that it asks that
> both be marked, but it is of course your choice within your code.)
>
> > The RCU readers are prepared for what they see to be inconsistent --
> > a fact of life when dealing with RCU!  So in a sense, yes, there is a
> > race there.  But it's a known, accepted race, and that acceptance is
> > indicated by the fact that the RCU lock is held.  Does there need to be
> > more annotation here?  Or is there an un-noticed bug that the tool is
> > legitimately pointing out?
>
> The answer to both questions is "maybe", depending on the code using
> the values read.  Yes, it would be nice if KCSAN could figure out the
> difference, but there are limits to what a tool can do.  And things
> are sometimes no-obvious, for example:
>
>         switch (foo) {
>         case 1:  do_something_1();  break;
>         case 3:  do_something_3();  break;
>         case 7:  do_something_7();  break;
>         case 19: do_something_19(); break;
>         case 23: do_something_23(); break;
>         }
>
> Only one access to "foo", so all is well, right?
>
> Sadly, wrong.  Compilers can create jump tables, and will often emit two
> loads from "foo", one to check against the table size, and the other to
> index the table.
>
> Other potential traps may be found in https://lwn.net/Articles/793253/
> ("Who's afraid of a big bad optimizing compiler?").
>
> One approach is to use READ_ONCE() on the reads in the RCU read-side
> critical section that are subject to concurrent update.  Another is to
> use the data_race() macro (as in data_race(foo) in the switch statement
> above) to tell KCSAN that you have analyzed the compiler's response.
> These first two can be mixed, if you like.  And yet another is the patch
> proposed by Qian if you want KCSAN to get out of your code altogether.
>
> Within RCU, I mark the accesses rather aggressively.  RCU is quite
> concurrent, and the implied documentation is very worthwhile.
>
> Your mileage may vary, of course.  For one thing, your actuarial
> statistics are quite likley significantly more favorable than are mine.
> Not that mine are at all bad, particularly by the standards of a century
> or two ago.  ;-)
>
>                                                         Thanx, Paul


* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04 16:40         ` Marco Elver
@ 2020-03-04 17:10           ` Qian Cai
  2020-03-04 17:14             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Qian Cai @ 2020-03-04 17:10 UTC (permalink / raw)
  To: Marco Elver, Paul E. McKenney; +Cc: Matthew Wilcox, LKML

On Wed, 2020-03-04 at 17:40 +0100, Marco Elver wrote:
> On Wed, 4 Mar 2020 at 15:10, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Tue, Mar 03, 2020 at 08:33:56PM -0800, Matthew Wilcox wrote:
> > > On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> > > > On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > > > > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > > > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > > > > could run concurrently, resulting in data races, but those operate only
> > > > > > on a single bit, so they are pretty much harmless. For example,
> 
> I currently do not see those as justification to blacklist the whole
> file. Wouldn't __no_kcsan be better? That is, in case there is no
> other solution that emerges from the remainder of the discussion here.

I suppose it is up to Matthew. Currently, I can see there are several functions
that may need __no_kcsan,

xa_get_mark(), xas_find_marked(), xas_find_chunk() etc.

My worry was that there could be many of those functions operating on marks
(each of which is a single bit) in the xarray that could end up needing the
same treatment.

So far, my testing is thin on the filesystem side, where the xarray is mostly
used for page caches, so the reports I got from KCSAN runs do not necessarily
tell the whole story. Once I update my KCSAN runs to include things like
xfstests, it could add quite a bit of churn later if we decide to go with the
__no_kcsan route.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04 17:10           ` Qian Cai
@ 2020-03-04 17:14             ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2020-03-04 17:14 UTC (permalink / raw)
  To: Qian Cai; +Cc: Marco Elver, Matthew Wilcox, LKML

On Wed, Mar 04, 2020 at 12:10:58PM -0500, Qian Cai wrote:
> On Wed, 2020-03-04 at 17:40 +0100, Marco Elver wrote:
> > On Wed, 4 Mar 2020 at 15:10, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > 
> > > On Tue, Mar 03, 2020 at 08:33:56PM -0800, Matthew Wilcox wrote:
> > > > On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> > > > > On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > > > > > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > > > > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > > > > > could run concurrently, resulting in data races, but those operate only
> > > > > > > on a single bit, so they are pretty much harmless. For example,
> > 
> > I currently do not see those as justification to blacklist the whole
> > file. Wouldn't __no_kcsan be better? That is, in case there is no
> > other solution that emerges from the remainder of the discussion here.
> 
> I suppose it is up to Matthew. Currently, I can see there are several functions
> that may need __no_kcsan,
> 
> xa_get_mark(), xas_find_marked(), xas_find_chunk() etc.
> 
> My worry was that there could be many of those functions operating on marks
> (each of which is a single bit) in the xarray that could end up needing the
> same treatment.
> 
> So far, my testing is thin on the filesystem side, where the xarray is mostly
> used for page caches, so the reports I got from KCSAN runs do not necessarily
> tell the whole story. Once I update my KCSAN runs to include things like
> xfstests, it could add quite a bit of churn later if we decide to go with the
> __no_kcsan route.

Agreed, the filesystem developers and maintainers will decide how they
would like to approach this.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-04 14:10       ` Paul E. McKenney
  2020-03-04 16:40         ` Marco Elver
@ 2020-03-05 15:18         ` Matthew Wilcox
  2020-03-05 21:39           ` Paul E. McKenney
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2020-03-05 15:18 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Qian Cai, elver, linux-kernel

On Wed, Mar 04, 2020 at 06:10:21AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 03, 2020 at 08:33:56PM -0800, Matthew Wilcox wrote:
> > On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> > > On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > > > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > > > could run concurrently, resulting in data races, but those operate only
> > > > > on a single bit, so they are pretty much harmless. For example,
> > > > 
> > > > Those aren't data races.  The writes are protected by a spinlock and the
> > > > reads by the RCU read lock.  If the tool can't handle RCU protection,
> > > > it's not going to be much use.
> > > 
> > > Would KCSAN's ASSERT_EXCLUSIVE_BITS() help here?
> > 
> > I'm quite lost in the sea of new macros that have been added to help
> > with KCSAN.  It doesn't help that they're in -next somewhere that I
> > can't find, and not in mainline yet.  Is there documentation somewhere?
> 
> Yes, there is documentation.  In -next somewhere.  :-/

Looking at the documentation link that Marco kindly provided, I don't
think ASSERT_EXCLUSIVE_BITS is the correct annotation to make.  The bits
in question are being accessed in parallel, it's just that we're willing
to accept a certain degree of inaccuracy.

> > struct xa_node {
> >         unsigned char   shift;          /* Bits remaining in each slot */
> >         unsigned char   offset;         /* Slot offset in parent */
> >         unsigned char   count;          /* Total entry count */
> >         unsigned char   nr_values;      /* Value entry count */
> >         struct xa_node __rcu *parent;   /* NULL at top of tree */
> >         struct xarray   *array;         /* The array we belong to */
> >         union {
> >                 struct list_head private_list;  /* For tree user */
> >                 struct rcu_head rcu_head;       /* Used when freeing node */
> >         };
> >         void __rcu      *slots[XA_CHUNK_SIZE];
> >         union {
> >                 unsigned long   tags[XA_MAX_MARKS][XA_MARK_LONGS];
> >                 unsigned long   marks[XA_MAX_MARKS][XA_MARK_LONGS];
> >         };
> > };
> > 
> > 'slots' may be modified with the xa_lock
> > held, and simultaneously read by an RCU reader.  Ditto 'tags'/'marks'.
> 
> KCSAN expects that accesses to the ->parent field should be marked.
> But if ->parent is always accessed via things like rcu_dereference()
> and rcu_assign_pointer() (guessing based on the __rcu), then KCSAN
> won't complain.
[...]
> The situation with ->slots is the same as that for ->parent.
> 
> KCSAN expects accesses to the ->tags[] and ->marks[] arrays to be marked.
> However, the default configuration of KCSAN asks only that the reads
> be marked.  (Within RCU, I instead configure KCSAN so that it asks that
> both be marked, but it is of course your choice within your code.)
> 
> > The RCU readers are prepared for what they see to be inconsistent --
> > a fact of life when dealing with RCU!  So in a sense, yes, there is a
> > race there.  But it's a known, accepted race, and that acceptance is
> > indicated by the fact that the RCU lock is held.  Does there need to be
> > more annotation here?  Or is there an un-noticed bug that the tool is
> > legitimately pointing out?
> 
> The answer to both questions is "maybe", depending on the code using
> the values read.  Yes, it would be nice if KCSAN could figure out the
> difference, but there are limits to what a tool can do.  And things
> are sometimes non-obvious, for example:

I require the following properties from this array of bits:

 - If a bit was set before and after the modification, it must be seen to
   be set.
 - If a bit was clear before and after the modification, it must be seen to
   be clear.
 - If a bit is modified, it may be seen as set or clear.

I have found three locations where we use the ->marks array:

1.
                        unsigned long data = *addr & (~0UL << offset);
                        if (data)
                                return __ffs(data);

2.
        return find_next_bit(addr, XA_CHUNK_SIZE, offset);
3.
        return test_bit(offset, node_marks(node, mark));

The modifications -- all done with the spinlock held -- use the non-atomic
bitops:
        return __test_and_set_bit(offset, node_marks(node, mark));
        return __test_and_clear_bit(offset, node_marks(node, mark));
        bitmap_fill(node_marks(node, mark), XA_CHUNK_SIZE);
(that last one doesn't really count -- it's done prior to placing the node
in the tree)

The first read seems straightforward; I can place a READ_ONCE around
*addr.  The second & third reads are rather less straightforward.
find_next_bit() and test_bit() are common code and use plain loads today.

> Your mileage may vary, of course.  For one thing, your actuarial
> statistics are quite likely significantly more favorable than are mine.
> Not that mine are at all bad, particularly by the standards of a century
> or two ago.  ;-)

I don't know ... my preferred form of exercise takes me into harm's way
of cars, so while I've significantly lowered my risk of
dying of cardiovascular disease, I'm much more likely to be killed by
an inattentive human being!  Bring on our robot cars ...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-05 15:18         ` Matthew Wilcox
@ 2020-03-05 21:39           ` Paul E. McKenney
  2020-03-06 13:38             ` Marco Elver
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2020-03-05 21:39 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Qian Cai, elver, linux-kernel

On Thu, Mar 05, 2020 at 07:18:31AM -0800, Matthew Wilcox wrote:
> On Wed, Mar 04, 2020 at 06:10:21AM -0800, Paul E. McKenney wrote:
> > On Tue, Mar 03, 2020 at 08:33:56PM -0800, Matthew Wilcox wrote:
> > > On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> > > > On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > > > > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > > > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > > > > could run concurrently, resulting in data races, but those operate only
> > > > > > on a single bit, so they are pretty much harmless. For example,
> > > > > 
> > > > > Those aren't data races.  The writes are protected by a spinlock and the
> > > > > reads by the RCU read lock.  If the tool can't handle RCU protection,
> > > > > it's not going to be much use.
> > > > 
> > > > Would KCSAN's ASSERT_EXCLUSIVE_BITS() help here?
> > > 
> > > I'm quite lost in the sea of new macros that have been added to help
> > > with KCSAN.  It doesn't help that they're in -next somewhere that I
> > > can't find, and not in mainline yet.  Is there documentation somewhere?
> > 
> > Yes, there is documentation.  In -next somewhere.  :-/
> 
> Looking at the documentation link that Marco kindly provided, I don't
> think ASSERT_EXCLUSIVE_BITS is the correct annotation to make.  The bits
> in question are being accessed in parallel, it's just that we're willing
> to accept a certain degree of inaccuracy.
> 
> > > struct xa_node {
> > >         unsigned char   shift;          /* Bits remaining in each slot */
> > >         unsigned char   offset;         /* Slot offset in parent */
> > >         unsigned char   count;          /* Total entry count */
> > >         unsigned char   nr_values;      /* Value entry count */
> > >         struct xa_node __rcu *parent;   /* NULL at top of tree */
> > >         struct xarray   *array;         /* The array we belong to */
> > >         union {
> > >                 struct list_head private_list;  /* For tree user */
> > >                 struct rcu_head rcu_head;       /* Used when freeing node */
> > >         };
> > >         void __rcu      *slots[XA_CHUNK_SIZE];
> > >         union {
> > >                 unsigned long   tags[XA_MAX_MARKS][XA_MARK_LONGS];
> > >                 unsigned long   marks[XA_MAX_MARKS][XA_MARK_LONGS];
> > >         };
> > > };
> > > 
> > > 'slots' may be modified with the xa_lock
> > > held, and simultaneously read by an RCU reader.  Ditto 'tags'/'marks'.
> > 
> > KCSAN expects that accesses to the ->parent field should be marked.
> > But if ->parent is always accessed via things like rcu_dereference()
> > and rcu_assign_pointer() (guessing based on the __rcu), then KCSAN
> > won't complain.
> [...]
> > The situation with ->slots is the same as that for ->parent.
> > 
> > KCSAN expects accesses to the ->tags[] and ->marks[] arrays to be marked.
> > However, the default configuration of KCSAN asks only that the reads
> > be marked.  (Within RCU, I instead configure KCSAN so that it asks that
> > both be marked, but it is of course your choice within your code.)
> > 
> > > The RCU readers are prepared for what they see to be inconsistent --
> > > a fact of life when dealing with RCU!  So in a sense, yes, there is a
> > > race there.  But it's a known, accepted race, and that acceptance is
> > > indicated by the fact that the RCU lock is held.  Does there need to be
> > > more annotation here?  Or is there an un-noticed bug that the tool is
> > > legitimately pointing out?
> > 
> > The answer to both questions is "maybe", depending on the code using
> > the values read.  Yes, it would be nice if KCSAN could figure out the
> > difference, but there are limits to what a tool can do.  And things
> > are sometimes non-obvious, for example:
> 
> I require the following properties from this array of bits:
> 
>  - If a bit was set before and after the modification, it must be seen to
>    be set.
>  - If a bit was clear before and after the modification, it must be seen to
>    be clear.
>  - If a bit is modified, it may be seen as set or clear.
> 
> I have found three locations where we use the ->marks array:
> 
> 1.
>                         unsigned long data = *addr & (~0UL << offset);
>                         if (data)
>                                 return __ffs(data);
> 
> 2.
>         return find_next_bit(addr, XA_CHUNK_SIZE, offset);
> 3.
>         return test_bit(offset, node_marks(node, mark));
> 
> The modifications -- all done with the spinlock held -- use the non-atomic
> bitops:
>         return __test_and_set_bit(offset, node_marks(node, mark));
>         return __test_and_clear_bit(offset, node_marks(node, mark));
>         bitmap_fill(node_marks(node, mark), XA_CHUNK_SIZE);
> (that last one doesn't really count -- it's done prior to placing the node
> in the tree)
> 
> The first read seems straightforward; I can place a READ_ONCE around
> *addr.  The second & third reads are rather less straightforward.
> find_next_bit() and test_bit() are common code and use plain loads today.

Yes, those last two are a bit annoying, aren't they?  I guess the first
thing would be placing READ_ONCE() inside them, and if that results in
regressions, have an alternative API for concurrent access?

> > Your mileage may vary, of course.  For one thing, your actuarial
> > statistics are quite likely significantly more favorable than are mine.
> > Not that mine are at all bad, particularly by the standards of a century
> > or two ago.  ;-)
> 
> I don't know ... my preferred form of exercise takes me into harm's way
> of cars, so while I've significantly lowered my risk of
> dying of cardiovascular disease, I'm much more likely to be killed by
> an inattentive human being!  Bring on our robot cars ...

I did my 40 years using that mode of transport.  But my reflexes are
no longer up to the task.  So I have no advice for you on that one!
Other than to question your trust in robot cars whose code was produced
by human developers!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-05 21:39           ` Paul E. McKenney
@ 2020-03-06 13:38             ` Marco Elver
  2020-03-06 16:53               ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Marco Elver @ 2020-03-06 13:38 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Matthew Wilcox, Qian Cai, LKML

On Thu, 5 Mar 2020 at 22:39, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Mar 05, 2020 at 07:18:31AM -0800, Matthew Wilcox wrote:
> > On Wed, Mar 04, 2020 at 06:10:21AM -0800, Paul E. McKenney wrote:
> > > On Tue, Mar 03, 2020 at 08:33:56PM -0800, Matthew Wilcox wrote:
> > > > On Tue, Mar 03, 2020 at 08:05:15PM -0800, Paul E. McKenney wrote:
> > > > > On Tue, Mar 03, 2020 at 07:33:29PM -0800, Matthew Wilcox wrote:
> > > > > > On Tue, Mar 03, 2020 at 10:15:51PM -0500, Qian Cai wrote:
> > > > > > > Functions like xas_find_marked(), xas_set_mark(), and xas_clear_mark()
> > > > > > > could run concurrently, resulting in data races, but those operate only
> > > > > > > on a single bit, so they are pretty much harmless. For example,
> > > > > >
> > > > > > Those aren't data races.  The writes are protected by a spinlock and the
> > > > > > reads by the RCU read lock.  If the tool can't handle RCU protection,
> > > > > > it's not going to be much use.
> > > > >
> > > > > Would KCSAN's ASSERT_EXCLUSIVE_BITS() help here?
> > > >
> > > > I'm quite lost in the sea of new macros that have been added to help
> > > > with KCSAN.  It doesn't help that they're in -next somewhere that I
> > > > can't find, and not in mainline yet.  Is there documentation somewhere?
> > >
> > > Yes, there is documentation.  In -next somewhere.  :-/
> >
> > Looking at the documentation link that Marco kindly provided, I don't
> > think ASSERT_EXCLUSIVE_BITS is the correct annotation to make.  The bits
> > in question are being accessed in parallel, it's just that we're willing
> > to accept a certain degree of inaccuracy.
> >
> > > > struct xa_node {
> > > >         unsigned char   shift;          /* Bits remaining in each slot */
> > > >         unsigned char   offset;         /* Slot offset in parent */
> > > >         unsigned char   count;          /* Total entry count */
> > > >         unsigned char   nr_values;      /* Value entry count */
> > > >         struct xa_node __rcu *parent;   /* NULL at top of tree */
> > > >         struct xarray   *array;         /* The array we belong to */
> > > >         union {
> > > >                 struct list_head private_list;  /* For tree user */
> > > >                 struct rcu_head rcu_head;       /* Used when freeing node */
> > > >         };
> > > >         void __rcu      *slots[XA_CHUNK_SIZE];
> > > >         union {
> > > >                 unsigned long   tags[XA_MAX_MARKS][XA_MARK_LONGS];
> > > >                 unsigned long   marks[XA_MAX_MARKS][XA_MARK_LONGS];
> > > >         };
> > > > };
> > > >
> > > > 'slots' may be modified with the xa_lock
> > > > held, and simultaneously read by an RCU reader.  Ditto 'tags'/'marks'.
> > >
> > > KCSAN expects that accesses to the ->parent field should be marked.
> > > But if ->parent is always accessed via things like rcu_dereference()
> > > and rcu_assign_pointer() (guessing based on the __rcu), then KCSAN
> > > won't complain.
> > [...]
> > > The situation with ->slots is the same as that for ->parent.
> > >
> > > KCSAN expects accesses to the ->tags[] and ->marks[] arrays to be marked.
> > > However, the default configuration of KCSAN asks only that the reads
> > > be marked.  (Within RCU, I instead configure KCSAN so that it asks that
> > > both be marked, but it is of course your choice within your code.)
> > >
> > > > The RCU readers are prepared for what they see to be inconsistent --
> > > > a fact of life when dealing with RCU!  So in a sense, yes, there is a
> > > > race there.  But it's a known, accepted race, and that acceptance is
> > > > indicated by the fact that the RCU lock is held.  Does there need to be
> > > > more annotation here?  Or is there an un-noticed bug that the tool is
> > > > legitimately pointing out?
> > >
> > > The answer to both questions is "maybe", depending on the code using
> > > the values read.  Yes, it would be nice if KCSAN could figure out the
> > > difference, but there are limits to what a tool can do.  And things
> > > are sometimes non-obvious, for example:
> >
> > I require the following properties from this array of bits:
> >
> >  - If a bit was set before and after the modification, it must be seen to
> >    be set.
> >  - If a bit was clear before and after the modification, it must be seen to
> >    be clear.
> >  - If a bit is modified, it may be seen as set or clear.
> >
> > I have found three locations where we use the ->marks array:
> >
> > 1.
> >                         unsigned long data = *addr & (~0UL << offset);
> >                         if (data)
> >                                 return __ffs(data);
> >
> > 2.
> >         return find_next_bit(addr, XA_CHUNK_SIZE, offset);
> > 3.
> >         return test_bit(offset, node_marks(node, mark));
> >
> > The modifications -- all done with the spinlock held -- use the non-atomic
> > bitops:
> >         return __test_and_set_bit(offset, node_marks(node, mark));
> >         return __test_and_clear_bit(offset, node_marks(node, mark));
> >         bitmap_fill(node_marks(node, mark), XA_CHUNK_SIZE);
> > (that last one doesn't really count -- it's done prior to placing the node
> > in the tree)
> >
> > The first read seems straightforward; I can place a READ_ONCE around
> > *addr.  The second & third reads are rather less straightforward.
> > find_next_bit() and test_bit() are common code and use plain loads today.
>
> Yes, those last two are a bit annoying, aren't they?  I guess the first
> thing would be placing READ_ONCE() inside them, and if that results in
> regressions, have an alternative API for concurrent access?

FWIW test_bit() is an "atomic" bitop (per atomic_bitops.txt), and
KCSAN treats it as such. On x86 arch_test_bit() is not instrumented,
and then in asm-generic/bitops/instrumented-non-atomic.h test_bit() is
instrumented with instrument_atomic_read(). So on x86, things should
already be fine for test_bit(). Not sure about other architectures.

Thanks,
-- Marco



> > > Your mileage may vary, of course.  For one thing, your actuarial
> > > statistics are quite likely significantly more favorable than are mine.
> > > Not that mine are at all bad, particularly by the standards of a century
> > > or two ago.  ;-)
> >
> > I don't know ... my preferred form of exercise takes me into harm's way
> > of cars, so while I've significantly lowered my risk of
> > dying of cardiovascular disease, I'm much more likely to be killed by
> > an inattentive human being!  Bring on our robot cars ...
>
> I did my 40 years using that mode of transport.  But my reflexes are
> no longer up to the task.  So I have no advice for you on that one!
> Other than to question your trust in robot cars whose code was produced
> by human developers!  ;-)
>
>                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-06 13:38             ` Marco Elver
@ 2020-03-06 16:53               ` Matthew Wilcox
  2020-03-06 17:03                 ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2020-03-06 16:53 UTC (permalink / raw)
  To: Marco Elver; +Cc: Paul E. McKenney, Qian Cai, LKML

On Fri, Mar 06, 2020 at 02:38:39PM +0100, Marco Elver wrote:
> On Thu, 5 Mar 2020 at 22:39, Paul E. McKenney <paulmck@kernel.org> wrote:
> > On Thu, Mar 05, 2020 at 07:18:31AM -0800, Matthew Wilcox wrote:
> > > I have found three locations where we use the ->marks array:
> > >
> > > 1.
> > >                         unsigned long data = *addr & (~0UL << offset);
> > >                         if (data)
> > >                                 return __ffs(data);
> > >
> > > 2.
> > >         return find_next_bit(addr, XA_CHUNK_SIZE, offset);
> > > 3.
> > >         return test_bit(offset, node_marks(node, mark));
> > >
> > > The modifications -- all done with the spinlock held -- use the non-atomic
> > > bitops:
> > >         return __test_and_set_bit(offset, node_marks(node, mark));
> > >         return __test_and_clear_bit(offset, node_marks(node, mark));
> > >         bitmap_fill(node_marks(node, mark), XA_CHUNK_SIZE);
> > > (that last one doesn't really count -- it's done prior to placing the node
> > > in the tree)
> > >
> > > The first read seems straightforward; I can place a READ_ONCE around
> > > *addr.  The second & third reads are rather less straightforward.
> > > find_next_bit() and test_bit() are common code and use plain loads today.
> >
> > Yes, those last two are a bit annoying, aren't they?  I guess the first
> > thing would be placing READ_ONCE() inside them, and if that results in
> > regressions, have an alternative API for concurrent access?
> 
> FWIW test_bit() is an "atomic" bitop (per atomic_bitops.txt), and
> KCSAN treats it as such. On x86 arch_test_bit() is not instrumented,
> and then in asm-generic/bitops/instrumented-non-atomic.h test_bit() is
> instrumented with instrument_atomic_read(). So on x86, things should
> already be fine for test_bit(). Not sure about other architectures.

Hum.  It may well be documented as atomic, but is it?  Here's the
generic implementation:

static inline int test_bit(int nr, const volatile unsigned long *addr)
{
        return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}

arch_test_bit is only used by the instrumented variants:

$ git grep arch_test_bit include
include/asm-generic/bitops/instrumented-non-atomic.h:   return arch_test_bit(nr, addr);

As far as I can tell, the generic version is what's used on x86.  Does
the 'volatile' qualifier save us here?

find_next_bit() doesn't have the 'volatile' qualifier, so may still be
a problem?
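To make the contrast concrete, here is a userspace sketch of the two load styles under discussion — the volatile-qualified access in the generic test_bit() versus a find_next_bit()-style plain load (bodies simplified; test_bit_plain() is a hypothetical name for illustration):

```c
#include <assert.h>

#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
#define BIT_WORD(nr)  ((nr) / BITS_PER_LONG)

/* Generic test_bit(): the volatile-qualified pointer forces the compiler
 * to perform exactly one load of the word, much as READ_ONCE() would. */
static int test_bit(int nr, const volatile unsigned long *addr)
{
	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
}

/* find_next_bit()-style plain load: nothing stops the compiler from
 * re-reading *addr or tearing the access, which is the concern above. */
static int test_bit_plain(int nr, const unsigned long *addr)
{
	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
}
```

For a single-threaded caller the two are indistinguishable; the difference only matters for what the compiler may do under concurrent modification.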

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-06 16:53               ` Matthew Wilcox
@ 2020-03-06 17:03                 ` Paul E. McKenney
  2020-03-06 20:04                   ` Marco Elver
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2020-03-06 17:03 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Marco Elver, Qian Cai, LKML

On Fri, Mar 06, 2020 at 08:53:00AM -0800, Matthew Wilcox wrote:
> On Fri, Mar 06, 2020 at 02:38:39PM +0100, Marco Elver wrote:
> > On Thu, 5 Mar 2020 at 22:39, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > On Thu, Mar 05, 2020 at 07:18:31AM -0800, Matthew Wilcox wrote:
> > > > I have found three locations where we use the ->marks array:
> > > >
> > > > 1.
> > > >                         unsigned long data = *addr & (~0UL << offset);
> > > >                         if (data)
> > > >                                 return __ffs(data);
> > > >
> > > > 2.
> > > >         return find_next_bit(addr, XA_CHUNK_SIZE, offset);
> > > > 3.
> > > >         return test_bit(offset, node_marks(node, mark));
> > > >
> > > > The modifications -- all done with the spinlock held -- use the non-atomic
> > > > bitops:
> > > >         return __test_and_set_bit(offset, node_marks(node, mark));
> > > >         return __test_and_clear_bit(offset, node_marks(node, mark));
> > > >         bitmap_fill(node_marks(node, mark), XA_CHUNK_SIZE);
> > > > (that last one doesn't really count -- it's done prior to placing the node
> > > > in the tree)
> > > >
> > > > The first read seems straightforward; I can place a READ_ONCE around
> > > > *addr.  The second & third reads are rather less straightforward.
> > > > find_next_bit() and test_bit() are common code and use plain loads today.
> > >
> > > Yes, those last two are a bit annoying, aren't they?  I guess the first
> > > thing would be placing READ_ONCE() inside them, and if that results in
> > > regressions, have an alternative API for concurrent access?
> > 
> > FWIW test_bit() is an "atomic" bitop (per atomic_bitops.txt), and
> > KCSAN treats it as such. On x86 arch_test_bit() is not instrumented,
> > and then in asm-generic/bitops/instrumented-non-atomic.h test_bit() is
> > instrumented with instrument_atomic_read(). So on x86, things should
> > already be fine for test_bit(). Not sure about other architectures.
> 
> Hum.  It may well be documented as atomic, but is it?  Here's the
> generic implementation:
> 
> static inline int test_bit(int nr, const volatile unsigned long *addr)
> {
>         return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> }
> 
> arch_test_bit is only used by the instrumented variants:
> 
> $ git grep arch_test_bit include
> include/asm-generic/bitops/instrumented-non-atomic.h:   return arch_test_bit(nr, addr);
> 
> As far as I can tell, the generic version is what's used on x86.  Does
> the 'volatile' qualifier save us here?
> 
> find_next_bit() doesn't have the 'volatile' qualifier, so may still be
> a problem?

One approach would be to add the needed READ_ONCE().

Another, if someone is crazy enough to do the work, would be to verify
that the code output is as if there was a READ_ONCE().

Thoughts?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -next] lib: disable KCSAN for XArray
  2020-03-06 17:03                 ` Paul E. McKenney
@ 2020-03-06 20:04                   ` Marco Elver
  0 siblings, 0 replies; 16+ messages in thread
From: Marco Elver @ 2020-03-06 20:04 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Matthew Wilcox, Qian Cai, LKML

On Fri, 6 Mar 2020 at 18:03, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Mar 06, 2020 at 08:53:00AM -0800, Matthew Wilcox wrote:
> > On Fri, Mar 06, 2020 at 02:38:39PM +0100, Marco Elver wrote:
> > > On Thu, 5 Mar 2020 at 22:39, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > On Thu, Mar 05, 2020 at 07:18:31AM -0800, Matthew Wilcox wrote:
> > > > > I have found three locations where we use the ->marks array:
> > > > >
> > > > > 1.
> > > > >                         unsigned long data = *addr & (~0UL << offset);
> > > > >                         if (data)
> > > > >                                 return __ffs(data);
> > > > >
> > > > > 2.
> > > > >         return find_next_bit(addr, XA_CHUNK_SIZE, offset);
> > > > > 3.
> > > > >         return test_bit(offset, node_marks(node, mark));
> > > > >
> > > > > The modifications -- all done with the spinlock held -- use the non-atomic
> > > > > bitops:
> > > > >         return __test_and_set_bit(offset, node_marks(node, mark));
> > > > >         return __test_and_clear_bit(offset, node_marks(node, mark));
> > > > >         bitmap_fill(node_marks(node, mark), XA_CHUNK_SIZE);
> > > > > (that last one doesn't really count -- it's done prior to placing the node
> > > > > in the tree)
> > > > >
> > > > > The first read seems straightforward; I can place a READ_ONCE around
> > > > > *addr.  The second & third reads are rather less straightforward.
> > > > > find_next_bit() and test_bit() are common code and use plain loads today.
> > > >
> > > > Yes, those last two are a bit annoying, aren't they?  I guess the first
> > > > thing would be placing READ_ONCE() inside them, and if that results in
> > > > regressions, have an alternative API for concurrent access?
> > >
> > > FWIW test_bit() is an "atomic" bitop (per atomic_bitops.txt), and
> > > KCSAN treats it as such. On x86 arch_test_bit() is not instrumented,
> > > and then in asm-generic/bitops/instrumented-non-atomic.h test_bit() is
> > > instrumented with instrument_atomic_read(). So on x86, things should
> > > already be fine for test_bit(). Not sure about other architectures.
> >
> > Hum.  It may well be documented as atomic, but is it?  Here's the
> > generic implementation:
> >
> > static inline int test_bit(int nr, const volatile unsigned long *addr)
> > {
> >         return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > }
> >
> > arch_test_bit is only used by the instrumented variants:
> >
> > $ git grep arch_test_bit include
> > include/asm-generic/bitops/instrumented-non-atomic.h:   return arch_test_bit(nr, addr);
> >
> > As far as I can tell, the generic version is what's used on x86.  Does
> > the 'volatile' qualifier save us here?

x86 uses its own implementation of test_bit(), which we had to address
early on; otherwise we would still have tons of false positives from
test_bit() usage.

$ grep -E -A3 'test_bit|instrumented' arch/x86/include/asm/bitops.h
static __no_kcsan_or_inline bool constant_test_bit(long nr, const volatile unsigned long *addr)
{
        /*
         * Because this is a plain access, we need to disable KCSAN here to
         * avoid double instrumentation via instrumented bitops.
         */
        return ((1UL << (nr & (BITS_PER_LONG-1))) &
                (addr[nr >> _BITOPS_LONG_SHIFT])) != 0;
--
static __always_inline bool variable_test_bit(long nr, volatile const unsigned long *addr)
--
#define arch_test_bit(nr, addr)                 \
        (__builtin_constant_p((nr))             \
         ? constant_test_bit((nr), (addr))      \
         : variable_test_bit((nr), (addr)))
--
#include <asm-generic/bitops/instrumented-atomic.h>
#include <asm-generic/bitops/instrumented-non-atomic.h>
#include <asm-generic/bitops/instrumented-lock.h>


For the asm-generic variant, the cast to volatile should have the same
effect as READ_ONCE today (except maybe on Alpha?).  We would still
need to use READ_ONCE() in asm-generic's test_bit() though, to avoid
KCSAN false positives on other architectures. The code-gen should be
the same. I can try to send a patch and see if that's ok to do.

> > find_next_bit() doesn't have the 'volatile' qualifier, so may still be
> > a problem?
>
> One approach would be to add the needed READ_ONCE().
>
> Another, if someone is crazy enough to do the work, would be to verify
> that the code output is as if there was a READ_ONCE().
>
> Thoughts?

find_next_bit() is difficult. The code is definitely not the same with
READ_ONCE(); it adds 8 more instructions (x86-64). For now, if we're
fine with the data-racy behaviour in find_next_bit() and don't want to
change it, the only thing we can do is wrap callers in
'data_race(find_next_bit(...))'. Not great though. :-/

Thanks,
-- Marco


end of thread, other threads:[~2020-03-06 20:04 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-04  3:15 [PATCH -next] lib: disable KCSAN for XArray Qian Cai
2020-03-04  3:33 ` Matthew Wilcox
2020-03-04  3:55   ` Qian Cai
2020-03-04  4:00     ` Matthew Wilcox
2020-03-04  4:05   ` Paul E. McKenney
2020-03-04  4:33     ` Matthew Wilcox
2020-03-04 14:10       ` Paul E. McKenney
2020-03-04 16:40         ` Marco Elver
2020-03-04 17:10           ` Qian Cai
2020-03-04 17:14             ` Paul E. McKenney
2020-03-05 15:18         ` Matthew Wilcox
2020-03-05 21:39           ` Paul E. McKenney
2020-03-06 13:38             ` Marco Elver
2020-03-06 16:53               ` Matthew Wilcox
2020-03-06 17:03                 ` Paul E. McKenney
2020-03-06 20:04                   ` Marco Elver
