linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* seqcount usage in xt_replace_table()
@ 2019-01-08 19:33 Anatol Pomozov
  2019-01-08 22:37 ` Florian Westphal
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Anatol Pomozov @ 2019-01-08 19:33 UTC (permalink / raw)
  To: fw, Dmitry Vyukov, paulmck, LKML

Hello folks,

A bit of context what I am doing. I am trying to port KTSAN (Kernel
Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
and makes sure it is accessed in a thread-safe manner.

seqlock is a synchronization primitive used by Linux kernel. KTSAN
annotates read_seqbegin()/read_seqretry() and tracks what data been
accessed in its critical section.

During KTSAN port I found and interesting seqcount usage introduced in
commit 80055dab5de0c8677bc148c4717ddfc753a9148e

If I read this commit correctly xt_replace_table() does not use
seqlock in a canonical way to specify a critical section. Instead the
code reads the counter and waits until it gets to a specific value.

Now I want KTSAN to play with this code nicely. I need to tell KTSAN
something like "this raw_read_seqcount() does not start a critical
section, just ignore it". So temporary I introduced
raw_read_seqcount_nocritical() function that is ignored by KTSAN. Is
it a good solution?

Or maybe xt_replace_table() can be enhanced? When I hear that
something waits until an event happens on all CPUs I think about
wait_event() function. Would it be better for xt_replace_table() to
introduce an atomic counter that is decremented by CPUs, and the main
CPU waits until the counter gets zero?

WDYT?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-08 19:33 seqcount usage in xt_replace_table() Anatol Pomozov
@ 2019-01-08 22:37 ` Florian Westphal
  2019-01-10 12:41   ` Peter Zijlstra
  2019-01-09  0:02 ` Andrea Parri
  2019-01-10 12:44 ` Peter Zijlstra
  2 siblings, 1 reply; 28+ messages in thread
From: Florian Westphal @ 2019-01-08 22:37 UTC (permalink / raw)
  To: Anatol Pomozov; +Cc: fw, Dmitry Vyukov, paulmck, LKML

Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
> Or maybe xt_replace_table() can be enhanced? When I hear that
> something waits until an event happens on all CPUs I think about
> wait_event() function. Would it be better for xt_replace_table() to
> introduce an atomic counter that is decremented by CPUs, and the main
> CPU waits until the counter gets zero?

That would mean placing an additional atomic op into the
iptables evaluation path (ipt_do_table and friends).

Only alternative I see that might work is synchronize_rcu (the
_do_table functions are called with rcu read lock held).

I guess current scheme is cheaper though.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-08 19:33 seqcount usage in xt_replace_table() Anatol Pomozov
  2019-01-08 22:37 ` Florian Westphal
@ 2019-01-09  0:02 ` Andrea Parri
  2019-01-09  0:36   ` Anatol Pomozov
  2019-01-10 12:44 ` Peter Zijlstra
  2 siblings, 1 reply; 28+ messages in thread
From: Andrea Parri @ 2019-01-09  0:02 UTC (permalink / raw)
  To: Anatol Pomozov; +Cc: fw, Dmitry Vyukov, paulmck, LKML

Hi Anatol,

On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> Hello folks,
> 
> A bit of context what I am doing. I am trying to port KTSAN (Kernel
> Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> and makes sure it is accessed in a thread-safe manner.

Interesting!  FYI, some LKMM's maintainers (Paul included) had and
continued to have some "fun" discussing topics related to "thread-
safe memory accesses": I'm sure that they'll be very interested in
such work of yours and eager to discuss your results.

Cheers,
  Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-09  0:02 ` Andrea Parri
@ 2019-01-09  0:36   ` Anatol Pomozov
  2019-01-09  5:35     ` Dmitry Vyukov
  2019-01-09 11:24     ` Andrea Parri
  0 siblings, 2 replies; 28+ messages in thread
From: Anatol Pomozov @ 2019-01-09  0:36 UTC (permalink / raw)
  To: Andrea Parri; +Cc: fw, Dmitry Vyukov, Paul McKenney, LKML

Hello

On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
<andrea.parri@amarulasolutions.com> wrote:
>
> Hi Anatol,
>
> On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > Hello folks,
> >
> > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > and makes sure it is accessed in a thread-safe manner.
>
> Interesting!  FYI, some LKMM's maintainers (Paul included) had and
> continued to have some "fun" discussing topics related to "thread-
> safe memory accesses": I'm sure that they'll be very interested in
> such work of yours and eager to discuss your results.

Thread Sanitizer is a great tool to find thread-safety issues with
user-space code. The tool been developed by a team of smart people
from Google [1].

KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
bunch of work been done there but the project is still at
proof-of-concept point.

I am not a part of Google's dynamic tools team. But I've decided to
pick something to do during the New Year holidays so started porting
KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
fix a few crashes [3].

[1] https://github.com/google/sanitizers
[2] https://github.com/google/ktsan/wiki
[3] https://github.com/anatol/linux/tree/ktsan_4.20

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-09  0:36   ` Anatol Pomozov
@ 2019-01-09  5:35     ` Dmitry Vyukov
  2019-01-09 11:24     ` Andrea Parri
  1 sibling, 0 replies; 28+ messages in thread
From: Dmitry Vyukov @ 2019-01-09  5:35 UTC (permalink / raw)
  To: Anatol Pomozov
  Cc: Andrea Parri, Florian Westphal, Paul McKenney, LKML, Andrey Konovalov

On Wed, Jan 9, 2019 at 1:36 AM Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
>
> Hello
>
> On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > Hi Anatol,
> >
> > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > Hello folks,
> > >
> > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > and makes sure it is accessed in a thread-safe manner.
> >
> > Interesting!  FYI, some LKMM's maintainers (Paul included) had and
> > continued to have some "fun" discussing topics related to "thread-
> > safe memory accesses": I'm sure that they'll be very interested in
> > such work of yours and eager to discuss your results.
>
> Thread Sanitizer is a great tool to find thread-safety issues with
> user-space code. The tool been developed by a team of smart people
> from Google [1].
>
> KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> bunch of work been done there but the project is still at
> proof-of-concept point.
>
> I am not a part of Google's dynamic tools team. But I've decided to
> pick something to do during the New Year holidays so started porting
> KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> fix a few crashes [3].
>
> [1] https://github.com/google/sanitizers
> [2] https://github.com/google/ktsan/wiki
> [3] https://github.com/anatol/linux/tree/ktsan_4.20


For completeness, at the time we also had to add
read_seqcount_cancel() function to dynamically mark all seqcount read
regions. It can be used here too, start read region and immediately
end it. Less clear than raw_read_seqcount_nocritical(), but also less
API surface.

/**
 * read_seqcount_cancel - cancel a seq-read critical section
 * @s: pointer to seqcount_t
 *
 * This is a no-op except for ktsan, it needs to know scopes of seq-read
 * critical sections. The sections are denoted either by begin->retry or
 * by begin->cancel.
 */
static inline void read_seqcount_cancel(const seqcount_t *s)
{
        ktsan_seqcount_end(s);
}

https://github.com/google/ktsan/blob/tsan/include/linux/seqlock.h#L235-L246

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-09  0:36   ` Anatol Pomozov
  2019-01-09  5:35     ` Dmitry Vyukov
@ 2019-01-09 11:24     ` Andrea Parri
  2019-01-09 11:55       ` Dmitry Vyukov
  1 sibling, 1 reply; 28+ messages in thread
From: Andrea Parri @ 2019-01-09 11:24 UTC (permalink / raw)
  To: Anatol Pomozov; +Cc: fw, Dmitry Vyukov, Paul McKenney, LKML, Andrey Konovalov

On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> Hello
> 
> On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > Hi Anatol,
> >
> > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > Hello folks,
> > >
> > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > and makes sure it is accessed in a thread-safe manner.
> >
> > Interesting!  FYI, some LKMM's maintainers (Paul included) had and
> > continued to have some "fun" discussing topics related to "thread-
> > safe memory accesses": I'm sure that they'll be very interested in
> > such work of yours and eager to discuss your results.
> 
> Thread Sanitizer is a great tool to find thread-safety issues with
> user-space code. The tool been developed by a team of smart people
> from Google [1].
> 
> KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> bunch of work been done there but the project is still at
> proof-of-concept point.

Yes, I have been aware of these tools since at least  ;-)

  https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ


> 
> I am not a part of Google's dynamic tools team. But I've decided to
> pick something to do during the New Year holidays so started porting
> KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> fix a few crashes [3].

I guess my first reaction would remain

  "it's kind of hard (to use an euphemism) to review 7,582 additions
  or so for a data race detector without a clear/an accepted (by the
  community) notion of data race..."

  Andrea


> 
> [1] https://github.com/google/sanitizers
> [2] https://github.com/google/ktsan/wiki
> [3] https://github.com/anatol/linux/tree/ktsan_4.20

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-09 11:24     ` Andrea Parri
@ 2019-01-09 11:55       ` Dmitry Vyukov
  2019-01-09 12:11         ` Andrea Parri
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Vyukov @ 2019-01-09 11:55 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Anatol Pomozov, Florian Westphal, Paul McKenney, LKML, Andrey Konovalov

On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
<andrea.parri@amarulasolutions.com> wrote:
>
> On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > Hello
> >
> > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > <andrea.parri@amarulasolutions.com> wrote:
> > >
> > > Hi Anatol,
> > >
> > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > Hello folks,
> > > >
> > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > and makes sure it is accessed in a thread-safe manner.
> > >
> > > Interesting!  FYI, some LKMM's maintainers (Paul included) had and
> > > continued to have some "fun" discussing topics related to "thread-
> > > safe memory accesses": I'm sure that they'll be very interested in
> > > such work of yours and eager to discuss your results.
> >
> > Thread Sanitizer is a great tool to find thread-safety issues with
> > user-space code. The tool been developed by a team of smart people
> > from Google [1].
> >
> > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > bunch of work been done there but the project is still at
> > proof-of-concept point.
>
> Yes, I have been aware of these tools since at least  ;-)
>
>   https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
>
>
> >
> > I am not a part of Google's dynamic tools team. But I've decided to
> > pick something to do during the New Year holidays so started porting
> > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > fix a few crashes [3].
>
> I guess my first reaction would remain
>
>   "it's kind of hard (to use an euphemism) to review 7,582 additions
>   or so for a data race detector without a clear/an accepted (by the
>   community) notion of data race..."

Tsan's notion of a data race is basically the C/C++'s notion:
concurrent/unsynchronized non-atomic access in different threads at
least one of which is a write.
Tremendous (for such a project) benefits of automatic data race
detection is a good motivation to finally agree on and accept a
practically useful notion of a data race.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-09 11:55       ` Dmitry Vyukov
@ 2019-01-09 12:11         ` Andrea Parri
  2019-01-09 12:29           ` Dmitry Vyukov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrea Parri @ 2019-01-09 12:11 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Anatol Pomozov, Florian Westphal, Paul McKenney, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

On Wed, Jan 09, 2019 at 12:55:27PM +0100, Dmitry Vyukov wrote:
> On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > > Hello
> > >
> > > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > > <andrea.parri@amarulasolutions.com> wrote:
> > > >
> > > > Hi Anatol,
> > > >
> > > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > > Hello folks,
> > > > >
> > > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > > and makes sure it is accessed in a thread-safe manner.
> > > >
> > > > Interesting!  FYI, some LKMM's maintainers (Paul included) had and
> > > > continued to have some "fun" discussing topics related to "thread-
> > > > safe memory accesses": I'm sure that they'll be very interested in
> > > > such work of yours and eager to discuss your results.
> > >
> > > Thread Sanitizer is a great tool to find thread-safety issues with
> > > user-space code. The tool been developed by a team of smart people
> > > from Google [1].
> > >
> > > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > > bunch of work been done there but the project is still at
> > > proof-of-concept point.
> >
> > Yes, I have been aware of these tools since at least  ;-)
> >
> >   https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
> >
> >
> > >
> > > I am not a part of Google's dynamic tools team. But I've decided to
> > > pick something to do during the New Year holidays so started porting
> > > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > > fix a few crashes [3].
> >
> > I guess my first reaction would remain
> >
> >   "it's kind of hard (to use an euphemism) to review 7,582 additions
> >   or so for a data race detector without a clear/an accepted (by the
> >   community) notion of data race..."
> 
> Tsan's notion of a data race is basically the C/C++'s notion:
> concurrent/unsynchronized non-atomic access in different threads at
> least one of which is a write.

Yeah, I think that this notion needs to be detailed, discussed,
documented, and discussed again. ;-)


> Tremendous (for such a project) benefits of automatic data race
> detection is a good motivation to finally agree on and accept a
> practically useful notion of a data race.

Agreed.

  Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-09 12:11         ` Andrea Parri
@ 2019-01-09 12:29           ` Dmitry Vyukov
  2019-01-09 17:10             ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Vyukov @ 2019-01-09 12:29 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Anatol Pomozov, Florian Westphal, Paul McKenney, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

On Wed, Jan 9, 2019 at 1:11 PM Andrea Parri
<andrea.parri@amarulasolutions.com> wrote:
>
> On Wed, Jan 09, 2019 at 12:55:27PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
> > <andrea.parri@amarulasolutions.com> wrote:
> > >
> > > On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > > > Hello
> > > >
> > > > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > > > <andrea.parri@amarulasolutions.com> wrote:
> > > > >
> > > > > Hi Anatol,
> > > > >
> > > > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > > > Hello folks,
> > > > > >
> > > > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > > > and makes sure it is accessed in a thread-safe manner.
> > > > >
> > > > > Interesting!  FYI, some LKMM's maintainers (Paul included) had and
> > > > > continued to have some "fun" discussing topics related to "thread-
> > > > > safe memory accesses": I'm sure that they'll be very interested in
> > > > > such work of yours and eager to discuss your results.
> > > >
> > > > Thread Sanitizer is a great tool to find thread-safety issues with
> > > > user-space code. The tool been developed by a team of smart people
> > > > from Google [1].
> > > >
> > > > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > > > bunch of work been done there but the project is still at
> > > > proof-of-concept point.
> > >
> > > Yes, I have been aware of these tools since at least  ;-)
> > >
> > >   https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
> > >
> > >
> > > >
> > > > I am not a part of Google's dynamic tools team. But I've decided to
> > > > pick something to do during the New Year holidays so started porting
> > > > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > > > fix a few crashes [3].
> > >
> > > I guess my first reaction would remain
> > >
> > >   "it's kind of hard (to use an euphemism) to review 7,582 additions
> > >   or so for a data race detector without a clear/an accepted (by the
> > >   community) notion of data race..."
> >
> > Tsan's notion of a data race is basically the C/C++'s notion:
> > concurrent/unsynchronized non-atomic access in different threads at
> > least one of which is a write.
>
> Yeah, I think that this notion needs to be detailed, discussed,
> documented, and discussed again. ;-)
>
>
> > Tremendous (for such a project) benefits of automatic data race
> > detection is a good motivation to finally agree on and accept a
> > practically useful notion of a data race.
>
> Agreed.


While having a 100% formal definition of a data race upfront would be
useful, I don't think this is a hard requirement for deployment of
KTSAN. What I think is required is:
1. Agree that the overall direction is right.
2. Agree that we want to enable data race detection and resolve
problems as they appear in a practical manner (rather than block whole
effort on every small thing).
We deployed TSAN in user-space in much larger code bases than kernel,
and while we had the C/C++ formal definition of a data race, practical
and legacy matters were similar to that of the kernel (lots of legacy
code, different opinions, etc). Doing both things in tandem (defining
a memory model and deploying a data race detector) can actually have
benefits as a race detector may point to under-defined or
impractically defined areas, and will otherwise help to validate that
the model works and is useful.
KTSAN is not fixed as well. We adopted it as we gathered more
knowledge and understanding of the kernel. So it's not that we have to
commit to something upfront.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-09 12:29           ` Dmitry Vyukov
@ 2019-01-09 17:10             ` Paul E. McKenney
  2019-01-10  8:49               ` Dmitry Vyukov
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2019-01-09 17:10 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrea Parri, Anatol Pomozov, Florian Westphal, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

On Wed, Jan 09, 2019 at 01:29:02PM +0100, Dmitry Vyukov wrote:
> On Wed, Jan 9, 2019 at 1:11 PM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > On Wed, Jan 09, 2019 at 12:55:27PM +0100, Dmitry Vyukov wrote:
> > > On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
> > > <andrea.parri@amarulasolutions.com> wrote:
> > > >
> > > > On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > > > > Hello
> > > > >
> > > > > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > > > > <andrea.parri@amarulasolutions.com> wrote:
> > > > > >
> > > > > > Hi Anatol,
> > > > > >
> > > > > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > > > > Hello folks,
> > > > > > >
> > > > > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > > > > and makes sure it is accessed in a thread-safe manner.
> > > > > >
> > > > > > Interesting!  FYI, some LKMM's maintainers (Paul included) had and
> > > > > > continued to have some "fun" discussing topics related to "thread-
> > > > > > safe memory accesses": I'm sure that they'll be very interested in
> > > > > > such work of yours and eager to discuss your results.
> > > > >
> > > > > Thread Sanitizer is a great tool to find thread-safety issues with
> > > > > user-space code. The tool been developed by a team of smart people
> > > > > from Google [1].
> > > > >
> > > > > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > > > > bunch of work been done there but the project is still at
> > > > > proof-of-concept point.
> > > >
> > > > Yes, I have been aware of these tools since at least  ;-)
> > > >
> > > >   https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
> > > >
> > > >
> > > > >
> > > > > I am not a part of Google's dynamic tools team. But I've decided to
> > > > > pick something to do during the New Year holidays so started porting
> > > > > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > > > > fix a few crashes [3].
> > > >
> > > > I guess my first reaction would remain
> > > >
> > > >   "it's kind of hard (to use an euphemism) to review 7,582 additions
> > > >   or so for a data race detector without a clear/an accepted (by the
> > > >   community) notion of data race..."
> > >
> > > Tsan's notion of a data race is basically the C/C++'s notion:
> > > concurrent/unsynchronized non-atomic access in different threads at
> > > least one of which is a write.
> >
> > Yeah, I think that this notion needs to be detailed, discussed,
> > documented, and discussed again. ;-)
> >
> >
> > > Tremendous (for such a project) benefits of automatic data race
> > > detection is a good motivation to finally agree on and accept a
> > > practically useful notion of a data race.
> >
> > Agreed.
> 
> While having a 100% formal definition of a data race upfront would be
> useful, I don't think this is a hard requirement for deployment of
> KTSAN. What I think is required is:
> 1. Agree that the overall direction is right.
> 2. Agree that we want to enable data race detection and resolve
> problems as they appear in a practical manner (rather than block whole
> effort on every small thing).
> We deployed TSAN in user-space in much larger code bases than kernel,
> and while we had the C/C++ formal definition of a data race, practical
> and legacy matters were similar to that of the kernel (lots of legacy
> code, different opinions, etc). Doing both things in tandem (defining
> a memory model and deploying a data race detector) can actually have
> benefits as a race detector may point to under-defined or
> impractically defined areas, and will otherwise help to validate that
> the model works and is useful.
> KTSAN is not fixed as well. We adopted it as we gathered more
> knowledge and understanding of the kernel. So it's not that we have to
> commit to something upfront.

In any case, there might well be some differences in approach between
KTSAN and LKMM due to input size differences:  One would expect LKMM
to be able to tolerate a more computationally intensive definition as
a consequence of KTSAN's ability to process much larger code bases.

But I nevertheless believe that it would be good to have these differences
be a matter of conscious choice rather than a matter of chance.  ;-)

My guess is that LKMM picks its starting point (which might take some
additional time), then KTSAN critiques it, and then we work out what
differences should result in a change to one or the other (or both)
and which differences are inherent in the different workloads that LKMM
and KTSAN are presented with.

Seem reasonable?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-09 17:10             ` Paul E. McKenney
@ 2019-01-10  8:49               ` Dmitry Vyukov
  2019-01-10 12:30                 ` Andrea Parri
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Vyukov @ 2019-01-10  8:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrea Parri, Anatol Pomozov, Florian Westphal, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

On Wed, Jan 9, 2019 at 6:11 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
>
> On Wed, Jan 09, 2019 at 01:29:02PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 9, 2019 at 1:11 PM Andrea Parri
> > <andrea.parri@amarulasolutions.com> wrote:
> > >
> > > On Wed, Jan 09, 2019 at 12:55:27PM +0100, Dmitry Vyukov wrote:
> > > > On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
> > > > <andrea.parri@amarulasolutions.com> wrote:
> > > > >
> > > > > On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > > > > > Hello
> > > > > >
> > > > > > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > > > > > <andrea.parri@amarulasolutions.com> wrote:
> > > > > > >
> > > > > > > Hi Anatol,
> > > > > > >
> > > > > > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > > > > > Hello folks,
> > > > > > > >
> > > > > > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > > > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > > > > > and makes sure it is accessed in a thread-safe manner.
> > > > > > >
> > > > > > > Interesting!  FYI, some LKMM's maintainers (Paul included) had and
> > > > > > > continued to have some "fun" discussing topics related to "thread-
> > > > > > > safe memory accesses": I'm sure that they'll be very interested in
> > > > > > > such work of yours and eager to discuss your results.
> > > > > >
> > > > > > Thread Sanitizer is a great tool to find thread-safety issues with
> > > > > > user-space code. The tool been developed by a team of smart people
> > > > > > from Google [1].
> > > > > >
> > > > > > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > > > > > bunch of work been done there but the project is still at
> > > > > > proof-of-concept point.
> > > > >
> > > > > Yes, I have been aware of these tools since at least  ;-)
> > > > >
> > > > >   https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
> > > > >
> > > > >
> > > > > >
> > > > > > I am not a part of Google's dynamic tools team. But I've decided to
> > > > > > pick something to do during the New Year holidays so started porting
> > > > > > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > > > > > fix a few crashes [3].
> > > > >
> > > > > I guess my first reaction would remain
> > > > >
> > > > >   "it's kind of hard (to use an euphemism) to review 7,582 additions
> > > > >   or so for a data race detector without a clear/an accepted (by the
> > > > >   community) notion of data race..."
> > > >
> > > > Tsan's notion of a data race is basically the C/C++'s notion:
> > > > concurrent/unsynchronized non-atomic access in different threads at
> > > > least one of which is a write.
> > >
> > > Yeah, I think that this notion needs to be detailed, discussed,
> > > documented, and discussed again. ;-)
> > >
> > >
> > > > Tremendous (for such a project) benefits of automatic data race
> > > > detection is a good motivation to finally agree on and accept a
> > > > practically useful notion of a data race.
> > >
> > > Agreed.
> >
> > While having a 100% formal definition of a data race upfront would be
> > useful, I don't think this is a hard requirement for deployment of
> > KTSAN. What I think is required is:
> > 1. Agree that the overall direction is right.
> > 2. Agree that we want to enable data race detection and resolve
> > problems as they appear in a practical manner (rather than block whole
> > effort on every small thing).
> > We deployed TSAN in user-space in much larger code bases than kernel,
> > and while we had the C/C++ formal definition of a data race, practical
> > and legacy matters were similar to that of the kernel (lots of legacy
> > code, different opinions, etc). Doing both things in tandem (defining
> > a memory model and deploying a data race detector) can actually have
> > benefits as a race detector may point to under-defined or
> > impractically defined areas, and will otherwise help to validate that
> > the model works and is useful.
> > KTSAN is not fixed as well. We adopted it as we gathered more
> > knowledge and understanding of the kernel. So it's not that we have to
> > commit to something upfront.
>
> In any case, there might well be some differences in approach between
> KTSAN and LKMM due to input size differences:  One would expect LKMM
> to be able to tolerate a more computationally intensive definition as
> a consequence of KTSAN's ability to process much larger code bases.
>
> But I nevertheless believe that it would be good to have these differences
> be a matter of conscious choice rather than a matter of chance.  ;-)
>
> My guess is that LKMM picks its starting point (which might take some
> additional time), then KTSAN critiques it, and then we work out what
> differences should result in a change to one or the other (or both)
> and which differences are inherent in the different workloads that LKMM
> and KTSAN are presented with.
>
> Seem reasonable?

Sounds reasonable.

For seqcounts we currently simply ignore all accesses within the read
section (thus the requirement to dynamically track read sections).
What does LKMM say about seqlocks?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10  8:49               ` Dmitry Vyukov
@ 2019-01-10 12:30                 ` Andrea Parri
  2019-01-10 12:38                   ` Dmitry Vyukov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrea Parri @ 2019-01-10 12:30 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Paul E. McKenney, Anatol Pomozov, Florian Westphal, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

> For seqcounts we currently simply ignore all accesses within the read
> section (thus the requirement to dynamically track read sections).
> What does LKMM say about seqlocks?

LKMM does not currently model seqlocks, if that's what you're asking;
c.f., tools/memory-model/linux-kernel.def for a list of the currently
supported synchronization primitives.

LKMM has also no notion of "data race", it insists that the code must
contain no unmarked accesses; we have been discussing such extensions
since at least Dec'17 (we're not quite there!, as mentioned by Paul).

My opinion is that ignoring all accesses within a given read section
_can_ lead to false negatives (in every possible definition of "data
race" and "read sections" I can think of at the moment ;D):

	P0				P1
	read_seqbegin()			x = 1;
	r0 = x;
	read_seqretry() // =0

ought to be "racy"..., right?  (I didn't audit all the callsites for
read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
pattern ;D ... "legacy", as you recalled).

  Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:30                 ` Andrea Parri
@ 2019-01-10 12:38                   ` Dmitry Vyukov
  2019-01-10 12:46                     ` Andrea Parri
  2019-01-10 14:50                     ` Paul E. McKenney
  0 siblings, 2 replies; 28+ messages in thread
From: Dmitry Vyukov @ 2019-01-10 12:38 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Paul E. McKenney, Anatol Pomozov, Florian Westphal, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

On Thu, Jan 10, 2019 at 1:30 PM Andrea Parri
<andrea.parri@amarulasolutions.com> wrote:
>
> > For seqcounts we currently simply ignore all accesses within the read
> > section (thus the requirement to dynamically track read sections).
> > What does LKMM say about seqlocks?
>
> LKMM does not currently model seqlocks, if that's what you're asking;
> c.f., tools/memory-model/linux-kernel.def for a list of the currently
> supported synchronization primitives.
>
> LKMM has also no notion of "data race", it insists that the code must
> contain no unmarked accesses; we have been discussing such extensions
> since at least Dec'17 (we're not quite there!, as mentioned by Paul).

How does it call cases that do contain unmarked accesses then? :)

> My opinion is that ignoring all accesses within a given read section
> _can_ lead to false negatives

Absolutely. But this is a deliberate decision.
For our tools we consider priority 1: no false positives. Period.
Priority 2: also report some true positives in best effort manner.

> (in every possible definition of "data
> race" and "read sections" I can think of at the moment ;D):
>
>         P0                              P1
>         read_seqbegin()                 x = 1;
>         r0 = x;
>         read_seqretry() // =0
>
> ought to be "racy"..., right?  (I didn't audit all the callsites for
> read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
> pattern ;D ... "legacy", as you recalled).
>
>   Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-08 22:37 ` Florian Westphal
@ 2019-01-10 12:41   ` Peter Zijlstra
  2019-01-10 12:53     ` Dmitry Vyukov
                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Peter Zijlstra @ 2019-01-10 12:41 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Anatol Pomozov, Dmitry Vyukov, paulmck, LKML

On Tue, Jan 08, 2019 at 11:37:46PM +0100, Florian Westphal wrote:
> Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
> > Or maybe xt_replace_table() can be enhanced? When I hear that
> > something waits until an event happens on all CPUs I think about
> > wait_event() function. Would it be better for xt_replace_table() to
> > introduce an atomic counter that is decremented by CPUs, and the main
> > CPU waits until the counter gets zero?
> 
> That would mean placing an additional atomic op into the
> iptables evaluation path (ipt_do_table and friends).
> 

For:

	/*
	 * Ensure contents of newinfo are visible before assigning to
	 * private.
	 */
	smp_wmb();
	table->private = newinfo;

we have:

	smp_store_release(&table->private, newinfo);

But what store does that second smp_wmb() order against? The comment:

	/* make sure all cpus see new ->private value */
	smp_wmb();

makes no sense what so ever, no smp_*() barrier can provide such
guarantees.

> Only alternative I see that might work is synchronize_rcu (the
> _do_table functions are called with rcu read lock held).
> 
> I guess current scheme is cheaper though.

Is performance a concern in this path? There is no comment justifying
this 'creative' stuff.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-08 19:33 seqcount usage in xt_replace_table() Anatol Pomozov
  2019-01-08 22:37 ` Florian Westphal
  2019-01-09  0:02 ` Andrea Parri
@ 2019-01-10 12:44 ` Peter Zijlstra
  2019-01-10 12:54   ` Dmitry Vyukov
  2 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2019-01-10 12:44 UTC (permalink / raw)
  To: Anatol Pomozov; +Cc: fw, Dmitry Vyukov, paulmck, LKML

On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> Hello folks,
> 
> A bit of context what I am doing. I am trying to port KTSAN (Kernel
> Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> and makes sure it is accessed in a thread-safe manner.
> 
> seqlock is a synchronization primitive used by Linux kernel. KTSAN
> annotates read_seqbegin()/read_seqretry() and tracks what data been
> accessed in its critical section.
> 
> During KTSAN port I found and interesting seqcount usage introduced in
> commit 80055dab5de0c8677bc148c4717ddfc753a9148e
> 
> If I read this commit correctly xt_replace_table() does not use
> seqlock in a canonical way to specify a critical section. Instead the
> code reads the counter and waits until it gets to a specific value.

(gets away from)

> Now I want KTSAN to play with this code nicely. I need to tell KTSAN
> something like "this raw_read_seqcount() does not start a critical
> section, just ignore it". So temporary I introduced
> raw_read_seqcount_nocritical() function that is ignored by KTSAN. Is
> it a good solution?

This code is special enough to just do: READ_ONCE(->sequence) and be
done with it. It doesn't need the smp_rmb() or anything else.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:38                   ` Dmitry Vyukov
@ 2019-01-10 12:46                     ` Andrea Parri
  2019-01-10 13:25                       ` Dmitry Vyukov
  2019-01-10 14:50                     ` Paul E. McKenney
  1 sibling, 1 reply; 28+ messages in thread
From: Andrea Parri @ 2019-01-10 12:46 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Paul E. McKenney, Anatol Pomozov, Florian Westphal, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

On Thu, Jan 10, 2019 at 01:38:11PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 10, 2019 at 1:30 PM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > > For seqcounts we currently simply ignore all accesses within the read
> > > section (thus the requirement to dynamically track read sections).
> > > What does LKMM say about seqlocks?
> >
> > LKMM does not currently model seqlocks, if that's what you're asking;
> > c.f., tools/memory-model/linux-kernel.def for a list of the currently
> > supported synchronization primitives.
> >
> > LKMM has also no notion of "data race", it insists that the code must
> > contain no unmarked accesses; we have been discussing such extensions
> > since at least Dec'17 (we're not quite there!, as mentioned by Paul).
> 
> How does it call cases that do contain unmarked accesses then? :)

"work-in-progress" ;), or "limitation" (see tools/memory-model/README)


> 
> > My opinion is that ignoring all accesses within a given read section
> > _can_ lead to false negatives
> 
> Absolutely. But this is a deliberate decision.
> For our tools we consider priority 1: no false positives. Period.
> Priority 2: also report some true positives in best effort manner.

This sound reasonable to me.  But please don't overlook the fact that
to be able to talk about "false positive" and "false negative" (for a
data race detector) we need to agree about "what a data race is".

(The hope, of course, is that the LKMM will have a say soon here ...)

  Andrea


> 
> > (in every possible definition of "data
> > race" and "read sections" I can think of at the moment ;D):
> >
> >         P0                              P1
> >         read_seqbegin()                 x = 1;
> >         r0 = x;
> >         read_seqretry() // =0
> >
> > ought to be "racy"..., right?  (I didn't audit all the callsites for
> > read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
> > pattern ;D ... "legacy", as you recalled).
> >
> >   Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:41   ` Peter Zijlstra
@ 2019-01-10 12:53     ` Dmitry Vyukov
  2019-01-10 20:18       ` Peter Zijlstra
  2019-01-10 14:48     ` Florian Westphal
  2019-01-10 14:52     ` Paul E. McKenney
  2 siblings, 1 reply; 28+ messages in thread
From: Dmitry Vyukov @ 2019-01-10 12:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, Anatol Pomozov, Paul E. McKenney, LKML

On Thu, Jan 10, 2019 at 1:41 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 08, 2019 at 11:37:46PM +0100, Florian Westphal wrote:
> > Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
> > > Or maybe xt_replace_table() can be enhanced? When I hear that
> > > something waits until an event happens on all CPUs I think about
> > > wait_event() function. Would it be better for xt_replace_table() to
> > > introduce an atomic counter that is decremented by CPUs, and the main
> > > CPU waits until the counter gets zero?
> >
> > That would mean placing an additional atomic op into the
> > iptables evaluation path (ipt_do_table and friends).
> >
>
> For:
>
>         /*
>          * Ensure contents of newinfo are visible before assigning to
>          * private.
>          */
>         smp_wmb();
>         table->private = newinfo;
>
> we have:
>
>         smp_store_release(&table->private, newinfo);
>
> But what store does that second smp_wmb() order against? The comment:
>
>         /* make sure all cpus see new ->private value */
>         smp_wmb();
>
> makes no sense what so ever, no smp_*() barrier can provide such
> guarantees.

Do we want WRITE_ONCE here then?
We want it to be compiled to an actual memory access and then it's up
to hardware to make it visible to other CPUs.
smp_wmb should most likely have this as a side effect too, but
somewhat indirect.
Also race-detector-friendly.


> > Only alternative I see that might work is synchronize_rcu (the
> > _do_table functions are called with rcu read lock held).
> >
> > I guess current scheme is cheaper though.
>
> Is performance a concern in this path? There is no comment justifying
> this 'creative' stuff.
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:44 ` Peter Zijlstra
@ 2019-01-10 12:54   ` Dmitry Vyukov
  0 siblings, 0 replies; 28+ messages in thread
From: Dmitry Vyukov @ 2019-01-10 12:54 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Anatol Pomozov, Florian Westphal, Paul E. McKenney, LKML

On Thu, Jan 10, 2019 at 1:44 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > Hello folks,
> >
> > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > and makes sure it is accessed in a thread-safe manner.
> >
> > seqlock is a synchronization primitive used by Linux kernel. KTSAN
> > annotates read_seqbegin()/read_seqretry() and tracks what data been
> > accessed in its critical section.
> >
> > During KTSAN port I found and interesting seqcount usage introduced in
> > commit 80055dab5de0c8677bc148c4717ddfc753a9148e
> >
> > If I read this commit correctly xt_replace_table() does not use
> > seqlock in a canonical way to specify a critical section. Instead the
> > code reads the counter and waits until it gets to a specific value.
>
> (gets away from)
>
> > Now I want KTSAN to play with this code nicely. I need to tell KTSAN
> > something like "this raw_read_seqcount() does not start a critical
> > section, just ignore it". So temporary I introduced
> > raw_read_seqcount_nocritical() function that is ignored by KTSAN. Is
> > it a good solution?
>
> This code is special enough to just do: READ_ONCE(->sequence) and be
> done with it. It doesn't need the smp_rmb() or anything else.

Sounds good to me.
From KTSAN perspective it then should work without any additional
dance, it's always good when code works as-is.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:46                     ` Andrea Parri
@ 2019-01-10 13:25                       ` Dmitry Vyukov
  0 siblings, 0 replies; 28+ messages in thread
From: Dmitry Vyukov @ 2019-01-10 13:25 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Paul E. McKenney, Anatol Pomozov, Florian Westphal, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

On Thu, Jan 10, 2019 at 1:47 PM Andrea Parri
<andrea.parri@amarulasolutions.com> wrote:
>
> On Thu, Jan 10, 2019 at 01:38:11PM +0100, Dmitry Vyukov wrote:
> > On Thu, Jan 10, 2019 at 1:30 PM Andrea Parri
> > <andrea.parri@amarulasolutions.com> wrote:
> > >
> > > > For seqcounts we currently simply ignore all accesses within the read
> > > > section (thus the requirement to dynamically track read sections).
> > > > What does LKMM say about seqlocks?
> > >
> > > LKMM does not currently model seqlocks, if that's what you're asking;
> > > c.f., tools/memory-model/linux-kernel.def for a list of the currently
> > > supported synchronization primitives.
> > >
> > > LKMM has also no notion of "data race", it insists that the code must
> > > contain no unmarked accesses; we have been discussing such extensions
> > > since at least Dec'17 (we're not quite there!, as mentioned by Paul).
> >
> > How does it call cases that do contain unmarked accesses then? :)
>
> "work-in-progress" ;), or "limitation" (see tools/memory-model/README)

Let's call it /data race/ interim then :)
Which also have literally undefined behavior in LKMM. Which now
precisely matches the implementation language (C) definitions. Which
is nice.

> > > My opinion is that ignoring all accesses within a given read section
> > > _can_ lead to false negatives
> >
> > Absolutely. But this is a deliberate decision.
> > For our tools we consider priority 1: no false positives. Period.
> > Priority 2: also report some true positives in best effort manner.
>
> This sound reasonable to me.  But please don't overlook the fact that
> to be able to talk about "false positive" and "false negative" (for a
> data race detector) we need to agree about "what a data race is".

Having a formal model would be undoubtedly good.
But in practice things are much simpler. The complex cases that
majority of LKMM deals with are <<1% of kernel concurrency. The
majority of kernel cases are "no concurrent accesses at all", "always
protected by a mutex", "passed as argument to a new thread", "the
canonical store-release/load-acquire synchronization". For these I
hope there is no controversy across C, POSIX, gcc, clang, kernel.
Handling these cases in a race detector brings 99.9% of benefit. And
for more complex cases (like seqlock) we can always approximate as "no
races there" which inevitably satisfy our priorities (if you report
nothing, you don't report false positives).

But I am much more concerned about actual kernel code and behavior wrt
a memory model. We are talking about interaction between LKMM <->
KTSAN. When a way more important question is LKMM <-> actual kernel
behavior. KTSAN is really a secondary thing in this picture. So if
anything needs a memory model, or needs to be blocked on a memory
model, that's writing kernel code ;)


> (The hope, of course, is that the LKMM will have a say soon here ...)
>
>   Andrea
>
>
> >
> > > (in every possible definition of "data
> > > race" and "read sections" I can think of at the moment ;D):
> > >
> > >         P0                              P1
> > >         read_seqbegin()                 x = 1;
> > >         r0 = x;
> > >         read_seqretry() // =0
> > >
> > > ought to be "racy"..., right?  (I didn't audit all the callsites for
> > > read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
> > > pattern ;D ... "legacy", as you recalled).
> > >
> > >   Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:41   ` Peter Zijlstra
  2019-01-10 12:53     ` Dmitry Vyukov
@ 2019-01-10 14:48     ` Florian Westphal
  2019-01-10 20:20       ` Peter Zijlstra
  2019-01-10 20:25       ` Peter Zijlstra
  2019-01-10 14:52     ` Paul E. McKenney
  2 siblings, 2 replies; 28+ messages in thread
From: Florian Westphal @ 2019-01-10 14:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Florian Westphal, Anatol Pomozov, Dmitry Vyukov, paulmck, LKML

Peter Zijlstra <peterz@infradead.org> wrote:
> 	/*
> 	 * Ensure contents of newinfo are visible before assigning to
> 	 * private.
> 	 */
> 	smp_wmb();
> 	table->private = newinfo;
> 
> we have:
> 
> 	smp_store_release(&table->private, newinfo);
> 
> But what store does that second smp_wmb() order against? The comment:
> 
> 	/* make sure all cpus see new ->private value */
> 	smp_wmb();
> 
> makes no sense what so ever, no smp_*() barrier can provide such
> guarantees.

IIRC I added this at the request of a reviewer of an earlier iteration
of that patch.

IIRC the concern was that compiler/hw could re-order

smb_wmb();
table->private = newinfo;
/* wait until all cpus are done with old table */

 into:

smb_wmb();
/* wait until all cpus are done with old table */
...
table->private = newinfo;

and that (obviously) makes the wait-loop useless.

> > Only alternative I see that might work is synchronize_rcu (the
> > _do_table functions are called with rcu read lock held).
> > 
> > I guess current scheme is cheaper though.
> 
> Is performance a concern in this path? There is no comment justifying
> this 'creative' stuff.

We have to wait until all cpus are done with current iptables ruleset.

Before this 'creative' change, this relied on get_counters
synchronization.  And that caused wait times of 30 seconds or more on
busy systems.

I have no objections swapping this with a synchronize_rcu() if that
makes it easier.

(synchronize_rcu might be confusing though, as we don't use rcu
 for table->private), and one 'has to know' that all the netfilter
 hooks, including the iptables eval loop, run with rcu_read_lock
 held).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:38                   ` Dmitry Vyukov
  2019-01-10 12:46                     ` Andrea Parri
@ 2019-01-10 14:50                     ` Paul E. McKenney
  1 sibling, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2019-01-10 14:50 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrea Parri, Anatol Pomozov, Florian Westphal, LKML,
	Andrey Konovalov, Alan Stern, Luc Maranget, Will Deacon,
	Peter Zijlstra

On Thu, Jan 10, 2019 at 01:38:11PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 10, 2019 at 1:30 PM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > > For seqcounts we currently simply ignore all accesses within the read
> > > section (thus the requirement to dynamically track read sections).
> > > What does LKMM say about seqlocks?
> >
> > LKMM does not currently model seqlocks, if that's what you're asking;
> > c.f., tools/memory-model/linux-kernel.def for a list of the currently
> > supported synchronization primitives.
> >
> > LKMM has also no notion of "data race", it insists that the code must
> > contain no unmarked accesses; we have been discussing such extensions
> > since at least Dec'17 (we're not quite there!, as mentioned by Paul).
> 
> How does it call cases that do contain unmarked accesses then? :)
> 
> > My opinion is that ignoring all accesses within a given read section
> > _can_ lead to false negatives
> 
> Absolutely. But this is a deliberate decision.
> For our tools we consider priority 1: no false positives. Period.
> Priority 2: also report some true positives in best effort manner.
> 
> > (in every possible definition of "data
> > race" and "read sections" I can think of at the moment ;D):
> >
> >         P0                              P1
> >         read_seqbegin()                 x = 1;
> >         r0 = x;
> >         read_seqretry() // =0
> >
> > ought to be "racy"..., right?  (I didn't audit all the callsites for
> > read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
> > pattern ;D ... "legacy", as you recalled).

One approach would be to forgive data races in the seqlock read-side
critical section only if:

o	There was a later matching read_seqretry() that returned true, and

o	There were no dereferences of any data-racy load.  (Yeah, this
	one should be good clean fun to model!)

Do people nest read_seqbegin(), and if so, what does that mean?  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:41   ` Peter Zijlstra
  2019-01-10 12:53     ` Dmitry Vyukov
  2019-01-10 14:48     ` Florian Westphal
@ 2019-01-10 14:52     ` Paul E. McKenney
  2 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2019-01-10 14:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, Anatol Pomozov, Dmitry Vyukov, LKML

On Thu, Jan 10, 2019 at 01:41:23PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 08, 2019 at 11:37:46PM +0100, Florian Westphal wrote:
> > Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
> > > Or maybe xt_replace_table() can be enhanced? When I hear that
> > > something waits until an event happens on all CPUs I think about
> > > wait_event() function. Would it be better for xt_replace_table() to
> > > introduce an atomic counter that is decremented by CPUs, and the main
> > > CPU waits until the counter gets zero?
> > 
> > That would mean placing an additional atomic op into the
> > iptables evaluation path (ipt_do_table and friends).
> > 
> 
> For:
> 
> 	/*
> 	 * Ensure contents of newinfo are visible before assigning to
> 	 * private.
> 	 */
> 	smp_wmb();
> 	table->private = newinfo;
> 
> we have:
> 
> 	smp_store_release(&table->private, newinfo);
> 
> But what store does that second smp_wmb() order against? The comment:
> 
> 	/* make sure all cpus see new ->private value */
> 	smp_wmb();
> 
> makes no sense what so ever, no smp_*() barrier can provide such
> guarantees.

Agreed, this would require something like synchronize_rcu() or some
sort of IPI-based sys_membarrier() lookalike.

							Thanx, Paul

> > Only alternative I see that might work is synchronize_rcu (the
> > _do_table functions are called with rcu read lock held).
> > 
> > I guess current scheme is cheaper though.
> 
> Is performance a concern in this path? There is no comment justifying
> this 'creative' stuff.
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 12:53     ` Dmitry Vyukov
@ 2019-01-10 20:18       ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2019-01-10 20:18 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Florian Westphal, Anatol Pomozov, Paul E. McKenney, LKML

On Thu, Jan 10, 2019 at 01:53:28PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 10, 2019 at 1:41 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Jan 08, 2019 at 11:37:46PM +0100, Florian Westphal wrote:
> > > Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
> > > > Or maybe xt_replace_table() can be enhanced? When I hear that
> > > > something waits until an event happens on all CPUs I think about
> > > > wait_event() function. Would it be better for xt_replace_table() to
> > > > introduce an atomic counter that is decremented by CPUs, and the main
> > > > CPU waits until the counter gets zero?
> > >
> > > That would mean placing an additional atomic op into the
> > > iptables evaluation path (ipt_do_table and friends).
> > >
> >
> > For:
> >
> >         /*
> >          * Ensure contents of newinfo are visible before assigning to
> >          * private.
> >          */
> >         smp_wmb();
> >         table->private = newinfo;
> >
> > we have:
> >
> >         smp_store_release(&table->private, newinfo);
> >
> > But what store does that second smp_wmb() order against? The comment:
> >
> >         /* make sure all cpus see new ->private value */
> >         smp_wmb();
> >
> > makes no sense what so ever, no smp_*() barrier can provide such
> > guarantees.
> 
> Do we want WRITE_ONCE here then?

The smp_store_release() already implies WRITE_ONCE().

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 14:48     ` Florian Westphal
@ 2019-01-10 20:20       ` Peter Zijlstra
  2019-01-10 20:25       ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2019-01-10 20:20 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Anatol Pomozov, Dmitry Vyukov, paulmck, LKML

On Thu, Jan 10, 2019 at 03:48:12PM +0100, Florian Westphal wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> > 	/*
> > 	 * Ensure contents of newinfo are visible before assigning to
> > 	 * private.
> > 	 */
> > 	smp_wmb();
> > 	table->private = newinfo;
> > 
> > we have:
> > 
> > 	smp_store_release(&table->private, newinfo);
> > 
> > But what store does that second smp_wmb() order against? The comment:
> > 
> > 	/* make sure all cpus see new ->private value */
> > 	smp_wmb();
> > 
> > makes no sense what so ever, no smp_*() barrier can provide such
> > guarantees.
> 
> IIRC I added this at the request of a reviewer of an earlier iteration
> of that patch.
> 
> IIRC the concern was that compiler/hw could re-order
> 
> smb_wmb();
> table->private = newinfo;
> /* wait until all cpus are done with old table */
> 
>  into:
> 
> smb_wmb();
> /* wait until all cpus are done with old table */
> ...
> table->private = newinfo;
> 
> and that (obviously) makes the wait-loop useless.

The thing is, the 'wait for all cpus' thing is pure loads, not stores,
so smp_wmb() is a complete NOP there.

If you want to ensure those loads happen after that store (which does
indeed seem like a sensible thing), you're going to have to use
smp_mb().

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 14:48     ` Florian Westphal
  2019-01-10 20:20       ` Peter Zijlstra
@ 2019-01-10 20:25       ` Peter Zijlstra
  2019-01-10 22:29         ` Florian Westphal
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2019-01-10 20:25 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Anatol Pomozov, Dmitry Vyukov, paulmck, LKML

On Thu, Jan 10, 2019 at 03:48:12PM +0100, Florian Westphal wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> > Is performance a concern in this path? There is no comment justifying
> > this 'creative' stuff.
> 
> We have to wait until all cpus are done with current iptables ruleset.
> 
> Before this 'creative' change, this relied on get_counters
> synchronization.  And that caused wait times of 30 seconds or more on
> busy systems.
> 
> I have no objections swapping this with a synchronize_rcu() if that
> makes it easier.

Would using synchronize_rcu() not also mean you can get rid of that
xt_write_recseq*() stuff entirely?

Anyway, synchronize_rcu() can also take a little while, but I don't
think anywere near 30 seconds.

> (synchronize_rcu might be confusing though, as we don't use rcu
>  for table->private), and one 'has to know' that all the netfilter
>  hooks, including the iptables eval loop, run with rcu_read_lock
>  held).

A comment can help there, right?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 20:25       ` Peter Zijlstra
@ 2019-01-10 22:29         ` Florian Westphal
  2019-01-11  8:34           ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Westphal @ 2019-01-10 22:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Florian Westphal, Anatol Pomozov, Dmitry Vyukov, paulmck, LKML

Peter Zijlstra <peterz@infradead.org> wrote:
> Would using synchronize_rcu() not also mean you can get rid of that
> xt_write_recseq*() stuff entirely?

No, because those are used to synchronize with cpus that read
the ruleset counters, see

net/ipv4/netfilter/ip_tables.c:get_counters().

> Anyway, synchronize_rcu() can also take a little while, but I don't
> think anywere near 30 seconds.

Ok, I think in that case it would be best to just replace the
recseq value sampling with smp_mb + synchronize_rcu plus a comment
that explains why its done.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-10 22:29         ` Florian Westphal
@ 2019-01-11  8:34           ` Peter Zijlstra
  2019-01-11 14:08             ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2019-01-11  8:34 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Anatol Pomozov, Dmitry Vyukov, paulmck, LKML

On Thu, Jan 10, 2019 at 11:29:20PM +0100, Florian Westphal wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> > Would using synchronize_rcu() not also mean you can get rid of that
> > xt_write_recseq*() stuff entirely?
> 
> No, because those are used to synchronize with cpus that read
> the ruleset counters, see
> 
> net/ipv4/netfilter/ip_tables.c:get_counters().

Ah, bummer :/

> > Anyway, synchronize_rcu() can also take a little while, but I don't
> > think anywere near 30 seconds.
> 
> Ok, I think in that case it would be best to just replace the
> recseq value sampling with smp_mb + synchronize_rcu plus a comment
> that explains why its done.

synchronize_rcu() implies smp_mb() on all CPUs.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: seqcount usage in xt_replace_table()
  2019-01-11  8:34           ` Peter Zijlstra
@ 2019-01-11 14:08             ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2019-01-11 14:08 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, Anatol Pomozov, Dmitry Vyukov, LKML

On Fri, Jan 11, 2019 at 09:34:11AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 10, 2019 at 11:29:20PM +0100, Florian Westphal wrote:
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > > Would using synchronize_rcu() not also mean you can get rid of that
> > > xt_write_recseq*() stuff entirely?
> > 
> > No, because those are used to synchronize with cpus that read
> > the ruleset counters, see
> > 
> > net/ipv4/netfilter/ip_tables.c:get_counters().
> 
> Ah, bummer :/
> 
> > > Anyway, synchronize_rcu() can also take a little while, but I don't
> > > think anywere near 30 seconds.
> > 
> > Ok, I think in that case it would be best to just replace the
> > recseq value sampling with smp_mb + synchronize_rcu plus a comment
> > that explains why its done.
> 
> synchronize_rcu() implies smp_mb() on all CPUs.

Yes, it does, but in the case of idle CPUs, the smp_mb() calls are only
required to follow any pre-existing RCU read-side critical section on
the one hand an precede any RCU read-side critical section completing
after the synchronize_rcu() on the other.

To do more would mean waking up idle CPUs, which does not make the
battery-powered guys happy.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2019-01-11 14:08 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-08 19:33 seqcount usage in xt_replace_table() Anatol Pomozov
2019-01-08 22:37 ` Florian Westphal
2019-01-10 12:41   ` Peter Zijlstra
2019-01-10 12:53     ` Dmitry Vyukov
2019-01-10 20:18       ` Peter Zijlstra
2019-01-10 14:48     ` Florian Westphal
2019-01-10 20:20       ` Peter Zijlstra
2019-01-10 20:25       ` Peter Zijlstra
2019-01-10 22:29         ` Florian Westphal
2019-01-11  8:34           ` Peter Zijlstra
2019-01-11 14:08             ` Paul E. McKenney
2019-01-10 14:52     ` Paul E. McKenney
2019-01-09  0:02 ` Andrea Parri
2019-01-09  0:36   ` Anatol Pomozov
2019-01-09  5:35     ` Dmitry Vyukov
2019-01-09 11:24     ` Andrea Parri
2019-01-09 11:55       ` Dmitry Vyukov
2019-01-09 12:11         ` Andrea Parri
2019-01-09 12:29           ` Dmitry Vyukov
2019-01-09 17:10             ` Paul E. McKenney
2019-01-10  8:49               ` Dmitry Vyukov
2019-01-10 12:30                 ` Andrea Parri
2019-01-10 12:38                   ` Dmitry Vyukov
2019-01-10 12:46                     ` Andrea Parri
2019-01-10 13:25                       ` Dmitry Vyukov
2019-01-10 14:50                     ` Paul E. McKenney
2019-01-10 12:44 ` Peter Zijlstra
2019-01-10 12:54   ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).