* [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up
@ 2023-03-22  8:02 Ondřej Surý via lttng-dev
  2023-03-22 11:01 ` Ondřej Surý via lttng-dev
  2023-03-22 14:41 ` Mathieu Desnoyers via lttng-dev
  0 siblings, 2 replies; 4+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-22  8:02 UTC (permalink / raw)
  To: lttng-dev

Hi,

this happens with all the patches fully applied and doesn't seem to be caused by anything I am doing :)

WARNING: ThreadSanitizer: data race (pid=3995296)
  Write of size 8 at 0x7fb51e5fd048 by thread T296:
    #0 __tsan_memset <null> (badcache_test+0x49257d) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
    #1 urcu_mb_synchronize_rcu /home/ondrej/Projects/userspace-rcu/src/urcu.c:412:2 (liburcu-mb.so.8+0x35e0) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #2 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 (liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)

  Previous atomic write of size 4 at 0x7fb51e5fd048 by thread T295:
    #0 urcu_adaptative_wake_up /home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:138:2 (liburcu-mb.so.8+0x8cb9) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #1 urcu_wake_all_waiters /home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:214:3 (liburcu-mb.so.8+0x41de) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #2 urcu_mb_synchronize_rcu /home/ondrej/Projects/userspace-rcu/src/urcu.c:522:2 (liburcu-mb.so.8+0x3766) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #3 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 (liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)

  Location is stack of thread T296.

  Thread T296 (tid=3995606, running) created by thread T272 at:
    #0 pthread_create <null> (badcache_test+0x44d5fb) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
    #1 call_rcu_data_init /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 (liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #2 __create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 (liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #3 urcu_mb_create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 (liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 (libisc-9.19.12-dev.so+0x7a0a0) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
    #5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 (libisc-9.19.12-dev.so+0x77890) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
    #6 isc__trampoline_run /home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 (libisc-9.19.12-dev.so+0xaa6be) (BuildId: a33cd26e483b73684928b4782627f1278c001605)

  Thread T295 (tid=3995605, running) created by thread T261 at:
    #0 pthread_create <null> (badcache_test+0x44d5fb) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
    #1 call_rcu_data_init /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 (liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #2 __create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 (liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #3 urcu_mb_create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 (liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
    #4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 (libisc-9.19.12-dev.so+0x7a0a0) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
    #5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 (libisc-9.19.12-dev.so+0x77890) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
    #6 isc__trampoline_run /home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 (libisc-9.19.12-dev.so+0xaa6be) (BuildId: a33cd26e483b73684928b4782627f1278c001605)

SUMMARY: ThreadSanitizer: data race (/home/ondrej/Projects/bind9/tests/dns/.libs/badcache_test+0x49257d) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7) in __tsan_memset

This is between:
- DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
and
- uatomic_or(&wait->state, URCU_WAIT_TEARDOWN);

That's pretty weird, because the "Write" happens on a stack-local variable,
while the "Previous write" happens after the futex, which led me to the fact that
ThreadSanitizer doesn't intercept futexes, but we can annotate the futexes:

https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4
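
For reference, here is a rough sketch of what annotating a futex wait/wake pair along those lines could look like. The __tsan_acquire()/__tsan_release() declarations and the wrapper names are illustrative assumptions, not the liburcu futex code:

#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <linux/futex.h>
#include <sys/syscall.h>

/* Assumed TSAN annotation entry points (not a public, stable API). */
void __tsan_acquire(void *addr);
void __tsan_release(void *addr);

static long annotated_futex_wait(int32_t *uaddr, int32_t expected)
{
        long ret = syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);

        /* Pair with the release the waker performs before FUTEX_WAKE. */
        __tsan_acquire(uaddr);
        return ret;
}

static long annotated_futex_wake(int32_t *uaddr, int32_t nr_wake)
{
        /* Publish everything written before waking the waiter. */
        __tsan_release(uaddr);
        return syscall(SYS_futex, uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0);
}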

Oh boy...

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

* Re: [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up
  2023-03-22  8:02 [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up Ondřej Surý via lttng-dev
@ 2023-03-22 11:01 ` Ondřej Surý via lttng-dev
  2023-03-22 14:57   ` Mathieu Desnoyers via lttng-dev
  2023-03-22 14:41 ` Mathieu Desnoyers via lttng-dev
  1 sibling, 1 reply; 4+ messages in thread
From: Ondřej Surý via lttng-dev @ 2023-03-22 11:01 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev

> On 22. 3. 2023, at 9:02, Ondřej Surý via lttng-dev <lttng-dev@lists.lttng.org> wrote:
> 
> That's pretty weird, because the "Write" happens on a stack-local variable,
> while the "Previous write" happens after the futex, which led me to the fact that
> ThreadSanitizer doesn't intercept futexes, but we can annotate the futexes:
> 
> https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4

FTR neither annotating the futex with __tsan_acquire(addr) and __tsan_release(addr)
nor falling back to compat_futex_async() for ThreadSanitizer has helped.

It seems to me that TSAN still doesn't understand the synchronization between
RCU read-side critical sections and call_rcu()/synchronize_rcu(), as I am also getting
the following reports:

  Write of size 8 at 0x7b54000009c0 by thread T102:
    #0 __tsan_memset <null> (badcache_test+0x49257d) (BuildId: a7c1595d61e3ee411276cf89a536a8daefa959a3)
    #1 mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:324:3 (libisc-9.19.12-dev.so+0x7d136) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
    #2 isc__mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:684:2 (libisc-9.19.12-dev.so+0x7e0c3) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
    #3 bcentry_destroy_rcu /home/ondrej/Projects/bind9/lib/dns/badcache.c:163:2 (libdns-9.19.12-dev.so+0x4e071) (BuildId: 8a550b795003cd1075ff29590734c806d84e76e6)
    #4 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:389:5 (liburcu-mb.so.8+0x9d6b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)

  Previous atomic write of size 8 at 0x7b54000009c0 by main thread (mutexes: write M0):
    #0 ___cds_wfcq_append /home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:202:2 (liburcu-mb.so.8+0xa8ae) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
    #1 _cds_wfcq_enqueue /home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:223:9 (liburcu-mb.so.8+0xac09) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
    #2 _call_rcu /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:719:2 (liburcu-mb.so.8+0x604f) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
    #3 urcu_mb_barrier /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:932:3 (liburcu-mb.so.8+0x4d1b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
    #4 badcache_flush /home/ondrej/Projects/bind9/lib/dns/badcache.c:329:2 (libdns-9.19.12-dev.so+0x4d8b3) (BuildId: 8a550b795003cd1075ff29590734c806d84e76e6)
    [...]

I.e. ThreadSanitizer reports a race between the place where bcentry->rcu_head is added to the call_rcu() queue
and the place where the call_rcu callbacks are invoked.  Annotating the bcentry with acquire/release here helps with this
particular data race, but it does not feel right to me to add annotations at this level.

The code is not very complicated there:

static void
bcentry_destroy_rcu(struct rcu_head *rcu_head) {
        dns_bcentry_t *bad = caa_container_of(rcu_head, dns_bcentry_t,
                                              rcu_head);
        /* __tsan_release(bad); <-- this helps */
        dns_badcache_t *bc = bad->bc;

        isc_mem_put(bc->mctx, bad, sizeof(*bad));

        dns_badcache_detach(&bc);
}

static void
bcentry_evict(struct cds_lfht *ht, dns_bcentry_t *bad) {
        /* There can be multiple deleters now */
        if (cds_lfht_del(ht, &bad->ht_node) == 0) {
                /* __tsan_acquire(bad); <- this helps */
                call_rcu(&bad->rcu_head, bcentry_destroy_rcu);
        }
}

static void
badcache_flush(dns_badcache_t *bc, struct cds_lfht *ht) {
        struct cds_lfht *oldht = rcu_xchg_pointer(&bc->ht, ht);

        synchronize_rcu();

        rcu_read_lock();
        dns_bcentry_t *bad = NULL;
        struct cds_lfht_iter iter;
        cds_lfht_for_each_entry (oldht, &iter, bad, ht_node) {
                bcentry_evict(oldht, bad);
        }
        rcu_read_unlock();
        rcu_barrier();
        RUNTIME_CHECK(cds_lfht_destroy(oldht, NULL) == 0);
}

Any ideas?

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

* Re: [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up
  2023-03-22  8:02 [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up Ondřej Surý via lttng-dev
  2023-03-22 11:01 ` Ondřej Surý via lttng-dev
@ 2023-03-22 14:41 ` Mathieu Desnoyers via lttng-dev
  1 sibling, 0 replies; 4+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-22 14:41 UTC (permalink / raw)
  To: Ondřej Surý, lttng-dev, paulmck

On 2023-03-22 04:02, Ondřej Surý via lttng-dev wrote:
> Hi,
> 
> this happens with all the patches fully applied and doesn't seem to be caused by anything I am doing :)
> 
> WARNING: ThreadSanitizer: data race (pid=3995296)
>    Write of size 8 at 0x7fb51e5fd048 by thread T296:
>      #0 __tsan_memset <null> (badcache_test+0x49257d) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
>      #1 urcu_mb_synchronize_rcu /home/ondrej/Projects/userspace-rcu/src/urcu.c:412:2 (liburcu-mb.so.8+0x35e0) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #2 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 (liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
> 
>    Previous atomic write of size 4 at 0x7fb51e5fd048 by thread T295:
>      #0 urcu_adaptative_wake_up /home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:138:2 (liburcu-mb.so.8+0x8cb9) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #1 urcu_wake_all_waiters /home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:214:3 (liburcu-mb.so.8+0x41de) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #2 urcu_mb_synchronize_rcu /home/ondrej/Projects/userspace-rcu/src/urcu.c:522:2 (liburcu-mb.so.8+0x3766) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #3 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 (liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
> 
>    Location is stack of thread T296.
> 
>    Thread T296 (tid=3995606, running) created by thread T272 at:
>      #0 pthread_create <null> (badcache_test+0x44d5fb) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
>      #1 call_rcu_data_init /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 (liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #2 __create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 (liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #3 urcu_mb_create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 (liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 (libisc-9.19.12-dev.so+0x7a0a0) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 (libisc-9.19.12-dev.so+0x77890) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #6 isc__trampoline_run /home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 (libisc-9.19.12-dev.so+0xaa6be) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
> 
>    Thread T295 (tid=3995605, running) created by thread T261 at:
>      #0 pthread_create <null> (badcache_test+0x44d5fb) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
>      #1 call_rcu_data_init /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 (liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #2 __create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 (liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #3 urcu_mb_create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 (liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 (libisc-9.19.12-dev.so+0x7a0a0) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 (libisc-9.19.12-dev.so+0x77890) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #6 isc__trampoline_run /home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 (libisc-9.19.12-dev.so+0xaa6be) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
> 
> SUMMARY: ThreadSanitizer: data race (/home/ondrej/Projects/bind9/tests/dns/.libs/badcache_test+0x49257d) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7) in __tsan_memset
> 
> This is between:
> - DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
> and
> - uatomic_or(&wait->state, URCU_WAIT_TEARDOWN);
> 
> That's pretty weird, because the "Write" happens on a stack-local variable,

Yes, it's the initialization of this "wait state" variable, which is located on the stack of the thread being blocked.

At initialization time, there are no concurrent accesses to this variable.

> while the "Previous write" happens after the futex,

That "previous" write was clearly for an unrelated prior blocking of the same thread. Basically what is missing here is information about the lifetime of this urcu_wait_node object. After urcu_adaptative_busy_wait() has observed the state (uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN), it knows that the object has no concurrent user anymore, which means the thread can move forward and reclaim this area of the stack for other uses.

So somehow we should add an annotation about the lifetime of this object, which begins with DEFINE_URCU_WAIT_NODE() and ends right after "urcu_posix_assert(uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN);".
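
As a rough sketch of what such a lifetime annotation could look like (the state values, types and helper names below are simplified stand-ins for illustration, not the urcu-wait.h implementation; only the placement of the acquire/release pair matters):

#include <stdint.h>
#include <stdatomic.h>

/* Assumed TSAN annotation entry points. */
void __tsan_acquire(void *addr);
void __tsan_release(void *addr);

enum { WAIT_WAITING = 0, WAIT_TEARDOWN = 4 };   /* simplified stand-ins */

struct wait_node {
        _Atomic int32_t state;
};

/* Waker side (cf. urcu_adaptative_wake_up()): last access by the waker. */
static void waker_teardown(struct wait_node *wait)
{
        __tsan_release(&wait->state);
        atomic_fetch_or(&wait->state, WAIT_TEARDOWN);
        /* futex wake elided */
}

/* Waiter side (cf. end of urcu_adaptative_busy_wait()): once TEARDOWN is
 * observed, the stack slot holding the node may be reused. */
static void waiter_end_of_lifetime(struct wait_node *wait)
{
        while (!(atomic_load(&wait->state) & WAIT_TEARDOWN))
                ;                       /* busy-wait / futex wait elided */
        __tsan_acquire(&wait->state);
}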

Thanks,

Mathieu


> which led me to the fact that
> ThreadSanitizer doesn't intercept futexes, but we can annotate the futexes:
> 
> https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4
> 
> Oh boy...
> 
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej@sury.org
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

* Re: [lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up
  2023-03-22 11:01 ` Ondřej Surý via lttng-dev
@ 2023-03-22 14:57   ` Mathieu Desnoyers via lttng-dev
  0 siblings, 0 replies; 4+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2023-03-22 14:57 UTC (permalink / raw)
  To: Ondřej Surý; +Cc: lttng-dev, paulmck

On 2023-03-22 07:01, Ondřej Surý via lttng-dev wrote:
>> On 22. 3. 2023, at 9:02, Ondřej Surý via lttng-dev <lttng-dev@lists.lttng.org> wrote:
>>
>> That's pretty weird, because the "Write" happens on a stack-local variable,
>> while the "Previous write" happens after the futex, which led me to the fact that
>> ThreadSanitizer doesn't intercept futexes, but we can annotate the futexes:
>>
>> https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4
> 
> FTR neither annotating the futex with __tsan_acquire(addr) and __tsan_release(addr)
> nor falling back to compat_futex_async() for ThreadSanitizer has helped.
> 
> It seems to me that TSAN still doesn't understand the synchronization between
> RCU read-side critical sections and call_rcu()/synchronize_rcu(), as I am also getting
> the following reports:
> 
>    Write of size 8 at 0x7b54000009c0 by thread T102:
>      #0 __tsan_memset <null> (badcache_test+0x49257d) (BuildId: a7c1595d61e3ee411276cf89a536a8daefa959a3)
>      #1 mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:324:3 (libisc-9.19.12-dev.so+0x7d136) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #2 isc__mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:684:2 (libisc-9.19.12-dev.so+0x7e0c3) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #3 bcentry_destroy_rcu /home/ondrej/Projects/bind9/lib/dns/badcache.c:163:2 (libdns-9.19.12-dev.so+0x4e071) (BuildId: 8a550b795003cd1075ff29590734c806d84e76e6)
>      #4 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:389:5 (liburcu-mb.so.8+0x9d6b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
> 
>    Previous atomic write of size 8 at 0x7b54000009c0 by main thread (mutexes: write M0):
>      #0 ___cds_wfcq_append /home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:202:2 (liburcu-mb.so.8+0xa8ae) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
>      #1 _cds_wfcq_enqueue /home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:223:9 (liburcu-mb.so.8+0xac09) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
>      #2 _call_rcu /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:719:2 (liburcu-mb.so.8+0x604f) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
>      #3 urcu_mb_barrier /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:932:3 (liburcu-mb.so.8+0x4d1b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
>      #4 badcache_flush /home/ondrej/Projects/bind9/lib/dns/badcache.c:329:2 (libdns-9.19.12-dev.so+0x4d8b3) (BuildId: 8a550b795003cd1075ff29590734c806d84e76e6)
>      [...]
> 
> I.e. ThreadSanitizer reports a race between the place where bcentry->rcu_head is added to the call_rcu() queue
> and the place where the call_rcu callbacks are invoked.  Annotating the bcentry with acquire/release here helps with this
> particular data race, but it does not feel right to me to add annotations at this level.
> 
> The code is not very complicated there:
> 
> static void
> bcentry_destroy_rcu(struct rcu_head *rcu_head) {
>          dns_bcentry_t *bad = caa_container_of(rcu_head, dns_bcentry_t,
>                                                rcu_head);
>          /* __tsan_release(bad); <-- this helps */
>          dns_badcache_t *bc = bad->bc;
> 
>          isc_mem_put(bc->mctx, bad, sizeof(*bad));
> 
>          dns_badcache_detach(&bc);
> }
> 
> static void
> bcentry_evict(struct cds_lfht *ht, dns_bcentry_t *bad) {
>          /* There can be multiple deleters now */
>          if (cds_lfht_del(ht, &bad->ht_node) == 0) {
>                  /* __tsan_acquire(bad); <- this helps */
>                  call_rcu(&bad->rcu_head, bcentry_destroy_rcu);
>          }
> }
> 
> static void
> badcache_flush(dns_badcache_t *bc, struct cds_lfht *ht) {
>          struct cds_lfht *oldht = rcu_xchg_pointer(&bc->ht, ht);
> 
>          synchronize_rcu();
> 
>          rcu_read_lock();
>          dns_bcentry_t *bad = NULL;
>          struct cds_lfht_iter iter;
>          cds_lfht_for_each_entry (oldht, &iter, bad, ht_node) {
>                  bcentry_evict(oldht, bad);
>          }
>          rcu_read_unlock();
>          rcu_barrier();
>          RUNTIME_CHECK(cds_lfht_destroy(oldht, NULL) == 0);
> }
> 
> Any ideas?

I suspect what happens here is that TSAN is considering the

         /*
          * Implicit memory barrier after uatomic_xchg() orders store to
          * q->tail before store to old_tail->next.
          *
          * At this point, dequeuers see a NULL tail->p->next, which
          * indicates that the queue is being appended to. The following
          * store will append "node" to the queue from a dequeuer
          * perspective.
          */
         CMM_STORE_SHARED(old_tail->next, new_head);

within ___cds_wfcq_append() as racy.

This pairs with:

/*
  * Waiting for enqueuer to complete enqueue and return the next node.
  */
static inline struct cds_wfcq_node *
___cds_wfcq_node_sync_next(struct cds_wfcq_node *node, int blocking)
{
         struct cds_wfcq_node *next;
         int attempt = 0;

         /*
          * Adaptative busy-looping waiting for enqueuer to complete enqueue.
          */
         while ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
                 if (___cds_wfcq_busy_wait(&attempt, blocking))
                         return CDS_WFCQ_WOULDBLOCK;
         }

         return next;
}

So the release semantic is provided by the implicit SEQ_CST barrier in:

___cds_wfcq_append():
   old_tail = uatomic_xchg(&tail->p, new_tail); (release)

and the acquire semantic is provided by the implicit SEQ_CST barrier in:

___cds_wfcq_splice():

         /*
          * Memory barrier implied before uatomic_xchg() orders store to
          * src_q->head before store to src_q->tail. This is required by
          * concurrent enqueue on src_q, which exchanges the tail before
          * updating the previous tail's next pointer.
          */
         tail = uatomic_xchg(&src_q_tail->p, &src_q_head->node);

Notice how the release/acquire semantic is provided by tail->p, which is atomically modified _before_ we set the node->next pointer.

With this information, is there a specific annotation that would make sense?
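
For instance — purely as a hedged sketch, with simplified stand-in types and helpers rather than the wfcqueue code itself — one candidate would be an acquire/release pair on the very location the report complains about, the previous tail's ->next pointer:

#include <stdatomic.h>
#include <stddef.h>

/* Assumed TSAN annotation entry points. */
void __tsan_acquire(void *addr);
void __tsan_release(void *addr);

struct wfcq_node {
        _Atomic(struct wfcq_node *) next;
};

struct wfcq_queue {
        /* Assumed initialized to point at a dummy head node. */
        _Atomic(struct wfcq_node *) tail;
};

/* Enqueuer, mirroring ___cds_wfcq_append(). */
static void append(struct wfcq_queue *q, struct wfcq_node *node)
{
        atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
        /* The SEQ_CST exchange provides the real ordering, as described above. */
        struct wfcq_node *old_tail = atomic_exchange(&q->tail, node);

        /* Declare that the following store publishes the node to dequeuers. */
        __tsan_release(&old_tail->next);
        atomic_store_explicit(&old_tail->next, node, memory_order_relaxed);
}

/* Dequeuer, mirroring ___cds_wfcq_node_sync_next(). */
static struct wfcq_node *sync_next(struct wfcq_node *node)
{
        struct wfcq_node *next;

        /* Adaptative busy-wait elided. */
        while ((next = atomic_load_explicit(&node->next,
                                            memory_order_relaxed)) == NULL)
                ;
        /* Pair with the enqueuer's release on the same ->next pointer. */
        __tsan_acquire(&node->next);
        return next;
}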

Thanks,

Mathieu


> 
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej@sury.org
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
