* Re: Deadlock in call_rcu_thread when destroy rculfhash node with nested rculfhash
[not found] <CAO6Ho0ea+Mhdoea4b3EL-J4z2wifHPWohq1ps74LQBU+b0-OOQ@mail.gmail.com>
@ 2016-10-19 10:03 ` Evgeniy Ivanov
[not found] ` <CAO6Ho0e27-JTmUPrTRZfr95Lm=ri+mOSRNr7Av5Ma0t1ucuH6g@mail.gmail.com>
1 sibling, 0 replies; 6+ messages in thread
From: Evgeniy Ivanov @ 2016-10-19 10:03 UTC (permalink / raw)
To: lttng-dev
Sorry, I found a partial answer in the docs, which state that cds_lfht_destroy
must not be called from a call_rcu thread context. Why does this
limitation exist?
On Wed, Oct 19, 2016 at 12:56 PM, Evgeniy Ivanov <i@eivanov.com> wrote:
> Hi,
>
> Each node of the top-level rculfhash has a nested rculfhash. Some thread clears
> the top-level map and then uses rcu_barrier() to wait until everything is
> destroyed (this is done to check for leaks). Recently it started to deadlock
> sometimes with the following stacks:
>
> Thread1:
>
> __poll
> cds_lfht_destroy <---- nested map
> ...
> free_Node(rcu_head*) <----- node of top level map
> call_rcu_thread
>
> Thread2:
>
> syscall
> rcu_barrier_qsbr
> destroy_all
> main
>
>
> Did call_rcu_thread deadlock with the barrier thread? Or is it some kind of
> internal deadlock because of the nested maps?
>
>
> --
> Cheers,
> Evgeniy
>
--
Cheers,
Evgeniy
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
* Re: Deadlock in call_rcu_thread when destroy rculfhash node with nested rculfhash
[not found] ` <CAO6Ho0e27-JTmUPrTRZfr95Lm=ri+mOSRNr7Av5Ma0t1ucuH6g@mail.gmail.com>
@ 2016-10-19 15:03 ` Mathieu Desnoyers
[not found] ` <85898265.59079.1476889412542.JavaMail.zimbra@efficios.com>
1 sibling, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2016-10-19 15:03 UTC (permalink / raw)
To: Evgeniy Ivanov; +Cc: lttng-dev
This is because we use call_rcu internally to trigger hash table
resizes.
In cds_lfht_destroy, we start by waiting for "in-flight" resizes to complete.
Unfortunately, this requires the call_rcu worker thread to make progress. If
cds_lfht_destroy is called from the call_rcu worker thread, it will wait
forever.
One alternative would be to implement our own worker thread scheme
for the RCU HT resize rather than use the call_rcu worker thread. This
would simplify the cds_lfht_destroy requirements a lot.
Ideally I'd like to re-use all of the call_rcu work dispatch/worker handling
scheme, just as a separate work queue.
Thoughts?
Thanks,
Mathieu
----- On Oct 19, 2016, at 6:03 AM, Evgeniy Ivanov <i@eivanov.com> wrote:
> Sorry, I found a partial answer in the docs, which state that cds_lfht_destroy must
> not be called from a call_rcu thread context. Why does this limitation exist?
> On Wed, Oct 19, 2016 at 12:56 PM, Evgeniy Ivanov <i@eivanov.com> wrote:
>> Hi,
>> Each node of the top-level rculfhash has a nested rculfhash. Some thread clears the
>> top-level map and then uses rcu_barrier() to wait until everything is destroyed
>> (this is done to check for leaks). Recently it started to deadlock sometimes with
>> the following stacks:
>> Thread1:
>> __poll
>> cds_lfht_destroy <---- nested map
>> ...
>> free_Node(rcu_head*) <----- node of top level map
>> call_rcu_thread
>> Thread2:
>> syscall
>> rcu_barrier_qsbr
>> destroy_all
>> main
>> Did call_rcu_thread deadlock with the barrier thread? Or is it some kind of
>> internal deadlock because of the nested maps?
>> --
>> Cheers,
>> Evgeniy
> --
> Cheers,
> Evgeniy
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: Deadlock in call_rcu_thread when destroy rculfhash node with nested rculfhash
[not found] ` <85898265.59079.1476889412542.JavaMail.zimbra@efficios.com>
@ 2016-10-21 8:19 ` Evgeniy Ivanov
[not found] ` <CAO6Ho0d+LkJi_2ebomx13D42CApEG-bakyTbtzvz+Jvxc4wy9A@mail.gmail.com>
1 sibling, 0 replies; 6+ messages in thread
From: Evgeniy Ivanov @ 2016-10-21 8:19 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: lttng-dev
On Wed, Oct 19, 2016 at 6:03 PM, Mathieu Desnoyers <
mathieu.desnoyers@efficios.com> wrote:
> This is because we use call_rcu internally to trigger hash table
> resizes.
>
> In cds_lfht_destroy, we start by waiting for "in-flight" resizes to
> complete.
> Unfortunately, this requires the call_rcu worker thread to make progress. If
> cds_lfht_destroy is called from the call_rcu worker thread, it will wait
> forever.
>
> One alternative would be to implement our own worker thread scheme
> for the rcu HT resize rather than use the call_rcu worker thread. This
> would simplify cds_lfht_destroy requirements a lot.
>
> Ideally I'd like to re-use all the call_rcu work dispatch/worker handling
> scheme, just as a separate work queue.
>
> Thoughts?
>
Thank you for explaining. Sounds like a plan: in our prod there is no issue
with having an extra thread for table resizes. And nested tables are an
important feature.
>
> Thanks,
>
> Mathieu
>
--
Cheers,
Evgeniy
* Re: Deadlock in call_rcu_thread when destroy rculfhash node with nested rculfhash
[not found] ` <CAO6Ho0d+LkJi_2ebomx13D42CApEG-bakyTbtzvz+Jvxc4wy9A@mail.gmail.com>
@ 2017-06-01 13:01 ` Mathieu Desnoyers
[not found] ` <5908376.2670.1496322081960.JavaMail.zimbra@efficios.com>
1 sibling, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2017-06-01 13:01 UTC (permalink / raw)
To: Evgeniy Ivanov; +Cc: lttng-dev
----- On Oct 21, 2016, at 4:19 AM, Evgeniy Ivanov <i@eivanov.com> wrote:
> On Wed, Oct 19, 2016 at 6:03 PM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>> This is because we use call_rcu internally to trigger hash table
>> resizes.
>> In cds_lfht_destroy, we start by waiting for "in-flight" resizes to complete.
>> Unfortunately, this requires the call_rcu worker thread to make progress. If
>> cds_lfht_destroy is called from the call_rcu worker thread, it will wait
>> forever.
>> One alternative would be to implement our own worker thread scheme
>> for the rcu HT resize rather than use the call_rcu worker thread. This
>> would simplify cds_lfht_destroy requirements a lot.
>> Ideally I'd like to re-use all the call_rcu work dispatch/worker handling
>> scheme, just as a separate work queue.
>> Thoughts?
> Thank you for explaining. Sounds like a plan: in our prod there is no issue with
> having an extra thread for table resizes. And nested tables are an important feature.
I finally managed to find some time to implement a solution; feedback
would be welcome!
Here are the RFC patches:
https://lists.lttng.org/pipermail/lttng-dev/2017-May/027183.html
https://lists.lttng.org/pipermail/lttng-dev/2017-May/027184.html
Thanks,
Mathieu
>> Thanks,
>> Mathieu
> --
> Cheers,
> Evgeniy
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: Deadlock in call_rcu_thread when destroy rculfhash node with nested rculfhash
[not found] ` <5908376.2670.1496322081960.JavaMail.zimbra@efficios.com>
@ 2017-06-07 22:05 ` Mathieu Desnoyers
0 siblings, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2017-06-07 22:05 UTC (permalink / raw)
To: Evgeniy Ivanov; +Cc: lttng-dev
----- On Jun 1, 2017, at 9:01 AM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> I finally managed to find some time to implement a solution; feedback
> would be welcome!
> Here are the RFC patches:
> https://lists.lttng.org/pipermail/lttng-dev/2017-May/027183.html
> https://lists.lttng.org/pipermail/lttng-dev/2017-May/027184.html
Just merged commits derived from those patches into the liburcu master branch.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Deadlock in call_rcu_thread when destroy rculfhash node with nested rculfhash
@ 2016-10-19 9:56 Evgeniy Ivanov
0 siblings, 0 replies; 6+ messages in thread
From: Evgeniy Ivanov @ 2016-10-19 9:56 UTC (permalink / raw)
To: lttng-dev
Hi,
Each node of the top-level rculfhash has a nested rculfhash. Some thread clears
the top-level map and then uses rcu_barrier() to wait until everything is
destroyed (this is done to check for leaks). Recently it started to deadlock
sometimes with the following stacks:
Thread1:
__poll
cds_lfht_destroy <---- nested map
...
free_Node(rcu_head*) <----- node of top level map
call_rcu_thread
Thread2:
syscall
rcu_barrier_qsbr
destroy_all
main
Did call_rcu_thread deadlock with the barrier thread? Or is it some kind of
internal deadlock because of the nested maps?
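For reference, the shape of code that would produce these stacks might look like the following. This is a non-runnable sketch: the liburcu calls (cds_lfht_destroy, call_rcu, rcu_barrier, caa_container_of) are real API, but struct node, free_node and destroy_all are hypothetical names taken from the traces above:

```c
struct node {
	struct cds_lfht *nested;	/* per-node nested hash table */
	struct rcu_head rcu;
};

static void free_node(struct rcu_head *head)
{
	struct node *n = caa_container_of(head, struct node, rcu);

	/* Runs on the call_rcu worker thread. cds_lfht_destroy() waits
	 * for in-flight resizes, which are queued on this same worker,
	 * so it can never make progress: Thread1's stack above. */
	cds_lfht_destroy(n->nested, NULL);
	free(n);
}

static void destroy_all(struct cds_lfht *top)
{
	/* ... cds_lfht_del() each node, then call_rcu(&n->rcu, free_node) ... */
	rcu_barrier();	/* Thread2 blocks here waiting for free_node() */
}
```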
--
Cheers,
Evgeniy