* call_rcu seems inefficient without futex
From: Alex Xu via lttng-dev @ 2020-01-24 0:19 UTC
To: lttng-dev

Hi,

I recently installed knot dns for a very small FreeBSD server. I noticed
that it uses a surprising amount of CPU, even when there is no load:
about 0.25%. That's not huge, but it seems unnecessarily high when my
QPS is less than 0.01.

After some profiling, I came to the conclusion that this is caused by
call_rcu_wait using futex_async to repeatedly wait. Since there is no
futex on FreeBSD (without the Linux compatibility layer), this
effectively turns into a permanent busy waiting loop.

I think futex_noasync can be used here instead. call_rcu_wait is only
supposed to be called from call_rcu_thread, never from a signal context.
call_rcu calls get_call_rcu_data, which may call
get_default_call_rcu_data, which calls pthread_mutex_lock through
call_rcu_lock. Therefore, call_rcu is not async-signal-safe already.

Also, I think it only makes sense to use call_rcu around a RCU write,
which contradicts the README saying that only RCU reads are allowed in
signal handlers.

I applied "sed -i -e 's/futex_async/futex_noasync/'
src/urcu-call-rcu-impl.h" and knot seems to work correctly with only
0.01% CPU now. I also ran tests/unit and tests/regression with default
and signal backends and all completed successfully.

I think that the other two usages of futex_async are also a little
suspicious, but I didn't look too closely.

Thanks,
Alex.
* Re: call_rcu seems inefficient without futex
From: Mathieu Desnoyers @ 2020-01-27 15:38 UTC
To: Alex Xu, paulmck; +Cc: lttng-dev

----- On Jan 23, 2020, at 7:19 PM, lttng-dev lttng-dev@lists.lttng.org wrote:

> Hi,
>
> I recently installed knot dns for a very small FreeBSD server. I noticed
> that it uses a surprising amount of CPU, even when there is no load:
> about 0.25%. That's not huge, but it seems unnecessarily high when my
> QPS is less than 0.01.
>
> After some profiling, I came to the conclusion that this is caused by
> call_rcu_wait using futex_async to repeatedly wait. Since there is no
> futex on FreeBSD (without the Linux compatibility layer), this
> effectively turns into a permanent busy waiting loop.
>
> I think futex_noasync can be used here instead. call_rcu_wait is only
> supposed to be called from call_rcu_thread, never from a signal context.
> call_rcu calls get_call_rcu_data, which may call
> get_default_call_rcu_data, which calls pthread_mutex_lock through
> call_rcu_lock. Therefore, call_rcu is not async-signal-safe already.

call_rcu() is meant to be async-signal-safe and lock-free after that
initialization has been performed on first use. Paul, do you know where
we have documented this in liburcu ?

> Also, I think it only makes sense to use call_rcu around a RCU write,
> which contradicts the README saying that only RCU reads are allowed in
> signal handlers.

Not sure what you mean by "use call_rcu around a RCU write" ?

Is there anything similar to sys_futex on FreeBSD ?

It would be good to look into alternative ways to fix this that do not
involve changing the guarantees provided by call_rcu() for that fallback
scenario (no futex available). Perhaps in your use-case you may want to
tweak the retry delay for compat_futex_async(). Currently
src/compat_futex.c:compat_futex_async() has a 10ms delay. Would 100ms
be more acceptable ?

Thanks,

Mathieu

>
> I applied "sed -i -e 's/futex_async/futex_noasync/'
> src/urcu-call-rcu-impl.h" and knot seems to work correctly with only
> 0.01% CPU now. I also ran tests/unit and tests/regression with default
> and signal backends and all completed successfully.
>
> I think that the other two usages of futex_async are also a little
> suspicious, but I didn't look too closely.
>
> Thanks,
> Alex.
> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: call_rcu seems inefficient without futex
From: Alex Xu via lttng-dev @ 2020-01-27 18:25 UTC
To: lttng-dev

Quoting Mathieu Desnoyers (2020-01-27 15:38:05)
> ----- On Jan 23, 2020, at 7:19 PM, lttng-dev lttng-dev@lists.lttng.org wrote:
> call_rcu() is meant to be async-signal-safe and lock-free after that
> initialization has been performed on first use. Paul, do you know where
> we have documented this in liburcu ?

Hm... reading it a little more carefully, it does seem that as long as
you manually initialize it, then it is async-signal-safe afterwards.

> > Also, I think it only makes sense to use call_rcu around a RCU write,
> > which contradicts the README saying that only RCU reads are allowed in
> > signal handlers.
>
> Not sure what you mean by "use call_rcu around a RCU write" ?

I mean that in general, the pattern is usually to do an RCU write (to
remove an item from a list, for example), then do call_rcu to
asynchronously clean up the item.

> Is there anything similar to sys_futex on FreeBSD ?

Doing some more research, it seems that _umtx_op is allowed to be used
by userspace applications, see
https://patchwork.freedesktop.org/patch/200456/. So we can probably just
use that. I'll send a patch shortly.

> It would be good to look into alternative ways to fix this that do not
> involve changing the guarantees provided by call_rcu() for that fallback
> scenario (no futex available). Perhaps in your use-case you may want to
> tweak the retry delay for compat_futex_async(). Currently
> src/compat_futex.c:compat_futex_async() has a 10ms delay. Would 100ms
> be more acceptable ?

I don't completely understand what poll does here. Does it just mean
that call_rcu callbacks would be delayed up to 100 ms? That's probably
OK for cleanup uses, but I guess other uses may need faster reactions.
* Re: call_rcu seems inefficient without futex
From: Paul E. McKenney @ 2020-01-28 3:45 UTC
To: Mathieu Desnoyers; +Cc: lttng-dev

On Mon, Jan 27, 2020 at 10:38:05AM -0500, Mathieu Desnoyers wrote:
> ----- On Jan 23, 2020, at 7:19 PM, lttng-dev lttng-dev@lists.lttng.org wrote:
>
> > Hi,
> >
> > I recently installed knot dns for a very small FreeBSD server. I noticed
> > that it uses a surprising amount of CPU, even when there is no load:
> > about 0.25%. That's not huge, but it seems unnecessarily high when my
> > QPS is less than 0.01.
> >
> > After some profiling, I came to the conclusion that this is caused by
> > call_rcu_wait using futex_async to repeatedly wait. Since there is no
> > futex on FreeBSD (without the Linux compatibility layer), this
> > effectively turns into a permanent busy waiting loop.
> >
> > I think futex_noasync can be used here instead. call_rcu_wait is only
> > supposed to be called from call_rcu_thread, never from a signal context.
> > call_rcu calls get_call_rcu_data, which may call
> > get_default_call_rcu_data, which calls pthread_mutex_lock through
> > call_rcu_lock. Therefore, call_rcu is not async-signal-safe already.
>
> call_rcu() is meant to be async-signal-safe and lock-free after that
> initialization has been performed on first use. Paul, do you know where
> we have documented this in liburcu ?

Lock freedom is the goal, but when not in real-time mode, call_rcu()
does invoke futex_async(), which can acquire locks within the Linux
kernel.

Should BSD instead use POSIX condvars for the call_rcu() waits and
wakeups?

> > Also, I think it only makes sense to use call_rcu around a RCU write,
> > which contradicts the README saying that only RCU reads are allowed in
> > signal handlers.

I do not believe that it is always safe to invoke call_rcu() from within
a signal handler. If you made sure to invoke it outside a signal handler
the first time, and then used real-time mode, that should work. But in
that case, you aren't invoking the futex code.

> Not sure what you mean by "use call_rcu around a RCU write" ?

I confess to some curiosity on this point as well. Maybe what is meant
is "around a RCU write" as in "near to an RCU write" as in "in place of
using synchronize_rcu()"?

> Is there anything similar to sys_futex on FreeBSD ?
>
> It would be good to look into alternative ways to fix this that do not
> involve changing the guarantees provided by call_rcu() for that fallback
> scenario (no futex available). Perhaps in your use-case you may want to
> tweak the retry delay for compat_futex_async(). Currently
> src/compat_futex.c:compat_futex_async() has a 10ms delay. Would 100ms
> be more acceptable ?

If this works for knot dns, it would of course be simpler.

							Thanx, Paul

> Thanks,
>
> Mathieu
>
> >
> > I applied "sed -i -e 's/futex_async/futex_noasync/'
> > src/urcu-call-rcu-impl.h" and knot seems to work correctly with only
> > 0.01% CPU now. I also ran tests/unit and tests/regression with default
> > and signal backends and all completed successfully.
> >
> > I think that the other two usages of futex_async are also a little
> > suspicious, but I didn't look too closely.
> >
> > Thanks,
> > Alex.
* Re: call_rcu seems inefficient without futex
From: Mathieu Desnoyers @ 2020-01-28 14:59 UTC
To: paulmck; +Cc: lttng-dev

----- On Jan 27, 2020, at 10:45 PM, paulmck paulmck@kernel.org wrote:

> On Mon, Jan 27, 2020 at 10:38:05AM -0500, Mathieu Desnoyers wrote:
>> ----- On Jan 23, 2020, at 7:19 PM, lttng-dev lttng-dev@lists.lttng.org wrote:
>>
>> > Hi,
>> >
>> > I recently installed knot dns for a very small FreeBSD server. I noticed
>> > that it uses a surprising amount of CPU, even when there is no load:
>> > about 0.25%. That's not huge, but it seems unnecessarily high when my
>> > QPS is less than 0.01.
>> >
>> > After some profiling, I came to the conclusion that this is caused by
>> > call_rcu_wait using futex_async to repeatedly wait. Since there is no
>> > futex on FreeBSD (without the Linux compatibility layer), this
>> > effectively turns into a permanent busy waiting loop.
>> >
>> > I think futex_noasync can be used here instead. call_rcu_wait is only
>> > supposed to be called from call_rcu_thread, never from a signal context.
>> > call_rcu calls get_call_rcu_data, which may call
>> > get_default_call_rcu_data, which calls pthread_mutex_lock through
>> > call_rcu_lock. Therefore, call_rcu is not async-signal-safe already.
>>
>> call_rcu() is meant to be async-signal-safe and lock-free after that
>> initialization has been performed on first use. Paul, do you know where
>> we have documented this in liburcu ?
>
> Lock freedom is the goal, but when not in real-time mode, call_rcu()
> does invoke futex_async(), which can acquire locks within the Linux
> kernel.
>
> Should BSD instead use POSIX condvars for the call_rcu() waits and
> wakeups?

There are two distinct benefits to lock-freedom which I think are
relevant here (at least):

- As you stated, lock-freedom is useful for real-time algorithms
  because it does not require careful handling of locks (priority
  inversion and so on),
- Moreover, another characteristic of lock-free algorithms which is
  useful beyond the scope of real-time systems is their ability to fail
  gracefully. Basically, if a lock-free algorithm crashes at any point,
  the rest of the system can still go on. This is especially useful for
  data structures over shared memory between processes.

This last point highlights why being lock-free in user-space vs being
lock-free over the entire system (including kernel system call
implementation) do not cover exactly the same requirements.

For RT, indeed, the requirement is to be lock-free on both sides of the
user/kernel boundary, because timings are what matter.

However, if lock-freedom is used as a means to recover from failure
gracefully, it can be sufficient to achieve lock-freedom in the
userspace part of the algorithm, and then rely on non-lock-free
algorithms within the kernel, because failure within the kernel is an
internal kernel failure which affects the entire system anyway.

>
>> > Also, I think it only makes sense to use call_rcu around a RCU write,
>> > which contradicts the README saying that only RCU reads are allowed in
>> > signal handlers.
>
> I do not believe that it is always safe to invoke call_rcu() from within
> a signal handler. If you made sure to invoke it outside a signal handler
> the first time, and then used real-time mode, that should work. But in
> that case, you aren't invoking the futex code.

Other than the initialization, what prevents using non-rt call_rcu()
from a signal handler context ? AFAIU it should be safe to issue futex
WAKEUP from a signal handler context.

>
>> Not sure what you mean by "use call_rcu around a RCU write" ?
>
> I confess to some curiosity on this point as well.
> Maybe what is meant is "around a RCU write" as in "near to an RCU
> write" as in "in place of using synchronize_rcu()"?

From Alex Xu's reply:

"I mean that in general, the pattern is usually to do an RCU write (to
remove an item from a list, for example), then do call_rcu to
asynchronously clean up the item."

>
>> Is there anything similar to sys_futex on FreeBSD ?

Alex Xu provided a patch set in a separate thread implementing "umtx"
support to basically provide OS support for futex on FreeBSD and
DragonflyBSD.

https://lists.lttng.org/pipermail/lttng-dev/2020-January/029507.html
https://lists.lttng.org/pipermail/lttng-dev/2020-January/029510.html

>>
>> It would be good to look into alternative ways to fix this that do not
>> involve changing the guarantees provided by call_rcu() for that fallback
>> scenario (no futex available). Perhaps in your use-case you may want to
>> tweak the retry delay for compat_futex_async(). Currently
>> src/compat_futex.c:compat_futex_async() has a 10ms delay. Would 100ms
>> be more acceptable ?
>
> If this works for knot dns, it would of course be simpler.

I think we should not put too much effort into tweaking the fallback
for scenarios where futex is missing. The proper approach seems to be
to implement proper support for the futex-like APIs provided by each OS
kernel.

Thanks,

Mathieu

>
> Thanx, Paul
>
>> Thanks,
>>
>> Mathieu
>>
>> >
>> > I applied "sed -i -e 's/futex_async/futex_noasync/'
>> > src/urcu-call-rcu-impl.h" and knot seems to work correctly with only
>> > 0.01% CPU now. I also ran tests/unit and tests/regression with default
>> > and signal backends and all completed successfully.
>> >
>> > I think that the other two usages of futex_async are also a little
>> > suspicious, but I didn't look too closely.
>> >
>> > Thanks,
>> > Alex.

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com