* [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() @ 2018-08-13 3:04 Jia-Ju Bai 2018-08-13 4:18 ` Paul E. McKenney 0 siblings, 1 reply; 5+ messages in thread From: Jia-Ju Bai @ 2018-08-13 3:04 UTC (permalink / raw) To: dave, paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai Cc: Linux Kernel Mailing List The kernel may sleep with holding a spinlock. The function call paths (from bottom to top) in Linux-4.16 are: [FUNC] schedule_timeout_interruptible kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in srcu_read_delay kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in rcu_torture_timer kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer Note that [FUNC_PTR] means a function pointer call is used. I do not find a good way to fix, so I only report. This is found by my static analysis tool (DSAC). Thanks, Jia-Ju Bai ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() 2018-08-13 3:04 [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() Jia-Ju Bai @ 2018-08-13 4:18 ` Paul E. McKenney 2018-08-13 9:26 ` Jia-Ju Bai 0 siblings, 1 reply; 5+ messages in thread From: Paul E. McKenney @ 2018-08-13 4:18 UTC (permalink / raw) To: Jia-Ju Bai Cc: dave, josh, rostedt, mathieu.desnoyers, jiangshanlai, Linux Kernel Mailing List On Mon, Aug 13, 2018 at 11:04:10AM +0800, Jia-Ju Bai wrote: > The kernel may sleep with holding a spinlock. > > The function call paths (from bottom to top) in Linux-4.16 are: > > [FUNC] schedule_timeout_interruptible > kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in > srcu_read_delay > kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in > rcu_torture_timer > kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer > > Note that [FUNC_PTR] means a function pointer call is used. > > I do not find a good way to fix, so I only report. > This is found by my static analysis tool (DSAC). Interesting. I would have expected to have gotten a "scheduling while atomic" error message, which I do not recall seeing. And I ran a great deal of rcutorture on v4.16. So let's see... As you say, the rcu_torture_timer() function does in fact acquire rand_lock in 4.16 and 4.17, in which case sleeping would indeed be illegal. But let's take a look at srcu_read_delay(): static void srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp) { long delay; const long uspertick = 1000000 / HZ; const long longdelay = 10; /* We want there to be long-running readers, but not all the time. */ delay = torture_random(rrsp) % (nrealreaders * 2 * longdelay * uspertick); if (!delay && in_task()) { schedule_timeout_interruptible(longdelay); rtrsp->rt_delay_jiffies = longdelay; } else { rcu_read_delay(rrsp, rtrsp); } } The call to schedule_timeout_interruptible() cannot happen unless the in_task() macro returns true, which it won't if the SOFTIRQ_OFFSET bit is set: #define in_task() (!(preempt_count() & \ (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) And the SOFTIRQ_OFFSET bit will be set if srcu_read_delay() is invoked from a timer handler, which is the case for the call from rcu_torture_timer(). So if that lock is held, schedule_timeout_interruptible() won't ever be invoked. So what am I missing here? Thanx, Paul ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() 2018-08-13 4:18 ` Paul E. McKenney @ 2018-08-13 9:26 ` Jia-Ju Bai 2018-08-13 12:42 ` Paul E. McKenney 0 siblings, 1 reply; 5+ messages in thread From: Jia-Ju Bai @ 2018-08-13 9:26 UTC (permalink / raw) To: paulmck Cc: dave, josh, rostedt, mathieu.desnoyers, jiangshanlai, Linux Kernel Mailing List On 2018/8/13 12:18, Paul E. McKenney wrote: > On Mon, Aug 13, 2018 at 11:04:10AM +0800, Jia-Ju Bai wrote: >> The kernel may sleep with holding a spinlock. >> >> The function call paths (from bottom to top) in Linux-4.16 are: >> >> [FUNC] schedule_timeout_interruptible >> kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in >> srcu_read_delay >> kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in >> rcu_torture_timer >> kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer >> >> Note that [FUNC_PTR] means a function pointer call is used. >> >> I do not find a good way to fix, so I only report. >> This is found by my static analysis tool (DSAC). > Interesting. I would have expected to have gotten a "scheduling while > atomic" error message, which I do not recall seeing. And I ran a great > deal of rcutorture on v4.16. > > So let's see... As you say, the rcu_torture_timer() function does in > fact acquire rand_lock in 4.16 and 4.17, in which case sleeping would > indeed be illegal. But let's take a look at srcu_read_delay(): > > static void > srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp) > { > long delay; > const long uspertick = 1000000 / HZ; > const long longdelay = 10; > > /* We want there to be long-running readers, but not all the time. */ > > delay = torture_random(rrsp) % > (nrealreaders * 2 * longdelay * uspertick); > if (!delay && in_task()) { > schedule_timeout_interruptible(longdelay); > rtrsp->rt_delay_jiffies = longdelay; > } else { > rcu_read_delay(rrsp, rtrsp); > } > } > > The call to schedule_timeout_interruptible() cannot happen unless the > in_task() macro returns true, which it won't if the SOFTIRQ_OFFSET bit > is set: > > #define in_task() (!(preempt_count() & \ > (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) > > And the SOFTIRQ_OFFSET bit will be set if srcu_read_delay() > is invoked from a timer handler, which is the case for the > call from rcu_torture_timer(). So if that lock is held, > schedule_timeout_interruptible() won't ever be invoked. Thanks for your reply :) My tool does not track this bit... Sorry for this false report. Best wishes, Jia-Ju Bai ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() 2018-08-13 9:26 ` Jia-Ju Bai @ 2018-08-13 12:42 ` Paul E. McKenney 2018-08-15 1:05 ` Jia-Ju Bai 0 siblings, 1 reply; 5+ messages in thread From: Paul E. McKenney @ 2018-08-13 12:42 UTC (permalink / raw) To: Jia-Ju Bai Cc: dave, josh, rostedt, mathieu.desnoyers, jiangshanlai, Linux Kernel Mailing List On Mon, Aug 13, 2018 at 05:26:49PM +0800, Jia-Ju Bai wrote: > > > On 2018/8/13 12:18, Paul E. McKenney wrote: > >On Mon, Aug 13, 2018 at 11:04:10AM +0800, Jia-Ju Bai wrote: > >>The kernel may sleep with holding a spinlock. > >> > >>The function call paths (from bottom to top) in Linux-4.16 are: > >> > >>[FUNC] schedule_timeout_interruptible > >>kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in > >>srcu_read_delay > >>kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in > >>rcu_torture_timer > >>kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer > >> > >>Note that [FUNC_PTR] means a function pointer call is used. > >> > >>I do not find a good way to fix, so I only report. > >>This is found by my static analysis tool (DSAC). > >Interesting. I would have expected to have gotten a "scheduling while > >atomic" error message, which I do not recall seeing. And I ran a great > >deal of rcutorture on v4.16. > > > >So let's see... As you say, the rcu_torture_timer() function does in > >fact acquire rand_lock in 4.16 and 4.17, in which case sleeping would > >indeed be illegal. But let's take a look at srcu_read_delay(): > > > >static void > >srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp) > >{ > > long delay; > > const long uspertick = 1000000 / HZ; > > const long longdelay = 10; > > > > /* We want there to be long-running readers, but not all the time. */ > > > > delay = torture_random(rrsp) % > > (nrealreaders * 2 * longdelay * uspertick); > > if (!delay && in_task()) { > > schedule_timeout_interruptible(longdelay); > > rtrsp->rt_delay_jiffies = longdelay; > > } else { > > rcu_read_delay(rrsp, rtrsp); > > } > >} > > > >The call to schedule_timeout_interruptible() cannot happen unless the > >in_task() macro returns true, which it won't if the SOFTIRQ_OFFSET bit > >is set: > > > >#define in_task() (!(preempt_count() & \ > > (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) > > > >And the SOFTIRQ_OFFSET bit will be set if srcu_read_delay() > >is invoked from a timer handler, which is the case for the > >call from rcu_torture_timer(). So if that lock is held, > >schedule_timeout_interruptible() won't ever be invoked. > > Thanks for your reply :) > My tool does not track this bit... > Sorry for this false report. Not a problem, a few false positives are to be expected. And it looks like you have some work to do on your tool -- which is good, because I would not want you to be bored. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() 2018-08-13 12:42 ` Paul E. McKenney @ 2018-08-15 1:05 ` Jia-Ju Bai 0 siblings, 0 replies; 5+ messages in thread From: Jia-Ju Bai @ 2018-08-15 1:05 UTC (permalink / raw) To: paulmck Cc: dave, josh, rostedt, mathieu.desnoyers, jiangshanlai, Linux Kernel Mailing List On 2018/8/13 20:42, Paul E. McKenney wrote: > On Mon, Aug 13, 2018 at 05:26:49PM +0800, Jia-Ju Bai wrote: >> >> On 2018/8/13 12:18, Paul E. McKenney wrote: >>> On Mon, Aug 13, 2018 at 11:04:10AM +0800, Jia-Ju Bai wrote: >>>> The kernel may sleep with holding a spinlock. >>>> >>>> The function call paths (from bottom to top) in Linux-4.16 are: >>>> >>>> [FUNC] schedule_timeout_interruptible >>>> kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in >>>> srcu_read_delay >>>> kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in >>>> rcu_torture_timer >>>> kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer >>>> >>>> Note that [FUNC_PTR] means a function pointer call is used. >>>> >>>> I do not find a good way to fix, so I only report. >>>> This is found by my static analysis tool (DSAC). >>> Interesting. I would have expected to have gotten a "scheduling while >>> atomic" error message, which I do not recall seeing. And I ran a great >>> deal of rcutorture on v4.16. >>> >>> So let's see... As you say, the rcu_torture_timer() function does in >>> fact acquire rand_lock in 4.16 and 4.17, in which case sleeping would >>> indeed be illegal. But let's take a look at srcu_read_delay(): >>> >>> static void >>> srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp) >>> { >>> long delay; >>> const long uspertick = 1000000 / HZ; >>> const long longdelay = 10; >>> >>> /* We want there to be long-running readers, but not all the time. */ >>> >>> delay = torture_random(rrsp) % >>> (nrealreaders * 2 * longdelay * uspertick); >>> if (!delay && in_task()) { >>> schedule_timeout_interruptible(longdelay); >>> rtrsp->rt_delay_jiffies = longdelay; >>> } else { >>> rcu_read_delay(rrsp, rtrsp); >>> } >>> } >>> >>> The call to schedule_timeout_interruptible() cannot happen unless the >>> in_task() macro returns true, which it won't if the SOFTIRQ_OFFSET bit >>> is set: >>> >>> #define in_task() (!(preempt_count() & \ >>> (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) >>> >>> And the SOFTIRQ_OFFSET bit will be set if srcu_read_delay() >>> is invoked from a timer handler, which is the case for the >>> call from rcu_torture_timer(). So if that lock is held, >>> schedule_timeout_interruptible() won't ever be invoked. >> Thanks for your reply :) >> My tool does not track this bit... >> Sorry for this false report. > Not a problem, a few false positives are to be expected. And it looks > like you have some work to do on your tool -- which is good, because I > would not want you to be bored. ;-) > > Thanx, Paul > Thanks for your advice. I will improve my tool to produce less false positives :) Best wishes, Jia-Ju Bai ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-08-15 1:05 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-08-13 3:04 [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() Jia-Ju Bai 2018-08-13 4:18 ` Paul E. McKenney 2018-08-13 9:26 ` Jia-Ju Bai 2018-08-13 12:42 ` Paul E. McKenney 2018-08-15 1:05 ` Jia-Ju Bai
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).