linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay()
@ 2018-08-13  3:04 Jia-Ju Bai
  2018-08-13  4:18 ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Jia-Ju Bai @ 2018-08-13  3:04 UTC (permalink / raw)
  To: dave, paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: Linux Kernel Mailing List

The kernel may sleep with holding a spinlock.

The function call paths (from bottom to top) in Linux-4.16 are:

[FUNC] schedule_timeout_interruptible
kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in 
srcu_read_delay
kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in 
rcu_torture_timer
kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer

Note that [FUNC_PTR] means a function pointer call is used.

I do not find a good way to fix, so I only report.
This is found by my static analysis tool (DSAC).


Thanks,
Jia-Ju Bai

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay()
  2018-08-13  3:04 [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() Jia-Ju Bai
@ 2018-08-13  4:18 ` Paul E. McKenney
  2018-08-13  9:26   ` Jia-Ju Bai
  0 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2018-08-13  4:18 UTC (permalink / raw)
  To: Jia-Ju Bai
  Cc: dave, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	Linux Kernel Mailing List

On Mon, Aug 13, 2018 at 11:04:10AM +0800, Jia-Ju Bai wrote:
> The kernel may sleep with holding a spinlock.
> 
> The function call paths (from bottom to top) in Linux-4.16 are:
> 
> [FUNC] schedule_timeout_interruptible
> kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in
> srcu_read_delay
> kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in
> rcu_torture_timer
> kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer
> 
> Note that [FUNC_PTR] means a function pointer call is used.
> 
> I do not find a good way to fix, so I only report.
> This is found by my static analysis tool (DSAC).

Interesting.  I would have expected to have gotten a "scheduling while
atomic" error message, which I do not recall seeing.  And I ran a great
deal of rcutorture on v4.16.

So let's see...  As you say, the rcu_torture_timer() function does in
fact acquire rand_lock in 4.16 and 4.17, in which case sleeping would
indeed be illegal.  But let's take a look at srcu_read_delay():

static void
srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
{
	long delay;
	const long uspertick = 1000000 / HZ;
	const long longdelay = 10;

	/* We want there to be long-running readers, but not all the time. */

	delay = torture_random(rrsp) %
		(nrealreaders * 2 * longdelay * uspertick);
	if (!delay && in_task()) {
		schedule_timeout_interruptible(longdelay);
		rtrsp->rt_delay_jiffies = longdelay;
	} else {
		rcu_read_delay(rrsp, rtrsp);
	}
}

The call to schedule_timeout_interruptible() cannot happen unless the
in_task() macro returns true, which it won't if the SOFTIRQ_OFFSET bit
is set:

#define in_task()		(!(preempt_count() & \
				   (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))

And the SOFTIRQ_OFFSET bit will be set if srcu_read_delay()
is invoked from a timer handler, which is the case for the
call from rcu_torture_timer().  So if that lock is held,
schedule_timeout_interruptible() won't ever be invoked.

So what am I missing here?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay()
  2018-08-13  4:18 ` Paul E. McKenney
@ 2018-08-13  9:26   ` Jia-Ju Bai
  2018-08-13 12:42     ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Jia-Ju Bai @ 2018-08-13  9:26 UTC (permalink / raw)
  To: paulmck
  Cc: dave, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	Linux Kernel Mailing List



On 2018/8/13 12:18, Paul E. McKenney wrote:
> On Mon, Aug 13, 2018 at 11:04:10AM +0800, Jia-Ju Bai wrote:
>> The kernel may sleep with holding a spinlock.
>>
>> The function call paths (from bottom to top) in Linux-4.16 are:
>>
>> [FUNC] schedule_timeout_interruptible
>> kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in
>> srcu_read_delay
>> kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in
>> rcu_torture_timer
>> kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer
>>
>> Note that [FUNC_PTR] means a function pointer call is used.
>>
>> I do not find a good way to fix, so I only report.
>> This is found by my static analysis tool (DSAC).
> Interesting.  I would have expected to have gotten a "scheduling while
> atomic" error message, which I do not recall seeing.  And I ran a great
> deal of rcutorture on v4.16.
>
> So let's see...  As you say, the rcu_torture_timer() function does in
> fact acquire rand_lock in 4.16 and 4.17, in which case sleeping would
> indeed be illegal.  But let's take a look at srcu_read_delay():
>
> static void
> srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
> {
> 	long delay;
> 	const long uspertick = 1000000 / HZ;
> 	const long longdelay = 10;
>
> 	/* We want there to be long-running readers, but not all the time. */
>
> 	delay = torture_random(rrsp) %
> 		(nrealreaders * 2 * longdelay * uspertick);
> 	if (!delay && in_task()) {
> 		schedule_timeout_interruptible(longdelay);
> 		rtrsp->rt_delay_jiffies = longdelay;
> 	} else {
> 		rcu_read_delay(rrsp, rtrsp);
> 	}
> }
>
> The call to schedule_timeout_interruptible() cannot happen unless the
> in_task() macro returns true, which it won't if the SOFTIRQ_OFFSET bit
> is set:
>
> #define in_task()		(!(preempt_count() & \
> 				   (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
>
> And the SOFTIRQ_OFFSET bit will be set if srcu_read_delay()
> is invoked from a timer handler, which is the case for the
> call from rcu_torture_timer().  So if that lock is held,
> schedule_timeout_interruptible() won't ever be invoked.

Thanks for your reply :)
My tool does not track this bit...
Sorry for this false report.


Best wishes,
Jia-Ju Bai

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay()
  2018-08-13  9:26   ` Jia-Ju Bai
@ 2018-08-13 12:42     ` Paul E. McKenney
  2018-08-15  1:05       ` Jia-Ju Bai
  0 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2018-08-13 12:42 UTC (permalink / raw)
  To: Jia-Ju Bai
  Cc: dave, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	Linux Kernel Mailing List

On Mon, Aug 13, 2018 at 05:26:49PM +0800, Jia-Ju Bai wrote:
> 
> 
> On 2018/8/13 12:18, Paul E. McKenney wrote:
> >On Mon, Aug 13, 2018 at 11:04:10AM +0800, Jia-Ju Bai wrote:
> >>The kernel may sleep with holding a spinlock.
> >>
> >>The function call paths (from bottom to top) in Linux-4.16 are:
> >>
> >>[FUNC] schedule_timeout_interruptible
> >>kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in
> >>srcu_read_delay
> >>kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in
> >>rcu_torture_timer
> >>kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer
> >>
> >>Note that [FUNC_PTR] means a function pointer call is used.
> >>
> >>I do not find a good way to fix, so I only report.
> >>This is found by my static analysis tool (DSAC).
> >Interesting.  I would have expected to have gotten a "scheduling while
> >atomic" error message, which I do not recall seeing.  And I ran a great
> >deal of rcutorture on v4.16.
> >
> >So let's see...  As you say, the rcu_torture_timer() function does in
> >fact acquire rand_lock in 4.16 and 4.17, in which case sleeping would
> >indeed be illegal.  But let's take a look at srcu_read_delay():
> >
> >static void
> >srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
> >{
> >	long delay;
> >	const long uspertick = 1000000 / HZ;
> >	const long longdelay = 10;
> >
> >	/* We want there to be long-running readers, but not all the time. */
> >
> >	delay = torture_random(rrsp) %
> >		(nrealreaders * 2 * longdelay * uspertick);
> >	if (!delay && in_task()) {
> >		schedule_timeout_interruptible(longdelay);
> >		rtrsp->rt_delay_jiffies = longdelay;
> >	} else {
> >		rcu_read_delay(rrsp, rtrsp);
> >	}
> >}
> >
> >The call to schedule_timeout_interruptible() cannot happen unless the
> >in_task() macro returns true, which it won't if the SOFTIRQ_OFFSET bit
> >is set:
> >
> >#define in_task()		(!(preempt_count() & \
> >				   (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
> >
> >And the SOFTIRQ_OFFSET bit will be set if srcu_read_delay()
> >is invoked from a timer handler, which is the case for the
> >call from rcu_torture_timer().  So if that lock is held,
> >schedule_timeout_interruptible() won't ever be invoked.
> 
> Thanks for your reply :)
> My tool does not track this bit...
> Sorry for this false report.

Not a problem, a few false positives are to be expected.  And it looks
like you have some work to do on your tool -- which is good, because I
would not want you to be bored.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay()
  2018-08-13 12:42     ` Paul E. McKenney
@ 2018-08-15  1:05       ` Jia-Ju Bai
  0 siblings, 0 replies; 5+ messages in thread
From: Jia-Ju Bai @ 2018-08-15  1:05 UTC (permalink / raw)
  To: paulmck
  Cc: dave, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	Linux Kernel Mailing List



On 2018/8/13 20:42, Paul E. McKenney wrote:
> On Mon, Aug 13, 2018 at 05:26:49PM +0800, Jia-Ju Bai wrote:
>>
>> On 2018/8/13 12:18, Paul E. McKenney wrote:
>>> On Mon, Aug 13, 2018 at 11:04:10AM +0800, Jia-Ju Bai wrote:
>>>> The kernel may sleep with holding a spinlock.
>>>>
>>>> The function call paths (from bottom to top) in Linux-4.16 are:
>>>>
>>>> [FUNC] schedule_timeout_interruptible
>>>> kernel/rcu/rcutorture.c, 523: schedule_timeout_interruptible in
>>>> srcu_read_delay
>>>> kernel/rcu/rcutorture.c, 1105: [FUNC_PTR]srcu_read_delay in
>>>> rcu_torture_timer
>>>> kernel/rcu/rcutorture.c, 1104: spin_lock in rcu_torture_timer
>>>>
>>>> Note that [FUNC_PTR] means a function pointer call is used.
>>>>
>>>> I do not find a good way to fix, so I only report.
>>>> This is found by my static analysis tool (DSAC).
>>> Interesting.  I would have expected to have gotten a "scheduling while
>>> atomic" error message, which I do not recall seeing.  And I ran a great
>>> deal of rcutorture on v4.16.
>>>
>>> So let's see...  As you say, the rcu_torture_timer() function does in
>>> fact acquire rand_lock in 4.16 and 4.17, in which case sleeping would
>>> indeed be illegal.  But let's take a look at srcu_read_delay():
>>>
>>> static void
>>> srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
>>> {
>>> 	long delay;
>>> 	const long uspertick = 1000000 / HZ;
>>> 	const long longdelay = 10;
>>>
>>> 	/* We want there to be long-running readers, but not all the time. */
>>>
>>> 	delay = torture_random(rrsp) %
>>> 		(nrealreaders * 2 * longdelay * uspertick);
>>> 	if (!delay && in_task()) {
>>> 		schedule_timeout_interruptible(longdelay);
>>> 		rtrsp->rt_delay_jiffies = longdelay;
>>> 	} else {
>>> 		rcu_read_delay(rrsp, rtrsp);
>>> 	}
>>> }
>>>
>>> The call to schedule_timeout_interruptible() cannot happen unless the
>>> in_task() macro returns true, which it won't if the SOFTIRQ_OFFSET bit
>>> is set:
>>>
>>> #define in_task()		(!(preempt_count() & \
>>> 				   (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
>>>
>>> And the SOFTIRQ_OFFSET bit will be set if srcu_read_delay()
>>> is invoked from a timer handler, which is the case for the
>>> call from rcu_torture_timer().  So if that lock is held,
>>> schedule_timeout_interruptible() won't ever be invoked.
>> Thanks for your reply :)
>> My tool does not track this bit...
>> Sorry for this false report.
> Not a problem, a few false positives are to be expected.  And it looks
> like you have some work to do on your tool -- which is good, because I
> would not want you to be bored.  ;-)
>
> 							Thanx, Paul
>

Thanks for your advice.
I will improve my tool to produce less false positives :)


Best wishes,
Jia-Ju Bai

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-08-15  1:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-13  3:04 [BUG] kernel: rcu: a possible sleep-in-atomic-context bug in srcu_read_delay() Jia-Ju Bai
2018-08-13  4:18 ` Paul E. McKenney
2018-08-13  9:26   ` Jia-Ju Bai
2018-08-13 12:42     ` Paul E. McKenney
2018-08-15  1:05       ` Jia-Ju Bai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).