* [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
@ 2015-04-16 14:06 Jan Kiszka
  2015-04-16 14:12 ` Steven Rostedt
  2015-04-16 14:26 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Kiszka @ 2015-04-16 14:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, RT; +Cc: Linux Kernel Mailing List, Steven Rostedt

ftrace may trigger rb_wakeups while holding pi_lock which will also be
requested via trace_...->...->ring_buffer_unlock_commit->...->
irq_work_queue->raise_softirq->try_to_wake_up. This quickly causes
deadlocks when trying to use ftrace under -rt.

Resolve this by marking the ring buffer's irq_work as HARD_IRQ.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

I'm not yet sure if this doesn't push work into hard-irq context that
is better not done there on -rt.

I'm also not sure if there aren't more such cases, given that -rt turns
the default irq_work wakeup policy around. But maybe we are lucky.

 kernel/trace/ring_buffer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f4fbbfc..6a1939e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1252,6 +1252,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
 	init_waitqueue_head(&cpu_buffer->irq_work.waiters);
 	init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
+	cpu_buffer->irq_work.work.flags = IRQ_WORK_HARD_IRQ;
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));


* Re: [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
  2015-04-16 14:06 [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks Jan Kiszka
@ 2015-04-16 14:12 ` Steven Rostedt
  2015-04-16 14:26 ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2015-04-16 14:12 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Thu, 16 Apr 2015 16:06:54 +0200
Jan Kiszka <jan.kiszka@siemens.com> wrote:

> ftrace may trigger rb_wakeups while holding pi_lock which will also be
> requested via trace_...->...->ring_buffer_unlock_commit->...->
> irq_work_queue->raise_softirq->try_to_wake_up. This quickly causes
> deadlocks when trying to use ftrace under -rt.
> 
> Resolve this by marking the ring buffer's irq_work as HARD_IRQ.

Hmm,  I could have sworn I wrote a patch like this not too long ago.

I'll have to check that out.

Thanks,

-- Steve

> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> I'm not yet sure if this doesn't push work into hard-irq context that
> is better not done there on -rt.
> 
> I'm also not sure if there aren't more such cases, given that -rt turns
> the default irq_work wakeup policy around. But maybe we are lucky.
> 
>  kernel/trace/ring_buffer.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index f4fbbfc..6a1939e 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1252,6 +1252,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
>  	init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
>  	init_waitqueue_head(&cpu_buffer->irq_work.waiters);
>  	init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
> +	cpu_buffer->irq_work.work.flags = IRQ_WORK_HARD_IRQ;
>  
>  	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>  			    GFP_KERNEL, cpu_to_node(cpu));



* Re: [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
  2015-04-16 14:06 [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks Jan Kiszka
  2015-04-16 14:12 ` Steven Rostedt
@ 2015-04-16 14:26 ` Sebastian Andrzej Siewior
  2015-04-16 14:28   ` Jan Kiszka
  1 sibling, 1 reply; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-16 14:26 UTC (permalink / raw)
  To: Jan Kiszka, RT; +Cc: Linux Kernel Mailing List, Steven Rostedt

On 04/16/2015 04:06 PM, Jan Kiszka wrote:
> ftrace may trigger rb_wakeups while holding pi_lock which will also be
> requested via trace_...->...->ring_buffer_unlock_commit->...->
> irq_work_queue->raise_softirq->try_to_wake_up. This quickly causes
> deadlocks when trying to use ftrace under -rt.
> 
> Resolve this by marking the ring buffer's irq_work as HARD_IRQ.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> I'm not yet sure if this doesn't push work into hard-irq context that
> is better not done there on -rt.

everything should be done in the soft-irq.

> 
> I'm also not sure if there aren't more such cases, given that -rt turns
> the default irq_work wakeup policy around. But maybe we are lucky.

The only thing that is getting done in the hardirq is the FULL_NO_HZ
thingy. I would be _very_ glad if we could keep it that way.

>  kernel/trace/ring_buffer.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index f4fbbfc..6a1939e 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1252,6 +1252,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
>  	init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
>  	init_waitqueue_head(&cpu_buffer->irq_work.waiters);
>  	init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
> +	cpu_buffer->irq_work.work.flags = IRQ_WORK_HARD_IRQ;
>  
>  	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>  			    GFP_KERNEL, cpu_to_node(cpu));
> 

Sebastian


* Re: [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
  2015-04-16 14:26 ` Sebastian Andrzej Siewior
@ 2015-04-16 14:28   ` Jan Kiszka
  2015-04-16 14:57     ` Sebastian Andrzej Siewior
  2015-04-16 15:10     ` Steven Rostedt
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Kiszka @ 2015-04-16 14:28 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, RT; +Cc: Linux Kernel Mailing List, Steven Rostedt

On 2015-04-16 16:26, Sebastian Andrzej Siewior wrote:
> On 04/16/2015 04:06 PM, Jan Kiszka wrote:
>> ftrace may trigger rb_wakeups while holding pi_lock which will also be
>> requested via trace_...->...->ring_buffer_unlock_commit->...->
>> irq_work_queue->raise_softirq->try_to_wake_up. This quickly causes
>> deadlocks when trying to use ftrace under -rt.
>>
>> Resolve this by marking the ring buffer's irq_work as HARD_IRQ.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>
>> I'm not yet sure if this doesn't push work into hard-irq context that
>> is better not done there on -rt.
> 
> everything should be done in the soft-irq.
> 
>>
>> I'm also not sure if there aren't more such cases, given that -rt turns
>> the default irq_work wakeup policy around. But maybe we are lucky.
> 
> The only thing that is getting done in the hardirq is the FULL_NO_HZ
> thingy. I would be _very_ glad if we could keep it that way.

Then - to my current understanding - we need an NMI-safe trigger for
soft-irq work. Is there anything like this existing already? Or can we
still use the IPI-based kick without actually doing the work in hard-irq
context?

Jan

> 
>>  kernel/trace/ring_buffer.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
>> index f4fbbfc..6a1939e 100644
>> --- a/kernel/trace/ring_buffer.c
>> +++ b/kernel/trace/ring_buffer.c
>> @@ -1252,6 +1252,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
>>  	init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
>>  	init_waitqueue_head(&cpu_buffer->irq_work.waiters);
>>  	init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
>> +	cpu_buffer->irq_work.work.flags = IRQ_WORK_HARD_IRQ;
>>  
>>  	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>>  			    GFP_KERNEL, cpu_to_node(cpu));
>>
> 
> Sebastian
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
  2015-04-16 14:28   ` Jan Kiszka
@ 2015-04-16 14:57     ` Sebastian Andrzej Siewior
  2015-04-16 15:31       ` Jan Kiszka
  2015-04-16 15:10     ` Steven Rostedt
  1 sibling, 1 reply; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-16 14:57 UTC (permalink / raw)
  To: Jan Kiszka, RT; +Cc: Linux Kernel Mailing List, Steven Rostedt

On 04/16/2015 04:28 PM, Jan Kiszka wrote:
> On 2015-04-16 16:26, Sebastian Andrzej Siewior wrote:
>> On 04/16/2015 04:06 PM, Jan Kiszka wrote:
>>> ftrace may trigger rb_wakeups while holding pi_lock which will also be
>>> requested via trace_...->...->ring_buffer_unlock_commit->...->
>>> irq_work_queue->raise_softirq->try_to_wake_up. This quickly causes
>>> deadlocks when trying to use ftrace under -rt.
>>>
>>> Resolve this by marking the ring buffer's irq_work as HARD_IRQ.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> ---
>>>
>>> I'm not yet sure if this doesn't push work into hard-irq context that
>>> is better not done there on -rt.
>>
>> everything should be done in the soft-irq.
>>
>>>
>>> I'm also not sure if there aren't more such cases, given that -rt turns
>>> the default irq_work wakeup policy around. But maybe we are lucky.
>>
>> The only thing that is getting done in the hardirq is the FULL_NO_HZ
>> thingy. I would be _very_ glad if we could keep it that way.
> 
> Then - to my current understanding - we need an NMI-safe trigger for
> soft-irq work. Is there anything like this existing already? Or can we
> still use the IPI-based kick without actually doing the work in hard-irq
> context?

But if you trigger it via IPI it will still run in hardirq context,
right? Can you describe how you run into this and try to think about it in
a quiet moment. Is it just enabling the function tracer and running it?

> Jan

Sebastian



* Re: [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
  2015-04-16 14:28   ` Jan Kiszka
  2015-04-16 14:57     ` Sebastian Andrzej Siewior
@ 2015-04-16 15:10     ` Steven Rostedt
  2015-04-16 15:29       ` Jan Kiszka
  1 sibling, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2015-04-16 15:10 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Thu, 16 Apr 2015 16:28:58 +0200
Jan Kiszka <jan.kiszka@siemens.com> wrote:

> On 2015-04-16 16:26, Sebastian Andrzej Siewior wrote:
> > On 04/16/2015 04:06 PM, Jan Kiszka wrote:
> >> ftrace may trigger rb_wakeups while holding pi_lock which will also be
> >> requested via trace_...->...->ring_buffer_unlock_commit->...->
> >> irq_work_queue->raise_softirq->try_to_wake_up. This quickly causes
> >> deadlocks when trying to use ftrace under -rt.
> >>
> >> Resolve this by marking the ring buffer's irq_work as HARD_IRQ.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >>
> >> I'm not yet sure if this doesn't push work into hard-irq context that
> >> is better not done there on -rt.
> > 
> > everything should be done in the soft-irq.
> > 
> >>
> >> I'm also not sure if there aren't more such cases, given that -rt turns
> >> the default irq_work wakeup policy around. But maybe we are lucky.
> > 
> > The only thing that is getting done in the hardirq is the FULL_NO_HZ
> > thingy. I would be _very_ glad if we could keep it that way.

tracing is special, even more so than NO_HZ_FULL, as it traces that
code as well (and even RCU). Tracing the kernel is like a debugger.
Ideally, it would not be part of the kernel, but just an external
observer. Without special hardware that is not the case, so we try to
be outside the main system as much as possible.


> 
> Then - to my current understanding - we need an NMI-safe trigger for
> soft-irq work. Is there anything like this existing already? Or can we
> still use the IPI-based kick without actually doing the work in hard-irq
> context?
> 

The reason it uses irq_work() is that a simple wakeup can deadlock the
system if called by the tracing infrastructure (as we see
raise_softirq() does too).

But yeah, there's no real need to have the ring buffer irq work
handler run from hardirq context. The only requirement is that you
cannot do the raise from the irq_work_queue() call. If you want to have
the hardirq work handler do the raise_softirq(), that's fine. Perhaps
that's the solution? Have irq_work_queue() always trigger the hard irq,
but have the hard irq either just raise a softirq or call the handler
directly if IRQ_WORK_HARD_IRQ is set.
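
Roughly, one untested sketch of that idea (reusing the hirq_work_list /
raised_list per-CPU lists and the TIMER_SOFTIRQ deferral that -rt
already has; illustration only, not a tested patch):

    /*
     * Sketch: the hard-irq path only runs work explicitly marked
     * IRQ_WORK_HARD_IRQ and kicks the timer softirq for everything
     * else instead of executing it in hard-irq context.
     */
    void irq_work_run(void)
    {
            irq_work_run_list(this_cpu_ptr(&hirq_work_list));

            /* defer the remaining work to softirq context */
            if (!llist_empty(this_cpu_ptr(&raised_list)))
                    raise_softirq(TIMER_SOFTIRQ);
    }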

-- Steve


* Re: [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
  2015-04-16 15:10     ` Steven Rostedt
@ 2015-04-16 15:29       ` Jan Kiszka
  2015-04-16 15:33         ` Sebastian Andrzej Siewior
  2015-04-16 16:28         ` [PATCH RT 3.18] irq_work: Provide a soft-irq based queue Jan Kiszka
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Kiszka @ 2015-04-16 15:29 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On 2015-04-16 17:10, Steven Rostedt wrote:
> On Thu, 16 Apr 2015 16:28:58 +0200
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
>> On 2015-04-16 16:26, Sebastian Andrzej Siewior wrote:
>>> On 04/16/2015 04:06 PM, Jan Kiszka wrote:
>>>> ftrace may trigger rb_wakeups while holding pi_lock which will also be
>>>> requested via trace_...->...->ring_buffer_unlock_commit->...->
>>>> irq_work_queue->raise_softirq->try_to_wake_up. This quickly causes
>>>> deadlocks when trying to use ftrace under -rt.
>>>>
>>>> Resolve this by marking the ring buffer's irq_work as HARD_IRQ.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>>
>>>> I'm not yet sure if this doesn't push work into hard-irq context that
>>>> is better not done there on -rt.
>>>
>>> everything should be done in the soft-irq.
>>>
>>>>
>>>> I'm also not sure if there aren't more such cases, given that -rt turns
>>>> the default irq_work wakeup policy around. But maybe we are lucky.
>>>
>>> The only thing that is getting done in the hardirq is the FULL_NO_HZ
>>> thingy. I would be _very_ glad if we could keep it that way.
> 
> tracing is special, even more so than NO_HZ_FULL, as it traces that
> code as well (and even RCU). Tracing the kernel is like a debugger.
> Ideally, it would not be part of the kernel, but just an external
> observer. Without special hardware that is not the case, so we try to
> be outside the main system as much as possible.
> 
> 
>>
>> Then - to my current understanding - we need an NMI-safe trigger for
>> soft-irq work. Is there anything like this existing already? Or can we
>> still use the IPI-based kick without actually doing the work in hard-irq
>> context?
>>
> 
> The reason it uses irq_work() is that a simple wakeup can deadlock the
> system if called by the tracing infrastructure (as we see
> raise_softirq() does too).
> 
> But yeah, there's no real need to have the ring buffer irq work
> handler run from hardirq context. The only requirement is that you
> cannot do the raise from the irq_work_queue() call. If you want to have
> the hardirq work handler do the raise_softirq(), that's fine. Perhaps
> that's the solution? Have irq_work_queue() always trigger the hard irq,
> but have the hard irq either just raise a softirq or call the handler
> directly if IRQ_WORK_HARD_IRQ is set.

I'll play with that.

My patch is definitely not OK. It causes

[  380.372579] BUG: scheduling while atomic: trace-cmd/2149/0x00010004
...
[  380.372604] Call Trace:
[  380.372610]  <IRQ>  [<ffffffff81607694>] dump_stack+0x50/0x9f
[  380.372613]  [<ffffffff8160413c>] __schedule_bug+0x59/0x69
[  380.372615]  [<ffffffff8160a1d5>] __schedule+0x675/0x800
[  380.372617]  [<ffffffff8160a394>] schedule+0x34/0xa0
[  380.372619]  [<ffffffff8160bf7d>] rt_spin_lock_slowlock+0xcd/0x290
[  380.372621]  [<ffffffff8160d8b5>] rt_spin_lock+0x25/0x30
[  380.372623]  [<ffffffff8108fe39>] __wake_up+0x29/0x60
[  380.372626]  [<ffffffff81106960>] rb_wake_up_waiters+0x40/0x50
[  380.372628]  [<ffffffff8112cdbf>] irq_work_run_list+0x3f/0x60
[  380.372630]  [<ffffffff8112cdf9>] irq_work_run+0x19/0x20
[  380.372632]  [<ffffffff81008409>] smp_trace_irq_work_interrupt+0x39/0x120
[  380.372633]  [<ffffffff8160f8ef>] trace_irq_work_interrupt+0x6f/0x80
[  380.372636]  <EOI>  [<ffffffff8103d66d>] ? native_apic_msr_write+0x2d/0x30
[  380.372637]  [<ffffffff8103d53d>] x2apic_send_IPI_self+0x1d/0x20
[  380.372638]  [<ffffffff8100851e>] arch_irq_work_raise+0x2e/0x40
[  380.372639]  [<ffffffff8112d025>] irq_work_queue+0xc5/0xf0
[  380.372641]  [<ffffffff81107d8a>] ring_buffer_unlock_commit+0x14a/0x2e0
[  380.372643]  [<ffffffff8110f894>] trace_buffer_unlock_commit+0x24/0x60
[  380.372644]  [<ffffffff8111f9da>] ftrace_event_buffer_commit+0x8a/0xc0
[  380.372647]  [<ffffffff811c58de>] ftrace_raw_event_writeback_dirty_inode_template+0x8e/0xc0
[  380.372648]  [<ffffffff811c8b21>] __mark_inode_dirty+0x1d1/0x310
[  380.372650]  [<ffffffff811d0ec8>] generic_write_end+0x78/0xb0
[  380.372658]  [<ffffffffa021c42b>] ext4_da_write_end+0x10b/0x2f0 [ext4]
[  380.372661]  [<ffffffff8116335e>] ? pagefault_enable+0x1e/0x20
[  380.372662]  [<ffffffff8113c337>] generic_perform_write+0x107/0x1b0
[  380.372664]  [<ffffffff8113e49f>] __generic_file_write_iter+0x15f/0x350
[  380.372668]  [<ffffffffa0210c91>] ext4_file_write_iter+0x101/0x3d0 [ext4]
[  380.372670]  [<ffffffff8118f59b>] ? __kmalloc+0x16b/0x250
[  380.372672]  [<ffffffff811ca96e>] ? iter_file_splice_write+0x8e/0x430
[  380.372673]  [<ffffffff811ca96e>] ? iter_file_splice_write+0x8e/0x430
[  380.372674]  [<ffffffff811cab35>] iter_file_splice_write+0x255/0x430
[  380.372676]  [<ffffffff811cc474>] SyS_splice+0x214/0x760
[  380.372677]  [<ffffffff81011fe7>] ? syscall_trace_enter_phase2+0xa7/0x1e0
[  380.372679]  [<ffffffff8160e266>] tracesys_phase2+0xd4/0xd9

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
  2015-04-16 14:57     ` Sebastian Andrzej Siewior
@ 2015-04-16 15:31       ` Jan Kiszka
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2015-04-16 15:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, RT; +Cc: Linux Kernel Mailing List, Steven Rostedt

On 2015-04-16 16:57, Sebastian Andrzej Siewior wrote:
> On 04/16/2015 04:28 PM, Jan Kiszka wrote:
>> On 2015-04-16 16:26, Sebastian Andrzej Siewior wrote:
>>> On 04/16/2015 04:06 PM, Jan Kiszka wrote:
>>>> ftrace may trigger rb_wakeups while holding pi_lock which will also be
>>>> requested via trace_...->...->ring_buffer_unlock_commit->...->
>>>> irq_work_queue->raise_softirq->try_to_wake_up. This quickly causes
>>>> deadlocks when trying to use ftrace under -rt.
>>>>
>>>> Resolve this by marking the ring buffer's irq_work as HARD_IRQ.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>>
>>>> I'm not yet sure if this doesn't push work into hard-irq context that
>>>> is better not done there on -rt.
>>>
>>> everything should be done in the soft-irq.
>>>
>>>>
>>>> I'm also not sure if there aren't more such cases, given that -rt turns
>>>> the default irq_work wakeup policy around. But maybe we are lucky.
>>>
>>> The only thing that is getting done in the hardirq is the FULL_NO_HZ
>>> thingy. I would be _very_ glad if we could keep it that way.
>>
>> Then - to my current understanding - we need an NMI-safe trigger for
>> soft-irq work. Is there anything like this existing already? Or can we
>> still use the IPI-based kick without actually doing the work in hard-irq
>> context?
> 
> But if you trigger it via IPI it will still run in hardirq context,
> right? Can you describe how you run into this and try to think about it in
> a quiet moment. Is it just enabling the function tracer and running it?

# trace-cmd record -e sched
/sys/kernel/debug/tracing/events/sched/filter
/sys/kernel/debug/tracing/events/*/sched/filter
Hit Ctrl^C to stop recording
^C

and you are dead(locked).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks
  2015-04-16 15:29       ` Jan Kiszka
@ 2015-04-16 15:33         ` Sebastian Andrzej Siewior
  2015-04-16 16:28         ` [PATCH RT 3.18] irq_work: Provide a soft-irq based queue Jan Kiszka
  1 sibling, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-16 15:33 UTC (permalink / raw)
  To: Jan Kiszka, Steven Rostedt; +Cc: RT, Linux Kernel Mailing List

On 04/16/2015 05:29 PM, Jan Kiszka wrote:
> My patch is definitely not OK. It causes
> 
> [  380.372579] BUG: scheduling while atomic: trace-cmd/2149/0x00010004
> ...
> [  380.372604] Call Trace:
> [  380.372610]  <IRQ>  [<ffffffff81607694>] dump_stack+0x50/0x9f
> [  380.372613]  [<ffffffff8160413c>] __schedule_bug+0x59/0x69
> [  380.372615]  [<ffffffff8160a1d5>] __schedule+0x675/0x800
> [  380.372617]  [<ffffffff8160a394>] schedule+0x34/0xa0
> [  380.372619]  [<ffffffff8160bf7d>] rt_spin_lock_slowlock+0xcd/0x290
> [  380.372621]  [<ffffffff8160d8b5>] rt_spin_lock+0x25/0x30
> [  380.372623]  [<ffffffff8108fe39>] __wake_up+0x29/0x60

right. you must not take any sleeping locks in the hardirq context :)
This works for the no-hz-work thingy. It grabs raw locks and may cancel
one hrtimer which is marked irqsafe.
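
To illustrate the difference (simplified, hypothetical names, not the
code from the trace above): on -rt a spinlock_t turns into a sleeping
rtmutex-based lock, while a raw_spinlock_t stays a real spinlock, so
only the latter is usable from hard-irq context:

    static DEFINE_SPINLOCK(demo_lock);          /* sleeping lock on -rt */
    static DEFINE_RAW_SPINLOCK(demo_raw_lock);  /* stays a real spinlock */

    static void demo_from_hardirq(void)
    {
            unsigned long flags;

            /* fine from hard-irq context, also on -rt */
            raw_spin_lock_irqsave(&demo_raw_lock, flags);
            raw_spin_unlock_irqrestore(&demo_raw_lock, flags);

            /* may schedule on -rt -> "scheduling while atomic" */
            spin_lock(&demo_lock);
            spin_unlock(&demo_lock);
    }

That's why __wake_up() (it takes the wait queue's spinlock_t) explodes
there while the no-hz work gets away with it.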

Sebastian


* [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-16 15:29       ` Jan Kiszka
  2015-04-16 15:33         ` Sebastian Andrzej Siewior
@ 2015-04-16 16:28         ` Jan Kiszka
  2015-04-20  8:03           ` Mike Galbraith
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2015-04-16 16:28 UTC (permalink / raw)
  To: Steven Rostedt, Sebastian Andrzej Siewior; +Cc: RT, Linux Kernel Mailing List

Instead of turning all irq_work requests into lazy ones on -rt, just
move their execution from hard into soft-irq context.

This resolves deadlocks of ftrace which will queue work from arbitrary
contexts, including those that have locks held that are needed for
raising a soft-irq.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Second try, looks much better so far. And it also removes my concerns
regarding other potential cases besides ftrace.

 kernel/irq_work.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 9dda38a..3f6ffcd 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -85,12 +85,9 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 		raise_irqwork = llist_add(&work->llnode,
 					  &per_cpu(hirq_work_list, cpu));
 	else
-		raise_irqwork = llist_add(&work->llnode,
-					  &per_cpu(lazy_list, cpu));
-#else
+#endif
 		raise_irqwork = llist_add(&work->llnode,
 					  &per_cpu(raised_list, cpu));
-#endif
 
 	if (raise_irqwork)
 		arch_send_call_function_single_ipi(cpu);
@@ -114,21 +111,20 @@ bool irq_work_queue(struct irq_work *work)
 	if (work->flags & IRQ_WORK_HARD_IRQ) {
 		if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
 			arch_irq_work_raise();
-	} else {
+	} else
+#endif
+	if (work->flags & IRQ_WORK_LAZY) {
 		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
 		    tick_nohz_tick_stopped())
+#ifdef CONFIG_PREEMPT_RT_FULL
 			raise_softirq(TIMER_SOFTIRQ);
-	}
 #else
-	if (work->flags & IRQ_WORK_LAZY) {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-		    tick_nohz_tick_stopped())
 			arch_irq_work_raise();
+#endif
 	} else {
 		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
 			arch_irq_work_raise();
 	}
-#endif
 
 	preempt_enable();
 
@@ -202,6 +198,8 @@ void irq_work_run(void)
 {
 #ifdef CONFIG_PREEMPT_RT_FULL
 	irq_work_run_list(this_cpu_ptr(&hirq_work_list));
+	if (!llist_empty(this_cpu_ptr(&raised_list)))
+		raise_softirq(TIMER_SOFTIRQ);
 #else
 	irq_work_run_list(this_cpu_ptr(&raised_list));
 	irq_work_run_list(this_cpu_ptr(&lazy_list));
@@ -211,15 +209,11 @@ EXPORT_SYMBOL_GPL(irq_work_run);
 
 void irq_work_tick(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-#else
-	struct llist_head *raised = &__get_cpu_var(raised_list);
+	struct llist_head *raised = this_cpu_ptr(&raised_list);
 
 	if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
 		irq_work_run_list(raised);
-	irq_work_run_list(&__get_cpu_var(lazy_list));
-#endif
+	irq_work_run_list(this_cpu_ptr(&lazy_list));
 }
 
 /*
-- 
2.1.4


* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-16 16:28         ` [PATCH RT 3.18] irq_work: Provide a soft-irq based queue Jan Kiszka
@ 2015-04-20  8:03           ` Mike Galbraith
  2015-04-23  6:11             ` Mike Galbraith
  0 siblings, 1 reply; 28+ messages in thread
From: Mike Galbraith @ 2015-04-20  8:03 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Thu, 2015-04-16 at 18:28 +0200, Jan Kiszka wrote:
> Instead of turning all irq_work requests into lazy ones on -rt, just
> move their execution from hard into soft-irq context.
> 
> This resolves deadlocks of ftrace which will queue work from 
> arbitrary
> contexts, including those that have locks held that are needed for
> raising a soft-irq.

Yup, trace-cmd record -> dead-box fully repeatable, and now fixed.

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> Second try, looks much better so far. And it also removes my concerns
> regarding other potential cases besides ftrace.
> 
>  kernel/irq_work.c | 26 ++++++++++----------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 9dda38a..3f6ffcd 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -85,12 +85,9 @@ bool irq_work_queue_on(struct irq_work *work, int 
> cpu)
>               raise_irqwork = llist_add(&work->llnode,
>                                         &per_cpu(hirq_work_list, cpu));
>       else
> -             raise_irqwork = llist_add(&work->llnode,
> -                                       &per_cpu(lazy_list, cpu));
> -#else
> +#endif
>               raise_irqwork = llist_add(&work->llnode,
>                                         &per_cpu(raised_list, cpu));
> -#endif
>  
>       if (raise_irqwork)
>               arch_send_call_function_single_ipi(cpu);
> @@ -114,21 +111,20 @@ bool irq_work_queue(struct irq_work *work)
>       if (work->flags & IRQ_WORK_HARD_IRQ) {
>               if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
>                       arch_irq_work_raise();
> -     } else {
> +     } else
> +#endif
> +     if (work->flags & IRQ_WORK_LAZY) {
>               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
>                   tick_nohz_tick_stopped())
> +#ifdef CONFIG_PREEMPT_RT_FULL
>                       raise_softirq(TIMER_SOFTIRQ);
> -     }
>  #else
> -     if (work->flags & IRQ_WORK_LAZY) {
> -             if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
> -                 tick_nohz_tick_stopped())
>                       arch_irq_work_raise();
> +#endif
>       } else {
>               if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
>                       arch_irq_work_raise();
>       }
> -#endif
>  
>       preempt_enable();
>  
> @@ -202,6 +198,8 @@ void irq_work_run(void)
>  {
>  #ifdef CONFIG_PREEMPT_RT_FULL
>       irq_work_run_list(this_cpu_ptr(&hirq_work_list));
> +     if (!llist_empty(this_cpu_ptr(&raised_list)))
> +             raise_softirq(TIMER_SOFTIRQ);
>  #else
>       irq_work_run_list(this_cpu_ptr(&raised_list));
>       irq_work_run_list(this_cpu_ptr(&lazy_list));
> @@ -211,15 +209,11 @@ EXPORT_SYMBOL_GPL(irq_work_run);
>  
>  void irq_work_tick(void)
>  {
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -     irq_work_run_list(this_cpu_ptr(&lazy_list));
> -#else
> -     struct llist_head *raised = &__get_cpu_var(raised_list);
> +     struct llist_head *raised = this_cpu_ptr(&raised_list);
>  
>       if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
>               irq_work_run_list(raised);
> -     irq_work_run_list(&__get_cpu_var(lazy_list));
> -#endif
> +     irq_work_run_list(this_cpu_ptr(&lazy_list));
>  }
>  
>  /*


* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-20  8:03           ` Mike Galbraith
@ 2015-04-23  6:11             ` Mike Galbraith
  2015-04-23  6:29               ` Jan Kiszka
  2015-04-23  6:50               ` Jan Kiszka
  0 siblings, 2 replies; 28+ messages in thread
From: Mike Galbraith @ 2015-04-23  6:11 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Mon, 2015-04-20 at 10:03 +0200, Mike Galbraith wrote:
> On Thu, 2015-04-16 at 18:28 +0200, Jan Kiszka wrote:
> > Instead of turning all irq_work requests into lazy ones on -rt, 
> > just
> > move their execution from hard into soft-irq context.
> > 
> > This resolves deadlocks of ftrace which will queue work from 
> > arbitrary
> > contexts, including those that have locks held that are needed for
> > raising a soft-irq.
> 
> Yup, trace-cmd record -> dead-box fully repeatable, and now fixed.

Except you kinda forgot to run the raised list.  The reformatted 
(which saved two whole lines;) patch below adds that to 
irq_work_tick(), which fixes the livelock both powertop and perf top 
otherwise meet.

Subject: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
Date:   Thu, 16 Apr 2015 18:28:16 +0200
From:   Jan Kiszka <jan.kiszka@siemens.com>

Instead of turning all irq_work requests into lazy ones on -rt, just
move their execution from hard into soft-irq context.

This resolves deadlocks of ftrace which will queue work from arbitrary
contexts, including those that have locks held that are needed for
raising a soft-irq.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Second try, looks much better so far. And it also removes my concerns
regarding other potential cases besides ftrace.

 kernel/irq_work.c |   84 ++++++++++++++++++++++++++----------------------------
 1 file changed, 41 insertions(+), 43 deletions(-)

--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -80,17 +80,12 @@ bool irq_work_queue_on(struct irq_work *
        if (!irq_work_claim(work))
                return false;
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-       if (work->flags & IRQ_WORK_HARD_IRQ)
+       if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && (work->flags & IRQ_WORK_HARD_IRQ))
                raise_irqwork = llist_add(&work->llnode,
                                          &per_cpu(hirq_work_list, cpu));
        else
                raise_irqwork = llist_add(&work->llnode,
-                                         &per_cpu(lazy_list, cpu));
-#else
-               raise_irqwork = llist_add(&work->llnode,
                                          &per_cpu(raised_list, cpu));
-#endif
 
        if (raise_irqwork)
                arch_send_call_function_single_ipi(cpu);
@@ -103,6 +98,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
 /* Enqueue the irq work @work on the current CPU */
 bool irq_work_queue(struct irq_work *work)
 {
+       bool realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
+       bool raise = false;
+
        /* Only queue if not already pending */
        if (!irq_work_claim(work))
                return false;
@@ -110,25 +108,22 @@ bool irq_work_queue(struct irq_work *wor
        /* Queue the entry and raise the IPI if needed. */
        preempt_disable();
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-       if (work->flags & IRQ_WORK_HARD_IRQ) {
+       if (realtime && (work->flags & IRQ_WORK_HARD_IRQ)) {
                if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
-                       arch_irq_work_raise();
-       } else {
-               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-                   tick_nohz_tick_stopped())
-                       raise_softirq(TIMER_SOFTIRQ);
-       }
-#else
-       if (work->flags & IRQ_WORK_LAZY) {
+                       raise = 1;
+       } else if (work->flags & IRQ_WORK_LAZY) {
                if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-                   tick_nohz_tick_stopped())
-                       arch_irq_work_raise();
-       } else {
-               if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
-                       arch_irq_work_raise();
-       }
-#endif
+                   tick_nohz_tick_stopped()) {
+                       if (realtime)
+                               raise_softirq(TIMER_SOFTIRQ);
+                       else
+                               raise = true;
+               }
+       } else if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
+               raise = true;
+
+       if (raise)
+               arch_irq_work_raise();
 
        preempt_enable();
 
@@ -143,12 +138,13 @@ bool irq_work_needs_cpu(void)
        raised = this_cpu_ptr(&raised_list);
        lazy = this_cpu_ptr(&lazy_list);
 
-       if (llist_empty(raised))
-               if (llist_empty(lazy))
-#ifdef CONFIG_PREEMPT_RT_FULL
+       if (llist_empty(raised) && llist_empty(lazy)) {
+               if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
                        if (llist_empty(this_cpu_ptr(&hirq_work_list)))
-#endif
                                return false;
+               } else
+                       return false;
+       }
 
        /* All work should have been flushed before going offline */
        WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -162,9 +158,7 @@ static void irq_work_run_list(struct lli
        struct irq_work *work;
        struct llist_node *llnode;
 
-#ifndef CONFIG_PREEMPT_RT_FULL
-       BUG_ON(!irqs_disabled());
-#endif
+       BUG_ON(!IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !irqs_disabled());
 
        if (llist_empty(list))
                return;
@@ -200,26 +194,30 @@ static void irq_work_run_list(struct lli
  */
 void irq_work_run(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-       irq_work_run_list(this_cpu_ptr(&hirq_work_list));
-#else
-       irq_work_run_list(this_cpu_ptr(&raised_list));
-       irq_work_run_list(this_cpu_ptr(&lazy_list));
-#endif
+       if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
+               irq_work_run_list(this_cpu_ptr(&hirq_work_list));
+               /*
+                * NOTE: we raise softirq via IPI for safety,
+                * and execute in irq_work_tick() to move the
+                * overhead from hard to soft irq context.
+                */
+               if (!llist_empty(this_cpu_ptr(&raised_list)))
+                       raise_softirq(TIMER_SOFTIRQ);
+       } else {
+               irq_work_run_list(this_cpu_ptr(&raised_list));
+               irq_work_run_list(this_cpu_ptr(&lazy_list));
+       }
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
 void irq_work_tick(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-       irq_work_run_list(this_cpu_ptr(&lazy_list));
-#else
-       struct llist_head *raised = &__get_cpu_var(raised_list);
+       struct llist_head *raised = this_cpu_ptr(&raised_list);
 
-       if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
+       if (!llist_empty(raised) && (!arch_irq_work_has_interrupt() ||
+           IS_ENABLED(CONFIG_PREEMPT_RT_FULL)))
                irq_work_run_list(raised);
-       irq_work_run_list(&__get_cpu_var(lazy_list));
-#endif
+       irq_work_run_list(this_cpu_ptr(&lazy_list));
 }
 
 /*


* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23  6:11             ` Mike Galbraith
@ 2015-04-23  6:29               ` Jan Kiszka
  2015-04-23  6:58                 ` Mike Galbraith
  2015-04-23  6:50               ` Jan Kiszka
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2015-04-23  6:29 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On 2015-04-23 08:11, Mike Galbraith wrote:
> On Mon, 2015-04-20 at 10:03 +0200, Mike Galbraith wrote:
>> On Thu, 2015-04-16 at 18:28 +0200, Jan Kiszka wrote:
>>> Instead of turning all irq_work requests into lazy ones on -rt, 
>>> just
>>> move their execution from hard into soft-irq context.
>>>
>>> This resolves deadlocks of ftrace which will queue work from 
>>> arbitrary
>>> contexts, including those that have locks held that are needed for
>>> raising a soft-irq.
>>
>> Yup, trace-cmd record -> dead-box fully repeatable, and now fixed.
> 
> Except you kinda forgot to run the raised list.  The reformatted 
> (which saved two whole lines;) patch below adds that to 
> irq_work_tick(), which fixes the livelock both powertop and perf top 
> otherwise meet.
> 
> Subject: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
> Date:   Thu, 16 Apr 2015 18:28:16 +0200
> From:   Jan Kiszka <jan.kiszka@siemens.com>
> 
> Instead of turning all irq_work requests into lazy ones on -rt, just
> move their execution from hard into soft-irq context.
> 
> This resolves deadlocks of ftrace which will queue work from arbitrary
> contexts, including those that have locks held that are needed for
> raising a soft-irq.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> Second try, looks much better so far. And it also removes my concerns
> regarding other potential cases besides ftrace.
> 
>  kernel/irq_work.c |   84 ++++++++++++++++++++++++++----------------------------
>  1 file changed, 41 insertions(+), 43 deletions(-)
> 
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -80,17 +80,12 @@ bool irq_work_queue_on(struct irq_work *
>         if (!irq_work_claim(work))
>                 return false;
>  
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -       if (work->flags & IRQ_WORK_HARD_IRQ)
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && (work->flags & IRQ_WORK_HARD_IRQ))
>                 raise_irqwork = llist_add(&work->llnode,
>                                           &per_cpu(hirq_work_list, cpu));
>         else
>                 raise_irqwork = llist_add(&work->llnode,
> -                                         &per_cpu(lazy_list, cpu));
> -#else
> -               raise_irqwork = llist_add(&work->llnode,
>                                           &per_cpu(raised_list, cpu));
> -#endif
>  
>         if (raise_irqwork)
>                 arch_send_call_function_single_ipi(cpu);
> @@ -103,6 +98,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
>  /* Enqueue the irq work @work on the current CPU */
>  bool irq_work_queue(struct irq_work *work)
>  {
> +       bool realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
> +       bool raise = false;
> +
>         /* Only queue if not already pending */
>         if (!irq_work_claim(work))
>                 return false;
> @@ -110,25 +108,22 @@ bool irq_work_queue(struct irq_work *wor
>         /* Queue the entry and raise the IPI if needed. */
>         preempt_disable();
>  
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -       if (work->flags & IRQ_WORK_HARD_IRQ) {
> +       if (realtime && (work->flags & IRQ_WORK_HARD_IRQ)) {
>                 if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
> -                       arch_irq_work_raise();
> -       } else {
> -               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
> -                   tick_nohz_tick_stopped())
> -                       raise_softirq(TIMER_SOFTIRQ);
> -       }
> -#else
> -       if (work->flags & IRQ_WORK_LAZY) {
> +                       raise = 1;
> +       } else if (work->flags & IRQ_WORK_LAZY) {
>                 if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
> -                   tick_nohz_tick_stopped())
> -                       arch_irq_work_raise();
> -       } else {
> -               if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
> -                       arch_irq_work_raise();
> -       }
> -#endif
> +                   tick_nohz_tick_stopped()) {
> +                       if (realtime)
> +                               raise_softirq(TIMER_SOFTIRQ);
> +                       else
> +                               raise = true;
> +               }
> +       } else if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
> +               raise = true;
> +
> +       if (raise)
> +               arch_irq_work_raise();
>  
>         preempt_enable();
>  
> @@ -143,12 +138,13 @@ bool irq_work_needs_cpu(void)
>         raised = this_cpu_ptr(&raised_list);
>         lazy = this_cpu_ptr(&lazy_list);
>  
> -       if (llist_empty(raised))
> -               if (llist_empty(lazy))
> -#ifdef CONFIG_PREEMPT_RT_FULL
> +       if (llist_empty(raised) && llist_empty(lazy)) {
> +               if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
>                         if (llist_empty(this_cpu_ptr(&hirq_work_list)))
> -#endif
>                                 return false;
> +               } else
> +                       return false;
> +       }
>  
>         /* All work should have been flushed before going offline */
>         WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
> @@ -162,9 +158,7 @@ static void irq_work_run_list(struct lli
>         struct irq_work *work;
>         struct llist_node *llnode;
>  
> -#ifndef CONFIG_PREEMPT_RT_FULL
> -       BUG_ON(!irqs_disabled());
> -#endif
> +       BUG_ON(!IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !irqs_disabled());
>  
>         if (llist_empty(list))
>                 return;
> @@ -200,26 +194,30 @@ static void irq_work_run_list(struct lli
>   */
>  void irq_work_run(void)
>  {
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -       irq_work_run_list(this_cpu_ptr(&hirq_work_list));
> -#else
> -       irq_work_run_list(this_cpu_ptr(&raised_list));
> -       irq_work_run_list(this_cpu_ptr(&lazy_list));
> -#endif
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
> +               irq_work_run_list(this_cpu_ptr(&hirq_work_list));
> +               /*
> +                * NOTE: we raise softirq via IPI for safety,
> +                * and execute in irq_work_tick() to move the
> +                * overhead from hard to soft irq context.
> +                */
> +               if (!llist_empty(this_cpu_ptr(&raised_list)))
> +                       raise_softirq(TIMER_SOFTIRQ);
> +       } else {
> +               irq_work_run_list(this_cpu_ptr(&raised_list));
> +               irq_work_run_list(this_cpu_ptr(&lazy_list));
> +       }
>  }
>  EXPORT_SYMBOL_GPL(irq_work_run);
>  
>  void irq_work_tick(void)
>  {
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -       irq_work_run_list(this_cpu_ptr(&lazy_list));
> -#else
> -       struct llist_head *raised = &__get_cpu_var(raised_list);
> +       struct llist_head *raised = this_cpu_ptr(&raised_list);
>  
> -       if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
> +       if (!llist_empty(raised) && (!arch_irq_work_has_interrupt() ||
> +           IS_ENABLED(CONFIG_PREEMPT_RT_FULL)))

OK, that additional condition is addressing archs that don't have
irq_work support and fall back to the timer, right?

>                 irq_work_run_list(raised);
> -       irq_work_run_list(&__get_cpu_var(lazy_list));
> -#endif
> +       irq_work_run_list(this_cpu_ptr(&lazy_list));
>  }
>  
>  /*
> 

The patch is whitespace-damaged - could you resend it? I'm trying to
visualize the full diff for myself.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23  6:11             ` Mike Galbraith
  2015-04-23  6:29               ` Jan Kiszka
@ 2015-04-23  6:50               ` Jan Kiszka
  2015-04-23  7:01                 ` Mike Galbraith
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2015-04-23  6:50 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On 2015-04-23 08:11, Mike Galbraith wrote:
> @@ -103,6 +98,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
>  /* Enqueue the irq work @work on the current CPU */
>  bool irq_work_queue(struct irq_work *work)
>  {
> +       bool realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
> +       bool raise = false;
> +
>         /* Only queue if not already pending */
>         if (!irq_work_claim(work))
>                 return false;
> @@ -110,25 +108,22 @@ bool irq_work_queue(struct irq_work *wor
>         /* Queue the entry and raise the IPI if needed. */
>         preempt_disable();
>  
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -       if (work->flags & IRQ_WORK_HARD_IRQ) {
> +       if (realtime && (work->flags & IRQ_WORK_HARD_IRQ)) {
>                 if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))

This boils down to

#ifdef CONFIG_X
some_type x;
#endif
...

if (IS_ENABLED(CONFIG_X) && ...)
	use(x);

And here we even have an indirection for IS_ENABLED via that local bool
variable. Is that pattern OK for Linux? Does it compile in all supported
optimization levels of all supported compilers?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23  6:29               ` Jan Kiszka
@ 2015-04-23  6:58                 ` Mike Galbraith
  2015-04-23  7:14                   ` Jan Kiszka
  0 siblings, 1 reply; 28+ messages in thread
From: Mike Galbraith @ 2015-04-23  6:58 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1092 bytes --]

On Thu, 2015-04-23 at 08:29 +0200, Jan Kiszka wrote:
> 
> >  void irq_work_tick(void)
> >  {
> > -#ifdef CONFIG_PREEMPT_RT_FULL
> > -       irq_work_run_list(this_cpu_ptr(&lazy_list));
> > -#else
> > -       struct llist_head *raised = &__get_cpu_var(raised_list);
> > +       struct llist_head *raised = this_cpu_ptr(&raised_list);
> >  
> > -       if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
> > +       if (!llist_empty(raised) && 
> > (!arch_irq_work_has_interrupt() ||
> > +           IS_ENABLED(CONFIG_PREEMPT_RT_FULL)))
> 
> OK, that additional condition is addressing archs that don't have
> irq_work support and fall back to the timer, right?

How will it ever run if it is not run in either irq_work_run() or 
irq_work_tick()?  There are two choices, so we'd better pick one.

Attaching patch since either evolution fscked up again (it does that), 
or someone has managed to turn it into a completely useless piece of 
crap... if so, likely the same dipstick who made it save messages such 
that you need fromdos to wipe away the shite it smears all over it.

        -Mike

[-- Attachment #2: irq_work-Provide-a-soft-irq-based-queue.patch --]
[-- Type: text/x-patch, Size: 4954 bytes --]

Subject: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
Date:	Thu, 16 Apr 2015 18:28:16 +0200
From:	Jan Kiszka <jan.kiszka@siemens.com>

Instead of turning all irq_work requests into lazy ones on -rt, just
move their execution from hard into soft-irq context.

This resolves deadlocks of ftrace which will queue work from arbitrary
contexts, including those that have locks held that are needed for
raising a soft-irq.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Second try, looks much better so far. And it also removes my concerns
regarding other potential cases besides ftrace.

 kernel/irq_work.c |   84 ++++++++++++++++++++++++++----------------------------
 1 file changed, 41 insertions(+), 43 deletions(-)

--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -80,17 +80,12 @@ bool irq_work_queue_on(struct irq_work *
 	if (!irq_work_claim(work))
 		return false;
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	if (work->flags & IRQ_WORK_HARD_IRQ)
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && (work->flags & IRQ_WORK_HARD_IRQ))
 		raise_irqwork = llist_add(&work->llnode,
 					  &per_cpu(hirq_work_list, cpu));
 	else
 		raise_irqwork = llist_add(&work->llnode,
-					  &per_cpu(lazy_list, cpu));
-#else
-		raise_irqwork = llist_add(&work->llnode,
 					  &per_cpu(raised_list, cpu));
-#endif
 
 	if (raise_irqwork)
 		arch_send_call_function_single_ipi(cpu);
@@ -103,6 +98,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
 /* Enqueue the irq work @work on the current CPU */
 bool irq_work_queue(struct irq_work *work)
 {
+	bool realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
+	bool raise = false;
+
 	/* Only queue if not already pending */
 	if (!irq_work_claim(work))
 		return false;
@@ -110,25 +108,22 @@ bool irq_work_queue(struct irq_work *wor
 	/* Queue the entry and raise the IPI if needed. */
 	preempt_disable();
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	if (work->flags & IRQ_WORK_HARD_IRQ) {
+	if (realtime && (work->flags & IRQ_WORK_HARD_IRQ)) {
 		if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
-			arch_irq_work_raise();
-	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-		    tick_nohz_tick_stopped())
-			raise_softirq(TIMER_SOFTIRQ);
-	}
-#else
-	if (work->flags & IRQ_WORK_LAZY) {
+			raise = 1;
+	} else if (work->flags & IRQ_WORK_LAZY) {
 		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-		    tick_nohz_tick_stopped())
-			arch_irq_work_raise();
-	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
-			arch_irq_work_raise();
-	}
-#endif
+		    tick_nohz_tick_stopped()) {
+			if (realtime)
+				raise_softirq(TIMER_SOFTIRQ);
+			else
+				raise = true;
+		}
+	} else if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
+		raise = true;
+
+	if (raise)
+		arch_irq_work_raise();
 
 	preempt_enable();
 
@@ -143,12 +138,13 @@ bool irq_work_needs_cpu(void)
 	raised = this_cpu_ptr(&raised_list);
 	lazy = this_cpu_ptr(&lazy_list);
 
-	if (llist_empty(raised))
-		if (llist_empty(lazy))
-#ifdef CONFIG_PREEMPT_RT_FULL
+	if (llist_empty(raised) && llist_empty(lazy)) {
+		if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
 			if (llist_empty(this_cpu_ptr(&hirq_work_list)))
-#endif
 				return false;
+		} else
+			return false;
+	}
 
 	/* All work should have been flushed before going offline */
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -162,9 +158,7 @@ static void irq_work_run_list(struct lli
 	struct irq_work *work;
 	struct llist_node *llnode;
 
-#ifndef CONFIG_PREEMPT_RT_FULL
-	BUG_ON(!irqs_disabled());
-#endif
+	BUG_ON(!IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !irqs_disabled());
 
 	if (llist_empty(list))
 		return;
@@ -200,26 +194,30 @@ static void irq_work_run_list(struct lli
  */
 void irq_work_run(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	irq_work_run_list(this_cpu_ptr(&hirq_work_list));
-#else
-	irq_work_run_list(this_cpu_ptr(&raised_list));
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-#endif
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
+		irq_work_run_list(this_cpu_ptr(&hirq_work_list));
+		/*
+		 * NOTE: we raise softirq via IPI for safety,
+		 * and execute in irq_work_tick() to move the
+		 * overhead from hard to soft irq context.
+		 */
+		if (!llist_empty(this_cpu_ptr(&raised_list)))
+			raise_softirq(TIMER_SOFTIRQ);
+	} else {
+		irq_work_run_list(this_cpu_ptr(&raised_list));
+		irq_work_run_list(this_cpu_ptr(&lazy_list));
+	}
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
 void irq_work_tick(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-#else
-	struct llist_head *raised = &__get_cpu_var(raised_list);
+	struct llist_head *raised = this_cpu_ptr(&raised_list);
 
-	if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
+	if (!llist_empty(raised) && (!arch_irq_work_has_interrupt() ||
+	    IS_ENABLED(CONFIG_PREEMPT_RT_FULL)))
 		irq_work_run_list(raised);
-	irq_work_run_list(&__get_cpu_var(lazy_list));
-#endif
+	irq_work_run_list(this_cpu_ptr(&lazy_list));
 }
 
 /*


* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23  6:50               ` Jan Kiszka
@ 2015-04-23  7:01                 ` Mike Galbraith
  2015-04-23  7:12                     ` Jan Kiszka
  0 siblings, 1 reply; 28+ messages in thread
From: Mike Galbraith @ 2015-04-23  7:01 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Thu, 2015-04-23 at 08:50 +0200, Jan Kiszka wrote:
> On 2015-04-23 08:11, Mike Galbraith wrote:
> > @@ -103,6 +98,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
> >  /* Enqueue the irq work @work on the current CPU */
> >  bool irq_work_queue(struct irq_work *work)
> >  {
> > +       bool realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
> > +       bool raise = false;
> > +
> >         /* Only queue if not already pending */
> >         if (!irq_work_claim(work))
> >                 return false;
> > @@ -110,25 +108,22 @@ bool irq_work_queue(struct irq_work *wor
> >         /* Queue the entry and raise the IPI if needed. */
> >         preempt_disable();
> >  
> > -#ifdef CONFIG_PREEMPT_RT_FULL
> > -       if (work->flags & IRQ_WORK_HARD_IRQ) {
> > +       if (realtime && (work->flags & IRQ_WORK_HARD_IRQ)) {
> >                 if (llist_add(&work->llnode, 
> > this_cpu_ptr(&hirq_work_list)))
> 
> This boils down to
> 
> #ifdef CONFIG_X
> some_type x;
> #endif
> ...
> 
> if (IS_ENABLED(CONFIG_X) && ...)
>       use(x);
> 
> And here we even have an indirection for IS_ENABLED via that local 
> bool
> variable. Is that pattern OK for Linux? Does it compile in all 
> supported
> optimization levels of all supported compilers?

I hope it all goes away, that being what IS_ENABLED() is there for.

        -Mike


* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23  7:01                 ` Mike Galbraith
@ 2015-04-23  7:12                     ` Jan Kiszka
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2015-04-23  7:12 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On 2015-04-23 09:01, Mike Galbraith wrote:
> On Thu, 2015-04-23 at 08:50 +0200, Jan Kiszka wrote:
>> On 2015-04-23 08:11, Mike Galbraith wrote:
>>> @@ -103,6 +98,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
>>>  /* Enqueue the irq work @work on the current CPU */
>>>  bool irq_work_queue(struct irq_work *work)
>>>  {
>>> +       bool realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
>>> +       bool raise = false;
>>> +
>>>         /* Only queue if not already pending */
>>>         if (!irq_work_claim(work))
>>>                 return false;
>>> @@ -110,25 +108,22 @@ bool irq_work_queue(struct irq_work *wor
>>>         /* Queue the entry and raise the IPI if needed. */
>>>         preempt_disable();
>>>  
>>> -#ifdef CONFIG_PREEMPT_RT_FULL
>>> -       if (work->flags & IRQ_WORK_HARD_IRQ) {
>>> +       if (realtime && (work->flags & IRQ_WORK_HARD_IRQ)) {
>>>                 if (llist_add(&work->llnode, 
>>> this_cpu_ptr(&hirq_work_list)))
>>
>> This boils down to
>>
>> #ifdef CONFIG_X
>> some_type x;
>> #endif
>> ...
>>
>> if (IS_ENABLED(CONFIG_X) && ...)
>>       use(x);
>>
>> And here we even have an indirection for IS_ENABLED via that local 
>> bool
>> variable. Is that pattern OK for Linux? Does it compile in all 
>> supported
>> optimization levels of all supported compilers?
> 
> I hope it all goes away, that being what IS_ENABLED() is there for.

Hope is good - but not enough here: it breaks the build under
!CONFIG_X, even the case without the bool var.

  CC      kernel/irq_work.o
In file included from ../include/asm-generic/percpu.h:6:0,
                 from ../arch/x86/include/asm/percpu.h:522,
                 from ../arch/x86/include/asm/current.h:5,
                 from ../arch/x86/include/asm/processor.h:15,
                 from ../arch/x86/include/asm/irq_work.h:4,
                 from ../include/linux/irq_work.h:47,
                 from ../kernel/irq_work.c:11:
../kernel/irq_work.c: In function ‘irq_work_queue_on’:
../kernel/irq_work.c:85:17: error: ‘hirq_work_list’ undeclared (first use in this function)
        &per_cpu(hirq_work_list, cpu));
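
Reduced to a standalone sketch, the failure mode looks like this
(CONFIG_X and 'x' are hypothetical stand-ins for CONFIG_PREEMPT_RT_FULL
and hirq_work_list, not symbols from the patch):

#include <linux/kconfig.h>

#ifdef CONFIG_X
static int x;
#endif

int use_x(int cond)
{
	/*
	 * With CONFIG_X=n the IS_ENABLED() branch is dead code and is
	 * optimized away, but only after the compiler has parsed it, so
	 * the reference to the undeclared 'x' still fails to compile --
	 * just like hirq_work_list above.
	 */
	if (IS_ENABLED(CONFIG_X) && cond)
		return x;
	return 0;
}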

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23  6:58                 ` Mike Galbraith
@ 2015-04-23  7:14                   ` Jan Kiszka
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2015-04-23  7:14 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On 2015-04-23 08:58, Mike Galbraith wrote:
> On Thu, 2015-04-23 at 08:29 +0200, Jan Kiszka wrote:
>>
>>>  void irq_work_tick(void)
>>>  {
>>> -#ifdef CONFIG_PREEMPT_RT_FULL
>>> -       irq_work_run_list(this_cpu_ptr(&lazy_list));
>>> -#else
>>> -       struct llist_head *raised = &__get_cpu_var(raised_list);
>>> +       struct llist_head *raised = this_cpu_ptr(&raised_list);
>>>  
>>> -       if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
>>> +       if (!llist_empty(raised) && 
>>> (!arch_irq_work_has_interrupt() ||
>>> +           IS_ENABLED(CONFIG_PREEMPT_RT_FULL)))
>>
>> OK, that additional condition is addressing archs that don't have
>> irq_work support and fall back to the timer, right?
> 
> How will it ever run if it is not run in either irq_work_run() or 
> irq_work_tick()?  There are two choices, we better pick one.

Ah, now I see it. Indeed.

OK, will run through your fix and suggestions and come up with a new
version.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23  7:12                     ` Jan Kiszka
@ 2015-04-23  7:19                       ` Mike Galbraith
  -1 siblings, 0 replies; 28+ messages in thread
From: Mike Galbraith @ 2015-04-23  7:19 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Thu, 2015-04-23 at 09:12 +0200, Jan Kiszka wrote:
> On 2015-04-23 09:01, Mike Galbraith wrote:
> > On Thu, 2015-04-23 at 08:50 +0200, Jan Kiszka wrote:
> > > On 2015-04-23 08:11, Mike Galbraith wrote:
> > > > @@ -103,6 +98,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
> > > >  /* Enqueue the irq work @work on the current CPU */
> > > >  bool irq_work_queue(struct irq_work *work)
> > > >  {
> > > > +       bool realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
> > > > +       bool raise = false;
> > > > +
> > > >         /* Only queue if not already pending */
> > > >         if (!irq_work_claim(work))
> > > >                 return false;
> > > > @@ -110,25 +108,22 @@ bool irq_work_queue(struct irq_work *wor
> > > >         /* Queue the entry and raise the IPI if needed. */
> > > >         preempt_disable();
> > > >  
> > > > -#ifdef CONFIG_PREEMPT_RT_FULL
> > > > -       if (work->flags & IRQ_WORK_HARD_IRQ) {
> > > > +       if (realtime && (work->flags & IRQ_WORK_HARD_IRQ)) {
> > > >                 if (llist_add(&work->llnode, 
> > > > this_cpu_ptr(&hirq_work_list)))
> > > 
> > > This boils down to
> > > 
> > > #ifdef CONFIG_X
> > > some_type x;
> > > #endif
> > > ...
> > > 
> > > if (IS_ENABLED(CONFIG_X) && ...)
> > >       use(x);
> > > 
> > > And here we even have an indirection for IS_ENABLED via that 
> > > local 
> > > bool
> > > variable. Is that pattern OK for Linux? Does it compile in all 
> > > supported
> > > optimization levels of all supported compilers?
> > 
> > I hope it all goes away, that being what IS_ENABLED() is there for.
> 
> Hope is good - but not enough here: it breaks the build under
> !CONFIG_X, even the case without the bool var.
> 
>   CC      kernel/irq_work.o
> In file included from ../include/asm-generic/percpu.h:6:0,
>                  from ../arch/x86/include/asm/percpu.h:522,
>                  from ../arch/x86/include/asm/current.h:5,
>                  from ../arch/x86/include/asm/processor.h:15,
>                  from ../arch/x86/include/asm/irq_work.h:4,
>                  from ../include/linux/irq_work.h:47,
>                  from ../kernel/irq_work.c:11:
> ../kernel/irq_work.c: In function ‘irq_work_queue_on’:
> ../kernel/irq_work.c:85:17: error: ‘hirq_work_list’ undeclared 
> (first use in this function)
>         &per_cpu(hirq_work_list, cpu));

Aw poo, so that's just what I _thought_ it was for.

        -Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23  7:19                       ` Mike Galbraith
  (?)
@ 2015-04-23 21:00                       ` Steven Rostedt
  2015-04-24  6:54                         ` Mike Galbraith
  -1 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2015-04-23 21:00 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Jan Kiszka, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Thu, 23 Apr 2015 09:19:26 +0200
Mike Galbraith <umgwanakikbuti@gmail.com> wrote:

> >   CC      kernel/irq_work.o
> > In file included from ../include/asm-generic/percpu.h:6:0,
> >                  from ../arch/x86/include/asm/percpu.h:522,
> >                  from ../arch/x86/include/asm/current.h:5,
> >                  from ../arch/x86/include/asm/processor.h:15,
> >                  from ../arch/x86/include/asm/irq_work.h:4,
> >                  from ../include/linux/irq_work.h:47,
> >                  from ../kernel/irq_work.c:11:
> > ../kernel/irq_work.c: In function ‘irq_work_queue_on’:
> > ../kernel/irq_work.c:85:17: error: ‘hirq_work_list’ undeclared 
> > (first use in this function)
> >         &per_cpu(hirq_work_list, cpu));
> 
> Aw poo, so that's just what I _thought_ it was for.

It helps optimization but does nothing for undefined symbols.

That said, why don't we clean up that irq_work code and at least
declare both lists, and get rid of all the #ifdefs? I wonder if gcc is
smart enough to not allocate a static variable if it happens to be
optimized out?

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-23 21:00                       ` Steven Rostedt
@ 2015-04-24  6:54                         ` Mike Galbraith
  2015-04-24  9:00                           ` Jan Kiszka
  0 siblings, 1 reply; 28+ messages in thread
From: Mike Galbraith @ 2015-04-24  6:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jan Kiszka, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Thu, 2015-04-23 at 17:00 -0400, Steven Rostedt wrote:
> On Thu, 23 Apr 2015 09:19:26 +0200
> Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
> 
> > >   CC      kernel/irq_work.o
> > > In file included from ../include/asm-generic/percpu.h:6:0,
> > >                  from ../arch/x86/include/asm/percpu.h:522,
> > >                  from ../arch/x86/include/asm/current.h:5,
> > >                  from ../arch/x86/include/asm/processor.h:15,
> > >                  from ../arch/x86/include/asm/irq_work.h:4,
> > >                  from ../include/linux/irq_work.h:47,
> > >                  from ../kernel/irq_work.c:11:
> > > ../kernel/irq_work.c: In function ‘irq_work_queue_on’:
> > > ../kernel/irq_work.c:85:17: error: ‘hirq_work_list’ undeclared 
> > > (first use in this function)
> > >         &per_cpu(hirq_work_list, cpu));
> > 
> > Aw poo, so that's just what I _thought_ it was for.
> 
> It helps optimization but does nothing for undefined symbols.
> 
> That said, why don't we clean up that irq_work code and at least
> declare both lists, and get rid of all the #ifdefs. I wonder if gcc is
> smart enough to not allocate a static variable if it happens to be
> optimized out?

Nope, it didn't notice a thing.
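
Something like the following illustrates the question (CONFIG_FOO and
dead_list are hypothetical names, not code from this thread):

#include <linux/percpu.h>
#include <linux/llist.h>
#include <linux/irq_work.h>

static DEFINE_PER_CPU(struct llist_head, dead_list);

void poke(struct irq_work *work)
{
	/* With CONFIG_FOO=n, IS_ENABLED() makes this branch dead code... */
	if (IS_ENABLED(CONFIG_FOO))
		llist_add(&work->llnode, this_cpu_ptr(&dead_list));
}

/*
 * ...yet, per the observation above, the per-cpu definition still shows
 * up in the object file, e.g. via `nm kernel/irq_work.o | grep dead_list`.
 */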

This is a stab at that cleanup.  Usable as is with Jan's ok, or as
fodder for your bitmaster-9000 patch shredder, or whatever.  Box works
and it makes line count shrink...

I downgraded evolution v3.16->v3.12 to restore its ability to read its
own fscking "Preformatted" switch, so whitespace should be fine. 

Oh, btw, if anyone (else) makes a 4.1-rt, your rt push work will want
one of those nifty hirq tags lest box make boom due to trying to do that
not only way late, but with irqs enabled which pisses sched all off.
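
For illustration, tagging a work item for hard-irq execution looks
roughly like this (my_push_work and my_push_func are made-up names, not
the actual rt push code; IRQ_WORK_HARD_IRQ is the -rt specific flag
discussed in this thread):

#include <linux/irq_work.h>

static void my_push_func(struct irq_work *work)
{
	/* On -rt this now runs from the hard irq_work path, not ksoftirqd. */
}

static struct irq_work my_push_work;

static void my_push_setup(void)
{
	init_irq_work(&my_push_work, my_push_func);
	/* Tag it so -rt keeps it out of the lazy/soft-irq queue. */
	my_push_work.flags = IRQ_WORK_HARD_IRQ;
}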

Subject: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
Date:	Thu, 16 Apr 2015 18:28:16 +0200
From:	Jan Kiszka <jan.kiszka@siemens.com>

Instead of turning all irq_work requests into lazy ones on -rt, just
move their execution from hard into soft-irq context.

This resolves deadlocks of ftrace which will queue work from arbitrary
contexts, including those that have locks held that are needed for
raising a soft-irq.

Mike: clean up the ifdef mess and kill hirq_work_list.  We need two
lists, and we already have them; we merely need to select one according
to work type.  In -rt, all work not tagged for hirq execution is queued
to the lazy list and runs via irq_work_tick().  Raising SOFTIRQ_TIMER
is always done via IPI for deadlock safety: if the work item is not a
lazy work or the tick is stopped, fire the IPI immediately, otherwise
let it wait.  IOW, lazy work is lazy in -rt only until someone queues
immediate work.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---

Second try, looks much better so far. And it also removes my concerns
regarding other potential cases besides ftrace.

 kernel/irq_work.c |   85 ++++++++++++++++++++----------------------------------
 1 file changed, 33 insertions(+), 52 deletions(-)

--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -23,9 +23,7 @@
 
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
-#ifdef CONFIG_PREEMPT_RT_FULL
-static DEFINE_PER_CPU(struct llist_head, hirq_work_list);
-#endif
+
 /*
  * Claim the entry so that no one else will poke at it.
  */
@@ -68,7 +66,7 @@ void __weak arch_irq_work_raise(void)
  */
 bool irq_work_queue_on(struct irq_work *work, int cpu)
 {
-	bool raise_irqwork;
+	struct llist_head *list;
 
 	/* All work should have been flushed before going offline */
 	WARN_ON_ONCE(cpu_is_offline(cpu));
@@ -80,19 +78,12 @@ bool irq_work_queue_on(struct irq_work *
 	if (!irq_work_claim(work))
 		return false;
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	if (work->flags & IRQ_WORK_HARD_IRQ)
-		raise_irqwork = llist_add(&work->llnode,
-					  &per_cpu(hirq_work_list, cpu));
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !(work->flags & IRQ_WORK_HARD_IRQ))
+		list = &per_cpu(lazy_list, cpu);
 	else
-		raise_irqwork = llist_add(&work->llnode,
-					  &per_cpu(lazy_list, cpu));
-#else
-		raise_irqwork = llist_add(&work->llnode,
-					  &per_cpu(raised_list, cpu));
-#endif
+		list = &per_cpu(raised_list, cpu);
 
-	if (raise_irqwork)
+	if (llist_add(&work->llnode, list))
 		arch_send_call_function_single_ipi(cpu);
 
 	return true;
@@ -103,6 +94,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
 /* Enqueue the irq work @work on the current CPU */
 bool irq_work_queue(struct irq_work *work)
 {
+	struct llist_head *list;
+	bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
+
 	/* Only queue if not already pending */
 	if (!irq_work_claim(work))
 		return false;
@@ -110,25 +104,17 @@ bool irq_work_queue(struct irq_work *wor
 	/* Queue the entry and raise the IPI if needed. */
 	preempt_disable();
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	if (work->flags & IRQ_WORK_HARD_IRQ) {
-		if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
-			arch_irq_work_raise();
-	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-		    tick_nohz_tick_stopped())
-			raise_softirq(TIMER_SOFTIRQ);
-	}
-#else
-	if (work->flags & IRQ_WORK_LAZY) {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-		    tick_nohz_tick_stopped())
-			arch_irq_work_raise();
-	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
+	lazy_work = work->flags & IRQ_WORK_LAZY;
+
+	if (lazy_work || (realtime && !(work->flags & IRQ_WORK_HARD_IRQ)))
+		list = this_cpu_ptr(&lazy_list);
+	else
+		list = this_cpu_ptr(&raised_list);
+
+	if (llist_add(&work->llnode, list)) {
+		if (!lazy_work || tick_nohz_tick_stopped())
 			arch_irq_work_raise();
 	}
-#endif
 
 	preempt_enable();
 
@@ -143,12 +129,8 @@ bool irq_work_needs_cpu(void)
 	raised = this_cpu_ptr(&raised_list);
 	lazy = this_cpu_ptr(&lazy_list);
 
-	if (llist_empty(raised))
-		if (llist_empty(lazy))
-#ifdef CONFIG_PREEMPT_RT_FULL
-			if (llist_empty(this_cpu_ptr(&hirq_work_list)))
-#endif
-				return false;
+	if (llist_empty(raised) && llist_empty(lazy))
+		return false;
 
 	/* All work should have been flushed before going offline */
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -162,9 +144,7 @@ static void irq_work_run_list(struct lli
 	struct irq_work *work;
 	struct llist_node *llnode;
 
-#ifndef CONFIG_PREEMPT_RT_FULL
-	BUG_ON(!irqs_disabled());
-#endif
+	BUG_ON(!IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !irqs_disabled());
 
 	if (llist_empty(list))
 		return;
@@ -200,26 +180,27 @@ static void irq_work_run_list(struct lli
  */
 void irq_work_run(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	irq_work_run_list(this_cpu_ptr(&hirq_work_list));
-#else
 	irq_work_run_list(this_cpu_ptr(&raised_list));
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-#endif
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
+		/*
+		 * NOTE: we raise softirq via IPI for safety,
+		 * and execute in irq_work_tick() to move the
+		 * overhead from hard to soft irq context.
+		 */
+		if (!llist_empty(this_cpu_ptr(&lazy_list)))
+			raise_softirq(TIMER_SOFTIRQ);
+	} else
+		irq_work_run_list(this_cpu_ptr(&lazy_list));
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
 void irq_work_tick(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-#else
-	struct llist_head *raised = &__get_cpu_var(raised_list);
+	struct llist_head *raised = this_cpu_ptr(&raised_list);
 
 	if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
 		irq_work_run_list(raised);
-	irq_work_run_list(&__get_cpu_var(lazy_list));
-#endif
+	irq_work_run_list(this_cpu_ptr(&lazy_list));
 }
 
 /*



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-24  6:54                         ` Mike Galbraith
@ 2015-04-24  9:00                           ` Jan Kiszka
  2015-04-24  9:59                             ` Mike Galbraith
  2015-04-25  7:20                             ` Mike Galbraith
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Kiszka @ 2015-04-24  9:00 UTC (permalink / raw)
  To: Mike Galbraith, Steven Rostedt
  Cc: Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On 2015-04-24 08:54, Mike Galbraith wrote:
> On Thu, 2015-04-23 at 17:00 -0400, Steven Rostedt wrote:
>> On Thu, 23 Apr 2015 09:19:26 +0200
>> Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
>>
>>>>   CC      kernel/irq_work.o
>>>> In file included from ../include/asm-generic/percpu.h:6:0,
>>>>                  from ../arch/x86/include/asm/percpu.h:522,
>>>>                  from ../arch/x86/include/asm/current.h:5,
>>>>                  from ../arch/x86/include/asm/processor.h:15,
>>>>                  from ../arch/x86/include/asm/irq_work.h:4,
>>>>                  from ../include/linux/irq_work.h:47,
>>>>                  from ../kernel/irq_work.c:11:
>>>> ../kernel/irq_work.c: In function ‘irq_work_queue_on’:
>>>> ../kernel/irq_work.c:85:17: error: ‘hirq_work_list’ undeclared 
>>>> (first use in this function)
>>>>         &per_cpu(hirq_work_list, cpu));
>>>
>>> Aw poo, so that's just what I _thought_ it was for.
>>
>> It helps optimization but does nothing for undefined symbols.
>>
>> That said, why don't we clean up that irq_work code and at least
>> declare both lists, and get rid of all the #ifdefs. I wonder if gcc is
>> smart enough to not allocate a static variable if it happens to be
>> optimized out?
> 
> Nope, it didn't notice a thing.
> 
> This is a stab at that cleanup.  Usable as is with Jan's ok, or as
> fodder for your bitmaster-9000 patch shredder, or whatever.  Box works
> and it makes line count shrink...
> 
> I downgraded evolution v3.16->v3.12 to restore its ability to read it's
> own fscking "Preformatted" switch, so whitespace should be fine. 
> 
> Oh, btw, if anyone (else) makes a 4.1-rt, your rt push work will want
> one of those nifty hirq tags lest box make boom due to trying to do that
> not only way late, but with irqs enabled which pisses sched all off.
> 
> Subject: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
> Date:	Thu, 16 Apr 2015 18:28:16 +0200
> From:	Jan Kiszka <jan.kiszka@siemens.com>
> 
> Instead of turning all irq_work requests into lazy ones on -rt, just
> move their execution from hard into soft-irq context.
> 
> This resolves deadlocks of ftrace which will queue work from arbitrary
> contexts, including those that have locks held that are needed for
> raising a soft-irq.
> 
> Mike: cleanup ifdef mess and kill hirq_work_list.  We need two lists,
> and already have them, merely need to select according to work type.
> In -rt all work not tagged for hirq execution is queued to the lazy
> list and runs via irq_work_tick().  Raising SOFTIRQ_TIMER is always
> done via IPI for deadlock safety, if the work item is not a lazy work
> or the tick is stopped, fire IPI immediately, otherwise let it wait.
> IOW, lazy work is lazy in -rt only until someone queues immediate work.

The approach looks good to me, but the commit log deserves a rework now.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-24  9:00                           ` Jan Kiszka
@ 2015-04-24  9:59                             ` Mike Galbraith
  2015-04-25  7:20                             ` Mike Galbraith
  1 sibling, 0 replies; 28+ messages in thread
From: Mike Galbraith @ 2015-04-24  9:59 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Fri, 2015-04-24 at 11:00 +0200, Jan Kiszka wrote:

> The approach looks good to me, but the commit log deserves a rework now.

Yeah.  While you're at it, you should change my chop to an ack or a
tested-by too, as it's your patch, I just rearranged a bit.

	-Mike



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-24  9:00                           ` Jan Kiszka
  2015-04-24  9:59                             ` Mike Galbraith
@ 2015-04-25  7:20                             ` Mike Galbraith
  2015-04-25  7:26                               ` Jan Kiszka
  1 sibling, 1 reply; 28+ messages in thread
From: Mike Galbraith @ 2015-04-25  7:20 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, RT, Linux Kernel Mailing List

On Fri, 2015-04-24 at 11:00 +0200, Jan Kiszka wrote:

> The approach looks good to me, but the commit log deserves a rework now.

OK, we agree on the approach, and that the changelog wants a bit of
attention.  So either you rewrite it to suit you, do a pretty changelog,
and I ack; or I take the blame for the posted form, scribble something
that I hope is a better log, and you ack.  Either will work.

Here's my changelog+blame-taking, if you're ok with it, ack, and we can
call it a day, otherwise onward to plan B.



irq_work: Delegate non-immediate irq work to ksoftirqd

Based on a patch from Jan Kiszka.

Jan reported that ftrace queueing work from arbitrary contexts can
and does lead to deadlock.  trace-cmd -e sched:* deadlocked, in fact.

Resolve the problem by delegating all non-immediate work to ksoftirqd.

We need two lists to do this, one for hard irq, one for soft, so we
can use the two existing lists, eliminating the -rt specific list and
all of the ifdefery while we're at it.  

Strategy: Queue work tagged for hirq invocation to the raised_list and
invoke it via IPI as usual.  If a work item being queued to lazy_list,
which becomes our all-others list, is not a lazy work item, or the
tick is stopped, fire an IPI to raise SOFTIRQ_TIMER immediately;
otherwise let ksoftirqd find it when the tick comes along.  Raising
SOFTIRQ_TIMER via IPI even when queueing locally ensures delegation.
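
In code terms, the queueing decision boils down to roughly this (a
condensed paraphrase of irq_work_queue() from the patch below, not a
literal excerpt):

	if (work->flags & IRQ_WORK_LAZY ||
	    (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
	     !(work->flags & IRQ_WORK_HARD_IRQ)))
		list = this_cpu_ptr(&lazy_list);	/* tick / ksoftirqd */
	else
		list = this_cpu_ptr(&raised_list);	/* hard irq */

	if (llist_add(&work->llnode, list) &&
	    (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()))
		arch_irq_work_raise();			/* IPI now */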

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>

---
 kernel/irq_work.c |   85 ++++++++++++++++++++----------------------------------
 1 file changed, 33 insertions(+), 52 deletions(-)

--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -23,9 +23,7 @@
 
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
-#ifdef CONFIG_PREEMPT_RT_FULL
-static DEFINE_PER_CPU(struct llist_head, hirq_work_list);
-#endif
+
 /*
  * Claim the entry so that no one else will poke at it.
  */
@@ -68,7 +66,7 @@ void __weak arch_irq_work_raise(void)
  */
 bool irq_work_queue_on(struct irq_work *work, int cpu)
 {
-	bool raise_irqwork;
+	struct llist_head *list;
 
 	/* All work should have been flushed before going offline */
 	WARN_ON_ONCE(cpu_is_offline(cpu));
@@ -80,19 +78,12 @@ bool irq_work_queue_on(struct irq_work *
 	if (!irq_work_claim(work))
 		return false;
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	if (work->flags & IRQ_WORK_HARD_IRQ)
-		raise_irqwork = llist_add(&work->llnode,
-					  &per_cpu(hirq_work_list, cpu));
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !(work->flags & IRQ_WORK_HARD_IRQ))
+		list = &per_cpu(lazy_list, cpu);
 	else
-		raise_irqwork = llist_add(&work->llnode,
-					  &per_cpu(lazy_list, cpu));
-#else
-		raise_irqwork = llist_add(&work->llnode,
-					  &per_cpu(raised_list, cpu));
-#endif
+		list = &per_cpu(raised_list, cpu);
 
-	if (raise_irqwork)
+	if (llist_add(&work->llnode, list))
 		arch_send_call_function_single_ipi(cpu);
 
 	return true;
@@ -103,6 +94,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
 /* Enqueue the irq work @work on the current CPU */
 bool irq_work_queue(struct irq_work *work)
 {
+	struct llist_head *list;
+	bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
+
 	/* Only queue if not already pending */
 	if (!irq_work_claim(work))
 		return false;
@@ -110,25 +104,17 @@ bool irq_work_queue(struct irq_work *wor
 	/* Queue the entry and raise the IPI if needed. */
 	preempt_disable();
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	if (work->flags & IRQ_WORK_HARD_IRQ) {
-		if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
-			arch_irq_work_raise();
-	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-		    tick_nohz_tick_stopped())
-			raise_softirq(TIMER_SOFTIRQ);
-	}
-#else
-	if (work->flags & IRQ_WORK_LAZY) {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-		    tick_nohz_tick_stopped())
-			arch_irq_work_raise();
-	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
+	lazy_work = work->flags & IRQ_WORK_LAZY;
+
+	if (lazy_work || (realtime && !(work->flags & IRQ_WORK_HARD_IRQ)))
+		list = this_cpu_ptr(&lazy_list);
+	else
+		list = this_cpu_ptr(&raised_list);
+
+	if (llist_add(&work->llnode, list)) {
+		if (!lazy_work || tick_nohz_tick_stopped())
 			arch_irq_work_raise();
 	}
-#endif
 
 	preempt_enable();
 
@@ -143,12 +129,8 @@ bool irq_work_needs_cpu(void)
 	raised = this_cpu_ptr(&raised_list);
 	lazy = this_cpu_ptr(&lazy_list);
 
-	if (llist_empty(raised))
-		if (llist_empty(lazy))
-#ifdef CONFIG_PREEMPT_RT_FULL
-			if (llist_empty(this_cpu_ptr(&hirq_work_list)))
-#endif
-				return false;
+	if (llist_empty(raised) && llist_empty(lazy))
+		return false;
 
 	/* All work should have been flushed before going offline */
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -162,9 +144,7 @@ static void irq_work_run_list(struct lli
 	struct irq_work *work;
 	struct llist_node *llnode;
 
-#ifndef CONFIG_PREEMPT_RT_FULL
-	BUG_ON(!irqs_disabled());
-#endif
+	BUG_ON(!IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !irqs_disabled());
 
 	if (llist_empty(list))
 		return;
@@ -200,26 +180,27 @@ static void irq_work_run_list(struct lli
  */
 void irq_work_run(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	irq_work_run_list(this_cpu_ptr(&hirq_work_list));
-#else
 	irq_work_run_list(this_cpu_ptr(&raised_list));
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-#endif
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
+		/*
+		 * NOTE: we raise softirq via IPI for safety,
+		 * and execute in irq_work_tick() to move the
+		 * overhead from hard to soft irq context.
+		 */
+		if (!llist_empty(this_cpu_ptr(&lazy_list)))
+			raise_softirq(TIMER_SOFTIRQ);
+	} else
+		irq_work_run_list(this_cpu_ptr(&lazy_list));
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
 void irq_work_tick(void)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-#else
-	struct llist_head *raised = &__get_cpu_var(raised_list);
+	struct llist_head *raised = this_cpu_ptr(&raised_list);
 
 	if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
 		irq_work_run_list(raised);
-	irq_work_run_list(&__get_cpu_var(lazy_list));
-#endif
+	irq_work_run_list(this_cpu_ptr(&lazy_list));
 }
 
 /*




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-25  7:20                             ` Mike Galbraith
@ 2015-04-25  7:26                               ` Jan Kiszka
  2015-05-18 19:52                                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2015-04-25  7:26 UTC (permalink / raw)
  To: Mike Galbraith, Sebastian Andrzej Siewior
  Cc: Steven Rostedt, RT, Linux Kernel Mailing List

On 2015-04-25 09:20, Mike Galbraith wrote:
> On Fri, 2015-04-24 at 11:00 +0200, Jan Kiszka wrote:
> 
>> The approach looks good to me, but the commit log deserves a rework now.
> 
> Ok, we agree on the approach, and that the changelog wants a bit of
> attention, so either you're gonna rewrite it to suit you, do a pretty
> changelog, and I ack, or I take the blame for the posted form, scribble
> something that I hope is a better log, and you ack.  Either will work.
> 
> Here's my changelog+blame-taking, if you're ok with it, ack, and we can
> call it a day, otherwise onward to plan B.
> 
> 
> 
> irq_work: Delegate non-immediate irq work to ksoftirqd
> 
> Based on a patch from Jan Kiszka.
> 
> Jan reported that ftrace queueing work from arbitrary contexts can
> and does lead to deadlock.  trace-cmd -e sched:* deadlocked in fact.
> 
> Resolve the problem by delegating all non-immediate work to ksoftirqd.
> 
> We need two lists to do this, one for hard irq, one for soft, so we
> can use the two existing lists, eliminating the -rt specific list and
> all of the ifdefery while we're at it.  
> 
> Strategy: Queue work tagged for hirq invocation to the raised_list,
> invoke via IPI as usual.  If a work item being queued to lazy_list,
> which becomes our all others list, is not a lazy work item, or the
> tick is stopped, fire an IPI to raise SOFTIRQ_TIMER immediately,
> otherwise let ksofirqd find it when the tick comes along.  Raising
> SOFTIRQ_TIMER via IPI even when queueing local ensures delegation.
> 
> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> 
> ---
>  kernel/irq_work.c |   85 ++++++++++++++++++++----------------------------------
>  1 file changed, 33 insertions(+), 52 deletions(-)
> 
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -23,9 +23,7 @@
>  
>  static DEFINE_PER_CPU(struct llist_head, raised_list);
>  static DEFINE_PER_CPU(struct llist_head, lazy_list);
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -static DEFINE_PER_CPU(struct llist_head, hirq_work_list);
> -#endif
> +
>  /*
>   * Claim the entry so that no one else will poke at it.
>   */
> @@ -68,7 +66,7 @@ void __weak arch_irq_work_raise(void)
>   */
>  bool irq_work_queue_on(struct irq_work *work, int cpu)
>  {
> -	bool raise_irqwork;
> +	struct llist_head *list;
>  
>  	/* All work should have been flushed before going offline */
>  	WARN_ON_ONCE(cpu_is_offline(cpu));
> @@ -80,19 +78,12 @@ bool irq_work_queue_on(struct irq_work *
>  	if (!irq_work_claim(work))
>  		return false;
>  
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	if (work->flags & IRQ_WORK_HARD_IRQ)
> -		raise_irqwork = llist_add(&work->llnode,
> -					  &per_cpu(hirq_work_list, cpu));
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !(work->flags & IRQ_WORK_HARD_IRQ))
> +		list = &per_cpu(lazy_list, cpu);
>  	else
> -		raise_irqwork = llist_add(&work->llnode,
> -					  &per_cpu(lazy_list, cpu));
> -#else
> -		raise_irqwork = llist_add(&work->llnode,
> -					  &per_cpu(raised_list, cpu));
> -#endif
> +		list = &per_cpu(raised_list, cpu);
>  
> -	if (raise_irqwork)
> +	if (llist_add(&work->llnode, list))
>  		arch_send_call_function_single_ipi(cpu);
>  
>  	return true;
> @@ -103,6 +94,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
>  /* Enqueue the irq work @work on the current CPU */
>  bool irq_work_queue(struct irq_work *work)
>  {
> +	struct llist_head *list;
> +	bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
> +
>  	/* Only queue if not already pending */
>  	if (!irq_work_claim(work))
>  		return false;
> @@ -110,25 +104,17 @@ bool irq_work_queue(struct irq_work *wor
>  	/* Queue the entry and raise the IPI if needed. */
>  	preempt_disable();
>  
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	if (work->flags & IRQ_WORK_HARD_IRQ) {
> -		if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
> -			arch_irq_work_raise();
> -	} else {
> -		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
> -		    tick_nohz_tick_stopped())
> -			raise_softirq(TIMER_SOFTIRQ);
> -	}
> -#else
> -	if (work->flags & IRQ_WORK_LAZY) {
> -		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
> -		    tick_nohz_tick_stopped())
> -			arch_irq_work_raise();
> -	} else {
> -		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
> +	lazy_work = work->flags & IRQ_WORK_LAZY;
> +
> +	if (lazy_work || (realtime && !(work->flags & IRQ_WORK_HARD_IRQ)))
> +		list = this_cpu_ptr(&lazy_list);
> +	else
> +		list = this_cpu_ptr(&raised_list);
> +
> +	if (llist_add(&work->llnode, list)) {
> +		if (!lazy_work || tick_nohz_tick_stopped())
>  			arch_irq_work_raise();
>  	}
> -#endif
>  
>  	preempt_enable();
>  
> @@ -143,12 +129,8 @@ bool irq_work_needs_cpu(void)
>  	raised = this_cpu_ptr(&raised_list);
>  	lazy = this_cpu_ptr(&lazy_list);
>  
> -	if (llist_empty(raised))
> -		if (llist_empty(lazy))
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -			if (llist_empty(this_cpu_ptr(&hirq_work_list)))
> -#endif
> -				return false;
> +	if (llist_empty(raised) && llist_empty(lazy))
> +		return false;
>  
>  	/* All work should have been flushed before going offline */
>  	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
> @@ -162,9 +144,7 @@ static void irq_work_run_list(struct lli
>  	struct irq_work *work;
>  	struct llist_node *llnode;
>  
> -#ifndef CONFIG_PREEMPT_RT_FULL
> -	BUG_ON(!irqs_disabled());
> -#endif
> +	BUG_ON(!IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !irqs_disabled());
>  
>  	if (llist_empty(list))
>  		return;
> @@ -200,26 +180,27 @@ static void irq_work_run_list(struct lli
>   */
>  void irq_work_run(void)
>  {
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	irq_work_run_list(this_cpu_ptr(&hirq_work_list));
> -#else
>  	irq_work_run_list(this_cpu_ptr(&raised_list));
> -	irq_work_run_list(this_cpu_ptr(&lazy_list));
> -#endif
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
> +		/*
> +		 * NOTE: we raise softirq via IPI for safety,
> +		 * and execute in irq_work_tick() to move the
> +		 * overhead from hard to soft irq context.
> +		 */
> +		if (!llist_empty(this_cpu_ptr(&lazy_list)))
> +			raise_softirq(TIMER_SOFTIRQ);
> +	} else
> +		irq_work_run_list(this_cpu_ptr(&lazy_list));
>  }
>  EXPORT_SYMBOL_GPL(irq_work_run);
>  
>  void irq_work_tick(void)
>  {
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	irq_work_run_list(this_cpu_ptr(&lazy_list));
> -#else
> -	struct llist_head *raised = &__get_cpu_var(raised_list);
> +	struct llist_head *raised = this_cpu_ptr(&raised_list);
>  
>  	if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
>  		irq_work_run_list(raised);
> -	irq_work_run_list(&__get_cpu_var(lazy_list));
> -#endif
> +	irq_work_run_list(this_cpu_ptr(&lazy_list));
>  }
>  
>  /*
> 
> 
> 

Acked-by: Jan Kiszka <jan.kiszka@siemens.com>

This way around makes more sense as you changed the patch significantly.

Thanks,
Jan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RT 3.18] irq_work: Provide a soft-irq based queue
  2015-04-25  7:26                               ` Jan Kiszka
@ 2015-05-18 19:52                                 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-05-18 19:52 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Mike Galbraith, Steven Rostedt, RT, Linux Kernel Mailing List

* Jan Kiszka | 2015-04-25 09:26:13 [+0200]:

>Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
>
>This way around makes more sense as you changed the patch significantly.

Now I took the proper one.

>Thanks,
>Jan

Sebastian

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-05-18 19:52 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-16 14:06 [PATCH RT 3.18] ring-buffer: Mark irq_work as HARD_IRQ to prevent deadlocks Jan Kiszka
2015-04-16 14:12 ` Steven Rostedt
2015-04-16 14:26 ` Sebastian Andrzej Siewior
2015-04-16 14:28   ` Jan Kiszka
2015-04-16 14:57     ` Sebastian Andrzej Siewior
2015-04-16 15:31       ` Jan Kiszka
2015-04-16 15:10     ` Steven Rostedt
2015-04-16 15:29       ` Jan Kiszka
2015-04-16 15:33         ` Sebastian Andrzej Siewior
2015-04-16 16:28         ` [PATCH RT 3.18] irq_work: Provide a soft-irq based queue Jan Kiszka
2015-04-20  8:03           ` Mike Galbraith
2015-04-23  6:11             ` Mike Galbraith
2015-04-23  6:29               ` Jan Kiszka
2015-04-23  6:58                 ` Mike Galbraith
2015-04-23  7:14                   ` Jan Kiszka
2015-04-23  6:50               ` Jan Kiszka
2015-04-23  7:01                 ` Mike Galbraith
2015-04-23  7:12                   ` Jan Kiszka
2015-04-23  7:19                     ` Mike Galbraith
2015-04-23 21:00                       ` Steven Rostedt
2015-04-24  6:54                         ` Mike Galbraith
2015-04-24  9:00                           ` Jan Kiszka
2015-04-24  9:59                             ` Mike Galbraith
2015-04-25  7:20                             ` Mike Galbraith
2015-04-25  7:26                               ` Jan Kiszka
2015-05-18 19:52                                 ` Sebastian Andrzej Siewior
