linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Long latencies during disk-io
@ 2019-09-03  7:23 Martin.Wirth
  2019-09-05 15:41 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Martin.Wirth @ 2019-09-03  7:23 UTC (permalink / raw)
  To: linux-rt-users; +Cc: bigeasy

Hi,

I'm operating a data acquisition system with several digitizer cards and an Adlink  cPCI-6530
 (i7-4700EQ  / Intel QM87 chipset) cpu card. The digitizer cards are hardware triggered with
an external low jitter clock at 200 Hz. I monitor the wakeup latencies and with older 
rt-kernels (4.14) I get maximum latencies of about 24 micro seconds most of the time
with extremely rare spikes of 50 micro seconds under medium other system load.

I now switched to 4.19 and 5.2 rt kernels and suddenly got latencies of 500 micro seconds
and one time I even noticed 1.5 ms, if there is concurrent SATA disk traffic going on (not
from the same DAQ-process, but unrelated file copies for example). I played a bit with different
configurations and found that the long latencies only occur if the ahci interrupt thread is on
the same processor as the interrupt thread of the digitizer card. If I separate them via the
 /proc/irq interface the latencies are gone. The real-time priority of the digitizer IRQ handler
 is set to 53 and that of the corresponding user space thread to 51. The ahci interrupt
 thread is at the standard value of 50.     

Although I now have the solution to put the ahci-handler on a different processor, I would
like to know if it is expected behavior that the ahci interrupt handler blocks other interrupt 
threads of higher priority for more than one millisecond?

Another observation which I cannot understand though, is that a concurrent 
cyclictest -m -Sp98 -i200 -h400 -n
run does not show the latencies...

Does anyone have an explanation for this? As said above this is absent on 4.14-rt and
I observed it with 4.19.59-rt24, 5.2.9-rt3 and 5.2.10-rt5.

Cheers,

Martin


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Long latencies during disk-io
  2019-09-03  7:23 Long latencies during disk-io Martin.Wirth
@ 2019-09-05 15:41 ` Sebastian Andrzej Siewior
  2019-09-06 13:23   ` AW: " Martin.Wirth
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-05 15:41 UTC (permalink / raw)
  To: Martin.Wirth; +Cc: linux-rt-users

On 2019-09-03 07:23:35 [+0000], Martin.Wirth@dlr.de wrote:
> Hi,
Hi,

…
>  /proc/irq interface the latencies are gone. The real-time priority of the digitizer IRQ handler
>  is set to 53 and that of the corresponding user space thread to 51. The ahci interrupt
>  thread is at the standard value of 50.     
> 
> Although I now have the solution to put the ahci-handler on a different processor, I would
> like to know if it is expected behavior that the ahci interrupt handler blocks other interrupt 
> threads of higher priority for more than one millisecond?

Based on the CPU split via proc/irq you describe here I assume that
both devices (ahci and the digitizer cards) do not share an interrupt
line.
In that case both handlers are independent and should not affect each
other. Running on the same CPU, the AHCI handler can be interrupt by
another thread (with a higher priority) if that interrupt line was
raised while the AHCI handler was active.

So based on that, the "digitizer IRQ handler" should run first, followed
by the user space thread, followed by the AHCI interrupt.

> Another observation which I cannot understand though, is that a concurrent 
> cyclictest -m -Sp98 -i200 -h400 -n
> run does not show the latencies...

you have here "-p98" while the other two threads have priority 53 and
51. So that is a difference unless there is nothing between 53 and 51.
But if you disable the "digitizer card(s)" then with -p51 you should be
close to that setup with cyclictest.

If there are no spikes in cyclictest then there must be something
after the threaded handler runs that delays the user space task. Some
tracing should help to figure out what happens between the hardware
interrupt and the context switch to the user task.

> Cheers,
> 
> Martin

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* AW: Long latencies during disk-io
  2019-09-05 15:41 ` Sebastian Andrzej Siewior
@ 2019-09-06 13:23   ` Martin.Wirth
  2019-09-06 22:06     ` Austin Schuh
  0 siblings, 1 reply; 12+ messages in thread
From: Martin.Wirth @ 2019-09-06 13:23 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users

> Von: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]


> So based on that, the "digitizer IRQ handler" should run first, followed
> by the user space thread, followed by the AHCI interrupt.

Yes, that's how I understood it. But that is actually not happening. The ahci interrupt handler 
seems not to be interrupted by the higher prio DAQ tasks (both interrupt and user space thread).
I added a timestamp readout to the digitizer cards irq routine and it turned out that the 
interrupt thread is already delayed, not only the user land thread...

Just to be sure, this is a ps of the rt tasks (with some unrelated irq/threads at rtprio 50 removed)

tid,  pid,  cpu, cls, rtpri, pri, nice,  wchan, task
   10    10   0  FF      1  41   - -      rcuc/0
   11    11   1  FF      1  41   - -      rcu_preempt
   12    12   0  FF      1  41   - -      rcub/0
   13    13   0  FF     99 139   - -      posixcputmr/0
   14    14   0  FF     99 139   - -      migration/0
   17    17   1  FF     99 139   - -      migration/1
   18    18   1  FF     99 139   - -      posixcputmr/1
   19    19   1  FF      1  41   - -      rcuc/1
   24    24   2  FF     99 139   - -      migration/2
   25    25   2  FF     99 139   - -      posixcputmr/2
   26    26   2  FF      1  41   - -      rcuc/2
   31    31   3  FF     99 139   - -      migration/3
   32    32   3  FF     99 139   - -      posixcputmr/3
   33    33   3  FF      1  41   - -      rcuc/3
  102   102   1  FF     50  90   - -      irq/9-acpi
  108   108   1  FF     99 139   - -      watchdogd
  120   120   1  FF     50  90   - -      irq/34-i915
  122   122   2  FF     50  90   - -      irq/35-ahci[000
  300   300   2  FF     53  93   - -      irq/21-spec_
 7027  7027   0  FF      1  41   - -      lidaqserver
 7035  7027   3  FF     51  91   - -      ADCControl0

irq/21-spec_ is the digitizer irq thread, which does a simple hardware register read over 
the pci bus and then does a wake_up. No locking involved. ADCControl0 is the user land 
task which is woken up by the interrupt thread.

cat /proc/interrupt looks like this (without the separation of irqs of course...):

           CPU0       CPU1       CPU2       CPU3
  0:          5          0          0          0   IO-APIC   2-edge      timer
  8:          1          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          5          0          0   IO-APIC   9-fasteoi   acpi
 16:         30          0          0          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
 18:          0          0          0          0   IO-APIC  18-fasteoi   i801_smbus
 21:          0          0     447302          0   IO-APIC  21-fasteoi   spec_
 34:          0       7758          0          0   PCI-MSI 32768-edge      i915
 35:          0          0     254225          0   PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 36:          0          0          0       4211   PCI-MSI 409600-edge      eth5
70:          0          0    2206737          0   PCI-MSI 327680-edge      xhci_hcd
 71:          0          0          0     109935   PCI-MSI 2097152-edge      xhci_hcd
 72:          0          0          0          0   PCI-MSI 2097153-edge      xhci_hcd
 73:          0          0          0          0   PCI-MSI 2097154-edge      xhci_hcd
 74:          0          0          0          0   PCI-MSI 2097155-edge      xhci_hcd
 75:          0          0          0          0   PCI-MSI 2097156-edge      xhci_hcd
NMI:          0          0          0          0   Non-maskable interrupts
LOC:    1190870    1000137     935198     947995   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:         93         66         34          4   IRQ work interrupts
RTR:          3          0          0          0   APIC ICR read retries
RES:     160318     125789     106359     110046   Rescheduling interrupts
CAL:       5801       5217       4330       4275   Function call interrupts
TLB:       1610        415        244        335   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:         28         29         29         29   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0   Posted-interrupt wakeup event  

(I removed a bunch of interrupts from the on board Ethernet controllers)

Interestingly disk-io to an usb-disk handled by the xhci_hcd at irq  70 does not cause latencies, 
although it is handled also by CPU2, like the ahci and the digitizer (spec_).
So I think it is some specific to the ahci!

> > Another observation which I cannot understand though, is that a
> concurrent
> > cyclictest -m -Sp98 -i200 -h400 -n
> > run does not show the latencies...
> 
> you have here "-p98" while the other two threads have priority 53 and
> 51. So that is a difference unless there is nothing between 53 and 51.
> But if you disable the "digitizer card(s)" then with -p51 you should be
> close to that setup with cyclictest.
> 
> If there are no spikes in cyclictest then there must be something
> after the threaded handler runs that delays the user space task. Some
> tracing should help to figure out what happens between the hardware
> interrupt and the context switch to the user task.

If I run cyclictest at prio 51 there are no latencies. Only if I go to 50 I immediately 
see the 1 - 1.5 ms on CPU2 if I start a file copy  -- which is of course as expected.

For the tracing, I at the moment have no idea how to set it up. I only did some 
tracing which was triggered by cyclictest. But that won't help here...

Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Long latencies during disk-io
  2019-09-06 13:23   ` AW: " Martin.Wirth
@ 2019-09-06 22:06     ` Austin Schuh
  2019-09-10 12:04       ` AW: " Martin.Wirth
  0 siblings, 1 reply; 12+ messages in thread
From: Austin Schuh @ 2019-09-06 22:06 UTC (permalink / raw)
  To: Martin.Wirth; +Cc: bigeasy, linux-rt-users

On Fri, Sep 6, 2019 at 11:48 AM <Martin.Wirth@dlr.de> wrote:
> For the tracing, I at the moment have no idea how to set it up. I only did some
> tracing which was triggered by cyclictest. But that won't help here...

Modify your code to open up /sys/kernel/debug/tracing/tracing_on and
write 0 to it when you detect a high latency event.  I'd suggest
opening up the file when the code starts and writing when the event
happens, but opening and writing all at once would be fine.  I've very
successfully used that to catch events in the past.

Austin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* AW: Long latencies during disk-io
  2019-09-06 22:06     ` Austin Schuh
@ 2019-09-10 12:04       ` Martin.Wirth
  2019-09-13 13:36         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Martin.Wirth @ 2019-09-10 12:04 UTC (permalink / raw)
  To: linux-rt-users; +Cc: bigeasy, austin.linux

Thanks to Austin's suggestion I was able to get a race which looks suspicious.
# tracer: function
#
# entries-in-buffer/entries-written: 39893/219671100   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _----=> need-resched
#                            || / _---=> hardirq/softirq
#                            ||| / _--=> preempt-depth
#                            ||||/     delay
#           TASK-PID   CPU#  |||||   TIMESTAMP  FUNCTION
#              | |       |   |||||      |         |
irq/35-ahci[000-121   [002] ....113 28719.098051: preempt_count_sub <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] .....13 28719.098051: preempt_count_add <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] ....113 28719.098052: preempt_count_sub <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] .....13 28719.098052: preempt_count_add <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] ....113 28719.098052: _raw_spin_lock_irqsave <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] d...113 28719.098052: preempt_count_add <-_raw_spin_lock_irqsave
 irq/35-ahci[000-121   [002] d...213 28719.098052: _raw_spin_unlock_irqrestore <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] ....213 28719.098052: preempt_count_sub <-_raw_spin_unlock_irqrestore
 irq/35-ahci[000-121   [002] ....113 28719.098052: preempt_count_sub <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] .....13 28719.098052: preempt_count_add <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] d..h313 28719.098053: handle_irq <-do_IRQ
 irq/35-ahci[000-121   [002] d..h313 28719.098053: handle_fasteoi_irq <-handle_irq
 irq/35-ahci[000-121   [002] d..h313 28719.098053: _raw_spin_lock <-handle_fasteoi_irq
 irq/35-ahci[000-121   [002] d..h313 28719.098053: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] d..h413 28719.098054: mask_irq.part.36 <-handle_fasteoi_irq
 irq/35-ahci[000-121   [002] d..h413 28719.098054: mask_ioapic_irq <-mask_irq.part.36
 irq/35-ahci[000-121   [002] d..h413 28719.098054: _raw_spin_lock_irqsave <-mask_ioapic_irq
 irq/35-ahci[000-121   [002] d..h413 28719.098054: preempt_count_add <-_raw_spin_lock_irqsave
 irq/35-ahci[000-121   [002] d..h513 28719.098054: io_apic_modify_irq <-mask_ioapic_irq
 irq/35-ahci[000-121   [002] d..h513 28719.098055: io_apic_sync <-io_apic_modify_irq
 irq/35-ahci[000-121   [002] d..h513 28719.098056: _raw_spin_unlock_irqrestore <-mask_irq.part.36
 irq/35-ahci[000-121   [002] d..h513 28719.098056: preempt_count_sub <-_raw_spin_unlock_irqrestore
 irq/35-ahci[000-121   [002] d..h413 28719.098057: handle_irq_event <-handle_fasteoi_irq
 irq/35-ahci[000-121   [002] d..h413 28719.098057: preempt_count_sub <-handle_irq_event
 irq/35-ahci[000-121   [002] d..h313 28719.098057: handle_irq_event_percpu <-handle_irq_event
 irq/35-ahci[000-121   [002] d..h313 28719.098057: __handle_irq_event_percpu <-handle_irq_event_percpu
 irq/35-ahci[000-121   [002] d..h313 28719.098057: irq_default_primary_handler <-__handle_irq_event_percpu
 irq/35-ahci[000-121   [002] d..h313 28719.098057: __irq_wake_thread <-__handle_irq_event_percpu
 irq/35-ahci[000-121   [002] d..h313 28719.098057: wake_up_process <-__handle_irq_event_percpu
 irq/35-ahci[000-121   [002] d..h313 28719.098058: try_to_wake_up <-__handle_irq_event_percpu
 irq/35-ahci[000-121   [002] d..h313 28719.098058: _raw_spin_lock_irqsave <-try_to_wake_up
 irq/35-ahci[000-121   [002] d..h313 28719.098058: preempt_count_add <-_raw_spin_lock_irqsave
 irq/35-ahci[000-121   [002] d..h413 28719.098058: _raw_spin_lock <-try_to_wake_up
 irq/35-ahci[000-121   [002] d..h413 28719.098058: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] d..h513 28719.098058: update_rq_clock <-try_to_wake_up
 irq/35-ahci[000-121   [002] d..h513 28719.098058: update_irq_load_avg <-update_rq_clock
 irq/35-ahci[000-121   [002] d..h513 28719.098058: __accumulate_pelt_segments <-update_irq_load_avg
 irq/35-ahci[000-121   [002] d..h513 28719.098059: ttwu_do_activate <-try_to_wake_up
 irq/35-ahci[000-121   [002] d..h513 28719.098059: activate_task <-ttwu_do_activate
 irq/35-ahci[000-121   [002] d..h513 28719.098059: enqueue_task_rt <-activate_task
 irq/35-ahci[000-121   [002] d..h513 28719.098059: dequeue_rt_stack <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d..h513 28719.098059: dequeue_top_rt_rq <-dequeue_rt_stack
 irq/35-ahci[000-121   [002] d..h513 28719.098059: cpupri_set <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d..h513 28719.098060: update_rt_migration <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d..h513 28719.098060: _raw_spin_lock <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d..h513 28719.098060: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] d..h613 28719.098060: preempt_count_sub <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d..h513 28719.098060: enqueue_top_rt_rq <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d..h513 28719.098060: intel_pstate_update_util <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d..h513 28719.098060: ttwu_do_wakeup <-try_to_wake_up
 irq/35-ahci[000-121   [002] d..h513 28719.098060: check_preempt_curr <-ttwu_do_wakeup
 irq/35-ahci[000-121   [002] d..h513 28719.098061: check_preempt_curr_rt <-check_preempt_curr
 irq/35-ahci[000-121   [002] d..h513 28719.098061: resched_curr <-check_preempt_curr
 irq/35-ahci[000-121   [002] dN.h513 28719.098061: task_woken_rt <-ttwu_do_wakeup
 irq/35-ahci[000-121   [002] dN.h413 28719.098061: _raw_spin_unlock_irqrestore <-try_to_wake_up
 irq/35-ahci[000-121   [002] dN.h413 28719.098061: preempt_count_sub <-_raw_spin_unlock_irqrestore
 irq/35-ahci[000-121   [002] dN.h313 28719.098061: note_interrupt <-handle_irq_event_percpu
 irq/35-ahci[000-121   [002] dN.h313 28719.098062: _raw_spin_lock <-handle_irq_event
 irq/35-ahci[000-121   [002] dN.h313 28719.098062: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] dN.h413 28719.098062: ioapic_ack_level <-handle_fasteoi_irq
 irq/35-ahci[000-121   [002] dN.h413 28719.098062: irqd_cfg <-ioapic_ack_level
 irq/35-ahci[000-121   [002] dN.h413 28719.098062: apic_chip_data.part.17 <-ioapic_ack_level
 irq/35-ahci[000-121   [002] dN.h413 28719.098062: irq_complete_move <-ioapic_ack_level
 irq/35-ahci[000-121   [002] dN.h413 28719.098062: __irq_complete_move <-ioapic_ack_level
 irq/35-ahci[000-121   [002] dN.h413 28719.098062: preempt_count_sub <-handle_fasteoi_irq
 irq/35-ahci[000-121   [002] dN.h313 28719.098063: irq_exit <-do_IRQ
 irq/35-ahci[000-121   [002] dN.h313 28719.098063: irqtime_account_irq <-irq_exit
 irq/35-ahci[000-121   [002] dN.h313 28719.098063: preempt_count_sub <-irq_exit
 irq/35-ahci[000-121   [002] .N..113 28719.098063: preempt_count_sub <-percpu_counter_add_batch
 irq/35-ahci[000-121   [002] dN..113 28719.098064: rcu_note_context_switch <-__schedule
 irq/35-ahci[000-121   [002] dN..113 28719.098064: rcu_preempt_deferred_qs <-rcu_note_context_switch
 irq/35-ahci[000-121   [002] dN..113 28719.098064: _raw_spin_lock <-__schedule
 irq/35-ahci[000-121   [002] dN..113 28719.098064: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] dN..213 28719.098064: update_rq_clock <-__schedule
 irq/35-ahci[000-121   [002] dN..213 28719.098064: pick_next_task_stop <-__schedule
 irq/35-ahci[000-121   [002] dN..213 28719.098064: pick_next_task_dl <-__schedule
 irq/35-ahci[000-121   [002] dN..213 28719.098065: pick_next_task_rt <-__schedule
 irq/35-ahci[000-121   [002] dN..213 28719.098065: update_curr_rt <-pick_next_task_rt
 irq/35-ahci[000-121   [002] dN..213 28719.098065: __rcu_read_lock <-update_curr_rt
 irq/35-ahci[000-121   [002] dN..213 28719.098065: __rcu_read_unlock <-update_curr_rt
 irq/35-ahci[000-121   [002] dN..213 28719.098065: _raw_spin_lock <-update_curr_rt
 irq/35-ahci[000-121   [002] dN..213 28719.098065: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] dN..313 28719.098065: preempt_count_sub <-update_curr_rt
 irq/35-ahci[000-121   [002] dN..213 28719.098065: put_prev_task_rt <-pick_next_task_rt
 irq/35-ahci[000-121   [002] dN..213 28719.098065: update_curr_rt <-put_prev_task_rt
 irq/35-ahci[000-121   [002] dN..213 28719.098066: update_rt_rq_load_avg <-put_prev_task_rt
 irq/35-ahci[000-121   [002] d...213 28719.098066: enter_lazy_tlb <-__schedule
    irq/21-spec_-292   [002] d...2.. 28719.098067: finish_task_switch <-__schedule
    irq/21-spec_-292   [002] d...2.. 28719.098067: _raw_spin_unlock_irq <-finish_task_switch
    irq/21-spec_-292   [002] ....2.. 28719.098067: preempt_count_sub <-_raw_spin_unlock_irq
    irq/21-spec_-292   [002] ....... 28719.098067: kthread_should_stop <-irq_thread
    irq/21-spec_-292   [002] ....... 28719.098068: irq_thread_check_affinity <-irq_thread
    irq/21-spec_-292   [002] ....... 28719.098068: irq_forced_thread_fn <-irq_thread
    irq/21-spec_-292   [002] ....... 28719.098068: __local_bh_disable_ip <-irq_forced_thread_fn
    irq/21-spec_-292   [002] ....... 28719.098068: migrate_disable <-__local_bh_disable_ip
    irq/21-spec_-292   [002] ....11. 28719.098068: pin_current_cpu <-migrate_disable
    irq/21-spec_-292   [002] ....11. 28719.098068: __read_rt_trylock <-pin_current_cpu
    irq/21-spec_-292   [002] ....11. 28719.098068: task_rq_lock <-migrate_disable
    irq/21-spec_-292   [002] ....11. 28719.098069: _raw_spin_lock_irqsave <-task_rq_lock
    irq/21-spec_-292   [002] d...11. 28719.098069: preempt_count_add <-_raw_spin_lock_irqsave
    irq/21-spec_-292   [002] d...21. 28719.098069: _raw_spin_lock <-task_rq_lock
    irq/21-spec_-292   [002] d...21. 28719.098069: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...21. 28719.098069: _raw_spin_unlock_irqrestore <-migrate_disable
    irq/21-spec_-292   [002] ....21. 28719.098069: preempt_count_sub <-_raw_spin_unlock_irqrestore
    irq/21-spec_-292   [002] .....11 28719.098069: rt_spin_lock <-__local_bh_disable_ip
    irq/21-spec_-292   [002] .....11 28719.098069: migrate_disable <-rt_spin_lock
    irq/21-spec_-292   [002] .....12 28719.098070: rt_spin_lock_slowlock <-__local_bh_disable_ip
    irq/21-spec_-292   [002] .....12 28719.098070: _raw_spin_lock_irqsave <-rt_spin_lock_slowlock
    irq/21-spec_-292   [002] d....12 28719.098070: preempt_count_add <-_raw_spin_lock_irqsave
    irq/21-spec_-292   [002] d...112 28719.098070: rt_spin_lock_slowlock_locked <-rt_spin_lock_slowlock
    irq/21-spec_-292   [002] d...112 28719.098070: __try_to_take_rt_mutex <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.098070: _raw_spin_lock <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.098071: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...212 28719.098071: preempt_count_sub <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.098071: task_blocks_on_rt_mutex <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.098071: _raw_spin_lock <-task_blocks_on_rt_mutex
    irq/21-spec_-292   [002] d...112 28719.098071: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...212 28719.098071: rt_mutex_enqueue <-task_blocks_on_rt_mutex
    irq/21-spec_-292   [002] d...212 28719.098071: preempt_count_sub <-task_blocks_on_rt_mutex
    irq/21-spec_-292   [002] d...112 28719.098071: _raw_spin_lock <-task_blocks_on_rt_mutex
    irq/21-spec_-292   [002] d...112 28719.098072: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...212 28719.098072: rt_mutex_enqueue_pi <-task_blocks_on_rt_mutex
    irq/21-spec_-292   [002] d...212 28719.098072: rt_mutex_setprio <-task_blocks_on_rt_mutex
    irq/21-spec_-292   [002] d...212 28719.098072: __task_rq_lock <-rt_mutex_setprio
    irq/21-spec_-292   [002] d...212 28719.098072: _raw_spin_lock <-__task_rq_lock
    irq/21-spec_-292   [002] d...212 28719.098072: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...312 28719.098072: update_rq_clock <-rt_mutex_setprio
    irq/21-spec_-292   [002] d...312 28719.098072: update_irq_load_avg <-update_rq_clock
    irq/21-spec_-292   [002] d...312 28719.098073: dequeue_task_rt <-rt_mutex_setprio
    irq/21-spec_-292   [002] d...312 28719.098073: update_curr_rt <-dequeue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098073: __rcu_read_lock <-update_curr_rt
    irq/21-spec_-292   [002] d...312 28719.098073: __rcu_read_unlock <-update_curr_rt
    irq/21-spec_-292   [002] d...312 28719.098073: _raw_spin_lock <-update_curr_rt
    irq/21-spec_-292   [002] d...312 28719.098073: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...412 28719.098073: preempt_count_sub <-update_curr_rt
    irq/21-spec_-292   [002] d...312 28719.098074: dequeue_rt_stack <-dequeue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098074: dequeue_top_rt_rq <-dequeue_rt_stack
    irq/21-spec_-292   [002] d...312 28719.098074: update_rt_migration <-dequeue_rt_stack
    irq/21-spec_-292   [002] d...312 28719.098074: enqueue_top_rt_rq <-dequeue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098074: intel_pstate_update_util <-dequeue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098074: enqueue_task_rt <-rt_mutex_setprio
    irq/21-spec_-292   [002] d...312 28719.098074: dequeue_rt_stack <-enqueue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098074: dequeue_top_rt_rq <-dequeue_rt_stack
    irq/21-spec_-292   [002] d...312 28719.098075: update_rt_migration <-enqueue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098075: _raw_spin_lock <-enqueue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098075: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...412 28719.098075: preempt_count_sub <-enqueue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098075: enqueue_top_rt_rq <-enqueue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098075: intel_pstate_update_util <-enqueue_task_rt
    irq/21-spec_-292   [002] d...312 28719.098076: prio_changed_rt <-rt_mutex_setprio
    irq/21-spec_-292   [002] d...212 28719.098076: preempt_count_sub <-task_blocks_on_rt_mutex
    irq/21-spec_-292   [002] d...112 28719.098076: __try_to_take_rt_mutex <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.098076: _raw_spin_unlock_irqrestore <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] ....112 28719.098076: preempt_count_sub <-_raw_spin_unlock_irqrestore
    irq/21-spec_-292   [002] .....12 28719.098076: __rcu_read_lock <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] .....12 28719.098077: __rcu_read_unlock <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] .....12 28719.098077: schedule <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.098077: rcu_note_context_switch <-__schedule
    irq/21-spec_-292   [002] d...112 28719.098077: rcu_preempt_deferred_qs <-rcu_note_context_switch
    irq/21-spec_-292   [002] d...112 28719.098077: _raw_spin_lock <-__schedule
    irq/21-spec_-292   [002] d...112 28719.098077: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...212 28719.098077: update_rq_clock <-__schedule
    irq/21-spec_-292   [002] d...212 28719.098077: deactivate_task <-__schedule
    irq/21-spec_-292   [002] d...212 28719.098077: dequeue_task_rt <-__schedule
    irq/21-spec_-292   [002] d...212 28719.098078: update_curr_rt <-dequeue_task_rt
    irq/21-spec_-292   [002] d...212 28719.098078: __rcu_read_lock <-update_curr_rt
    irq/21-spec_-292   [002] d...212 28719.098078: __rcu_read_unlock <-update_curr_rt
    irq/21-spec_-292   [002] d...212 28719.098078: _raw_spin_lock <-update_curr_rt
    irq/21-spec_-292   [002] d...212 28719.098078: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...312 28719.098078: preempt_count_sub <-update_curr_rt
    irq/21-spec_-292   [002] d...212 28719.098078: dequeue_rt_stack <-dequeue_task_rt
    irq/21-spec_-292   [002] d...212 28719.098078: dequeue_top_rt_rq <-dequeue_rt_stack
    irq/21-spec_-292   [002] d...212 28719.098078: update_rt_migration <-dequeue_rt_stack
    irq/21-spec_-292   [002] d...212 28719.098079: enqueue_top_rt_rq <-dequeue_task_rt
    irq/21-spec_-292   [002] d...212 28719.098079: intel_pstate_update_util <-dequeue_task_rt
    irq/21-spec_-292   [002] d...212 28719.098079: pick_next_task_stop <-__schedule
    irq/21-spec_-292   [002] d...212 28719.098079: pick_next_task_dl <-__schedule
    irq/21-spec_-292   [002] d...212 28719.098079: pick_next_task_rt <-__schedule
    irq/21-spec_-292   [002] d...212 28719.098079: update_curr_rt <-pick_next_task_rt
    irq/21-spec_-292   [002] d...212 28719.098079: put_prev_task_rt <-pick_next_task_rt
    irq/21-spec_-292   [002] d...212 28719.098079: update_curr_rt <-put_prev_task_rt
    irq/21-spec_-292   [002] d...212 28719.098079: update_rt_rq_load_avg <-put_prev_task_rt
    irq/21-spec_-292   [002] d...212 28719.098080: enter_lazy_tlb <-__schedule
 irq/35-ahci[000-121   [002] d...213 28719.098080: finish_task_switch <-__schedule
 irq/35-ahci[000-121   [002] d...213 28719.098080: _raw_spin_unlock_irq <-finish_task_switch
 irq/35-ahci[000-121   [002] ....213 28719.098080: preempt_count_sub <-_raw_spin_unlock_irq
 irq/35-ahci[000-121   [002] .....13 28719.098080: rt_spin_unlock <-test_clear_page_writeback
 irq/35-ahci[000-121   [002] .....13 28719.098080: migrate_enable <-rt_spin_unlock
 irq/35-ahci[000-121   [002] .....12 28719.098081: mod_node_page_state <-test_clear_page_writeback
 irq/35-ahci[000-121   [002] .....12 28719.098081: dec_zone_page_state <-test_clear_page_writeback
 irq/35-ahci[000-121   [002] .....12 28719.098081: inc_node_page_state <-test_clear_page_writeback
 irq/35-ahci[000-121   [002] .....12 28719.098081: rt_spin_lock <-ext4_finish_bio
 irq/35-ahci[000-121   [002] .....12 28719.098081: migrate_disable <-rt_spin_lock
 irq/35-ahci[000-121   [002] .....13 28719.098081: rt_spin_unlock <-ext4_finish_bio
 irq/35-ahci[000-121   [002] .....13 28719.098082: migrate_enable <-rt_spin_unlock
 irq/35-ahci[000-121   [002] .....12 28719.098082: end_page_writeback <-ext4_finish_bio
 irq/35-ahci[000-121   [002] .....12 28719.098082: test_clear_page_writeback <-end_page_writeback
 irq/35-ahci[000-121   [002] .....12 28719.098082: page_mapping <-test_clear_page_writeback
 irq/35-ahci[000-121   [002] .....12 28719.098082: rt_spin_lock <-test_clear_page_writeback
 irq/35-ahci[000-121   [002] .....12 28719.098082: migrate_disable <-rt_spin_lock


With my limited knowledge of kernel internals I interpret it this way:
First the ahci interrupt thread is running. Then the IRQ from the digitizer card (spec_) comes in and the 
low level handler runs. But it does not switch to the threaded part of the spec_ handler but continues 
with the ahci interrupt thread - despite the higher rt priority of the spec_ handler (52 instead of 50).
The ahci continues for about 2 ms until this happens:

irq/35-ahci[000-121   [002] ....112 28719.100176: __wake_up_q <-rt_mutex_postunlock
 irq/35-ahci[000-121   [002] ....112 28719.100176: __wake_up_q <-rt_mutex_postunlock
 irq/35-ahci[000-121   [002] ....112 28719.100176: try_to_wake_up <-__wake_up_q
 irq/35-ahci[000-121   [002] ....112 28719.100176: _raw_spin_lock_irqsave <-try_to_wake_up
 irq/35-ahci[000-121   [002] d...112 28719.100176: preempt_count_add <-_raw_spin_lock_irqsave
 irq/35-ahci[000-121   [002] d...212 28719.100176: _raw_spin_lock <-try_to_wake_up
 irq/35-ahci[000-121   [002] d...212 28719.100176: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] d...312 28719.100177: update_rq_clock <-try_to_wake_up
 irq/35-ahci[000-121   [002] d...312 28719.100177: update_irq_load_avg <-update_rq_clock
 irq/35-ahci[000-121   [002] d...312 28719.100177: ttwu_do_activate <-try_to_wake_up
 irq/35-ahci[000-121   [002] d...312 28719.100177: activate_task <-ttwu_do_activate
 irq/35-ahci[000-121   [002] d...312 28719.100177: enqueue_task_rt <-activate_task
 irq/35-ahci[000-121   [002] d...312 28719.100177: dequeue_rt_stack <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d...312 28719.100177: dequeue_top_rt_rq <-dequeue_rt_stack
 irq/35-ahci[000-121   [002] d...312 28719.100178: cpupri_set <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d...312 28719.100178: update_rt_migration <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d...312 28719.100178: _raw_spin_lock <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d...312 28719.100178: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] d...412 28719.100178: preempt_count_sub <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d...312 28719.100178: enqueue_top_rt_rq <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d...312 28719.100178: intel_pstate_update_util <-enqueue_task_rt
 irq/35-ahci[000-121   [002] d...312 28719.100178: ttwu_do_wakeup <-try_to_wake_up
 irq/35-ahci[000-121   [002] d...312 28719.100178: check_preempt_curr <-ttwu_do_wakeup
 irq/35-ahci[000-121   [002] d...312 28719.100179: check_preempt_curr_rt <-check_preempt_curr
 irq/35-ahci[000-121   [002] d...312 28719.100179: resched_curr <-check_preempt_curr
 irq/35-ahci[000-121   [002] dN..312 28719.100179: task_woken_rt <-ttwu_do_wakeup
 irq/35-ahci[000-121   [002] dN..212 28719.100179: _raw_spin_unlock_irqrestore <-try_to_wake_up
 irq/35-ahci[000-121   [002] .N..212 28719.100179: preempt_count_sub <-_raw_spin_unlock_irqrestore
 irq/35-ahci[000-121   [002] .N..112 28719.100179: preempt_count_sub <-rt_mutex_postunlock
 irq/35-ahci[000-121   [002] dN..112 28719.100179: rcu_note_context_switch <-__schedule
 irq/35-ahci[000-121   [002] dN..112 28719.100180: rcu_preempt_deferred_qs <-rcu_note_context_switch
 irq/35-ahci[000-121   [002] dN..112 28719.100180: _raw_spin_lock <-__schedule
 irq/35-ahci[000-121   [002] dN..112 28719.100180: preempt_count_add <-_raw_spin_lock
 irq/35-ahci[000-121   [002] dN..212 28719.100180: update_rq_clock <-__schedule
 irq/35-ahci[000-121   [002] dN..212 28719.100180: pick_next_task_stop <-__schedule
 irq/35-ahci[000-121   [002] dN..212 28719.100180: pick_next_task_dl <-__schedule
 irq/35-ahci[000-121   [002] dN..212 28719.100180: pick_next_task_rt <-__schedule
 irq/35-ahci[000-121   [002] dN..212 28719.100180: update_curr_rt <-pick_next_task_rt
 irq/35-ahci[000-121   [002] dN..212 28719.100180: put_prev_task_rt <-pick_next_task_rt
 irq/35-ahci[000-121   [002] dN..212 28719.100181: update_curr_rt <-put_prev_task_rt
 irq/35-ahci[000-121   [002] dN..212 28719.100181: update_rt_rq_load_avg <-put_prev_task_rt
 irq/35-ahci[000-121   [002] d...212 28719.100181: enter_lazy_tlb <-__schedule
    irq/21-spec_-292   [002] d...212 28719.100181: finish_task_switch <-__schedule
    irq/21-spec_-292   [002] d...212 28719.100182: _raw_spin_unlock_irq <-finish_task_switch
    irq/21-spec_-292   [002] ....212 28719.100182: preempt_count_sub <-_raw_spin_unlock_irq
    irq/21-spec_-292   [002] .....12 28719.100182: _raw_spin_lock_irqsave <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d....12 28719.100182: preempt_count_add <-_raw_spin_lock_irqsave
    irq/21-spec_-292   [002] d...112 28719.100182: _raw_spin_lock <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.100182: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...212 28719.100182: preempt_count_sub <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.100183: __try_to_take_rt_mutex <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.100183: _raw_spin_lock <-__try_to_take_rt_mutex
    irq/21-spec_-292   [002] d...112 28719.100183: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...212 28719.100183: preempt_count_sub <-__try_to_take_rt_mutex
    irq/21-spec_-292   [002] d...112 28719.100183: _raw_spin_lock <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.100183: preempt_count_add <-_raw_spin_lock
    irq/21-spec_-292   [002] d...212 28719.100183: preempt_count_sub <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.100183: fixup_rt_mutex_waiters.part.6 <-rt_spin_lock_slowlock_locked
    irq/21-spec_-292   [002] d...112 28719.100184: _raw_spin_unlock_irqrestore <-rt_spin_lock_slowlock
    irq/21-spec_-292   [002] ....112 28719.100184: preempt_count_sub <-_raw_spin_unlock_irqrestore
    irq/21-spec_-292   [002] .....12 28719.100184: vInterruptService <-irq_forced_thread_fn
    irq/21-spec_-292   [002] .....12 28719.100185: ktime_get_real_ts64 <-vInterruptService
    irq/21-spec_-292   [002] .....12 28719.100189: __wake_up <-vInterruptService
    irq/21-spec_-292   [002] .....12 28719.100189: __wake_up_common_lock <-vInterruptService
.....

Now suddenly the spec handler is running again and this time the driver supplied IRQ-routine (vInterruptService) 
is called, but with the 2 ms of delay. 

Unfortunately I'm not able to interpret the flags correctly. Especially since the ones described in the header 
do not match the number of fields in the output...

If I cut away too much to interpret this please let me know. The whole trace is about 3 MB.

Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Long latencies during disk-io
  2019-09-10 12:04       ` AW: " Martin.Wirth
@ 2019-09-13 13:36         ` Sebastian Andrzej Siewior
  2019-09-13 14:09           ` AW: " Martin.Wirth
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-13 13:36 UTC (permalink / raw)
  To: Martin.Wirth; +Cc: linux-rt-users, austin.linux

On 2019-09-10 12:04:45 [+0000], Martin.Wirth@dlr.de wrote:
> Thanks to Austin's suggestion I was able to get a race which looks suspicious.
> # tracer: function
> #
> # entries-in-buffer/entries-written: 39893/219671100   #P:4
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _----=> need-resched
> #                            || / _---=> hardirq/softirq
> #                            ||| / _--=> preempt-depth
> #                            ||||/     delay
> #           TASK-PID   CPU#  |||||   TIMESTAMP  FUNCTION
> #              | |       |   |||||      |         |
> irq/35-ahci[000-121   [002] ....113 28719.098051: preempt_count_sub <-percpu_counter_add_batch
>  irq/35-ahci[000-121   [002] .....13 28719.098051: preempt_count_add <-percpu_counter_add_batch
>  irq/35-ahci[000-121   [002] ....113 28719.098052: preempt_count_sub <-percpu_counter_add_batch
>  irq/35-ahci[000-121   [002] .....13 28719.098052: preempt_count_add <-percpu_counter_add_batch
>  irq/35-ahci[000-121   [002] ....113 28719.098052: _raw_spin_lock_irqsave <-percpu_counter_add_batch
>  irq/35-ahci[000-121   [002] d...113 28719.098052: preempt_count_add <-_raw_spin_lock_irqsave
>  irq/35-ahci[000-121   [002] d...213 28719.098052: _raw_spin_unlock_irqrestore <-percpu_counter_add_batch
>  irq/35-ahci[000-121   [002] ....213 28719.098052: preempt_count_sub <-_raw_spin_unlock_irqrestore
>  irq/35-ahci[000-121   [002] ....113 28719.098052: preempt_count_sub <-percpu_counter_add_batch
>  irq/35-ahci[000-121   [002] .....13 28719.098052: preempt_count_add <-percpu_counter_add_batch

so until here AHCI threaded handler runs. Then the hard-IRQ triggers…

>  irq/35-ahci[000-121   [002] d..h313 28719.098053: handle_irq <-do_IRQ
>  irq/35-ahci[000-121   [002] d..h313 28719.098053: handle_fasteoi_irq <-handle_irq
>  irq/35-ahci[000-121   [002] d..h313 28719.098053: _raw_spin_lock <-handle_fasteoi_irq
>  irq/35-ahci[000-121   [002] d..h313 28719.098053: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] d..h413 28719.098054: mask_irq.part.36 <-handle_fasteoi_irq
>  irq/35-ahci[000-121   [002] d..h413 28719.098054: mask_ioapic_irq <-mask_irq.part.36
>  irq/35-ahci[000-121   [002] d..h413 28719.098054: _raw_spin_lock_irqsave <-mask_ioapic_irq
>  irq/35-ahci[000-121   [002] d..h413 28719.098054: preempt_count_add <-_raw_spin_lock_irqsave
>  irq/35-ahci[000-121   [002] d..h513 28719.098054: io_apic_modify_irq <-mask_ioapic_irq
>  irq/35-ahci[000-121   [002] d..h513 28719.098055: io_apic_sync <-io_apic_modify_irq
>  irq/35-ahci[000-121   [002] d..h513 28719.098056: _raw_spin_unlock_irqrestore <-mask_irq.part.36
>  irq/35-ahci[000-121   [002] d..h513 28719.098056: preempt_count_sub <-_raw_spin_unlock_irqrestore
>  irq/35-ahci[000-121   [002] d..h413 28719.098057: handle_irq_event <-handle_fasteoi_irq
>  irq/35-ahci[000-121   [002] d..h413 28719.098057: preempt_count_sub <-handle_irq_event
>  irq/35-ahci[000-121   [002] d..h313 28719.098057: handle_irq_event_percpu <-handle_irq_event
>  irq/35-ahci[000-121   [002] d..h313 28719.098057: __handle_irq_event_percpu <-handle_irq_event_percpu
>  irq/35-ahci[000-121   [002] d..h313 28719.098057: irq_default_primary_handler <-__handle_irq_event_percpu

The primary handler does a wake of the threaded handler which is still
ongoing…

>  irq/35-ahci[000-121   [002] d..h313 28719.098057: __irq_wake_thread <-__handle_irq_event_percpu
>  irq/35-ahci[000-121   [002] d..h313 28719.098057: wake_up_process <-__handle_irq_event_percpu
>  irq/35-ahci[000-121   [002] d..h313 28719.098058: try_to_wake_up <-__handle_irq_event_percpu
>  irq/35-ahci[000-121   [002] d..h313 28719.098058: _raw_spin_lock_irqsave <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d..h313 28719.098058: preempt_count_add <-_raw_spin_lock_irqsave
>  irq/35-ahci[000-121   [002] d..h413 28719.098058: _raw_spin_lock <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d..h413 28719.098058: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] d..h513 28719.098058: update_rq_clock <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d..h513 28719.098058: update_irq_load_avg <-update_rq_clock
>  irq/35-ahci[000-121   [002] d..h513 28719.098058: __accumulate_pelt_segments <-update_irq_load_avg
>  irq/35-ahci[000-121   [002] d..h513 28719.098059: ttwu_do_activate <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d..h513 28719.098059: activate_task <-ttwu_do_activate
>  irq/35-ahci[000-121   [002] d..h513 28719.098059: enqueue_task_rt <-activate_task
>  irq/35-ahci[000-121   [002] d..h513 28719.098059: dequeue_rt_stack <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d..h513 28719.098059: dequeue_top_rt_rq <-dequeue_rt_stack
>  irq/35-ahci[000-121   [002] d..h513 28719.098059: cpupri_set <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d..h513 28719.098060: update_rt_migration <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d..h513 28719.098060: _raw_spin_lock <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d..h513 28719.098060: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] d..h613 28719.098060: preempt_count_sub <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d..h513 28719.098060: enqueue_top_rt_rq <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d..h513 28719.098060: intel_pstate_update_util <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d..h513 28719.098060: ttwu_do_wakeup <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d..h513 28719.098060: check_preempt_curr <-ttwu_do_wakeup
>  irq/35-ahci[000-121   [002] d..h513 28719.098061: check_preempt_curr_rt <-check_preempt_curr
>  irq/35-ahci[000-121   [002] d..h513 28719.098061: resched_curr <-check_preempt_curr
>  irq/35-ahci[000-121   [002] dN.h513 28719.098061: task_woken_rt <-ttwu_do_wakeup
The scheduler decided that another needs to run on this CPU. This is
denoted by the capital N (as in need-resched) here.

>  irq/35-ahci[000-121   [002] dN.h413 28719.098061: _raw_spin_unlock_irqrestore <-try_to_wake_up
>  irq/35-ahci[000-121   [002] dN.h413 28719.098061: preempt_count_sub <-_raw_spin_unlock_irqrestore
>  irq/35-ahci[000-121   [002] dN.h313 28719.098061: note_interrupt <-handle_irq_event_percpu
>  irq/35-ahci[000-121   [002] dN.h313 28719.098062: _raw_spin_lock <-handle_irq_event
>  irq/35-ahci[000-121   [002] dN.h313 28719.098062: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] dN.h413 28719.098062: ioapic_ack_level <-handle_fasteoi_irq
>  irq/35-ahci[000-121   [002] dN.h413 28719.098062: irqd_cfg <-ioapic_ack_level
>  irq/35-ahci[000-121   [002] dN.h413 28719.098062: apic_chip_data.part.17 <-ioapic_ack_level
>  irq/35-ahci[000-121   [002] dN.h413 28719.098062: irq_complete_move <-ioapic_ack_level
>  irq/35-ahci[000-121   [002] dN.h413 28719.098062: __irq_complete_move <-ioapic_ack_level
>  irq/35-ahci[000-121   [002] dN.h413 28719.098062: preempt_count_sub <-handle_fasteoi_irq
>  irq/35-ahci[000-121   [002] dN.h313 28719.098063: irq_exit <-do_IRQ
>  irq/35-ahci[000-121   [002] dN.h313 28719.098063: irqtime_account_irq <-irq_exit
>  irq/35-ahci[000-121   [002] dN.h313 28719.098063: preempt_count_sub <-irq_exit
>  irq/35-ahci[000-121   [002] .N..113 28719.098063: preempt_count_sub <-percpu_counter_add_batch

The CPU is back from hard-IRQ and can't switch right away to the other
task due to the preempt-disable section. But this happens just one line
later

>  irq/35-ahci[000-121   [002] dN..113 28719.098064: rcu_note_context_switch <-__schedule
>  irq/35-ahci[000-121   [002] dN..113 28719.098064: rcu_preempt_deferred_qs <-rcu_note_context_switch
>  irq/35-ahci[000-121   [002] dN..113 28719.098064: _raw_spin_lock <-__schedule
>  irq/35-ahci[000-121   [002] dN..113 28719.098064: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] dN..213 28719.098064: update_rq_clock <-__schedule
>  irq/35-ahci[000-121   [002] dN..213 28719.098064: pick_next_task_stop <-__schedule
>  irq/35-ahci[000-121   [002] dN..213 28719.098064: pick_next_task_dl <-__schedule
>  irq/35-ahci[000-121   [002] dN..213 28719.098065: pick_next_task_rt <-__schedule
>  irq/35-ahci[000-121   [002] dN..213 28719.098065: update_curr_rt <-pick_next_task_rt
>  irq/35-ahci[000-121   [002] dN..213 28719.098065: __rcu_read_lock <-update_curr_rt
>  irq/35-ahci[000-121   [002] dN..213 28719.098065: __rcu_read_unlock <-update_curr_rt
>  irq/35-ahci[000-121   [002] dN..213 28719.098065: _raw_spin_lock <-update_curr_rt
>  irq/35-ahci[000-121   [002] dN..213 28719.098065: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] dN..313 28719.098065: preempt_count_sub <-update_curr_rt
>  irq/35-ahci[000-121   [002] dN..213 28719.098065: put_prev_task_rt <-pick_next_task_rt
>  irq/35-ahci[000-121   [002] dN..213 28719.098065: update_curr_rt <-put_prev_task_rt
>  irq/35-ahci[000-121   [002] dN..213 28719.098066: update_rt_rq_load_avg <-put_prev_task_rt
>  irq/35-ahci[000-121   [002] d...213 28719.098066: enter_lazy_tlb <-__schedule
>     irq/21-spec_-292   [002] d...2.. 28719.098067: finish_task_switch <-__schedule
>     irq/21-spec_-292   [002] d...2.. 28719.098067: _raw_spin_unlock_irq <-finish_task_switch
>     irq/21-spec_-292   [002] ....2.. 28719.098067: preempt_count_sub <-_raw_spin_unlock_irq
>     irq/21-spec_-292   [002] ....... 28719.098067: kthread_should_stop <-irq_thread
>     irq/21-spec_-292   [002] ....... 28719.098068: irq_thread_check_affinity <-irq_thread
>     irq/21-spec_-292   [002] ....... 28719.098068: irq_forced_thread_fn <-irq_thread
>     irq/21-spec_-292   [002] ....... 28719.098068: __local_bh_disable_ip <-irq_forced_thread_fn

before the threaded function of your handler is invoked, it does
local_bh_disable(). This is required because the hard-irq would not be
interrupted by the softirq so "this" must be preserved for the handler
in force-threaded mode.
However, the lock is held by the ahci thread so what follows is PI-boost
and a switch to the owner of the lock…

>     irq/21-spec_-292   [002] ....... 28719.098068: migrate_disable <-__local_bh_disable_ip
>     irq/21-spec_-292   [002] ....11. 28719.098068: pin_current_cpu <-migrate_disable
>     irq/21-spec_-292   [002] ....11. 28719.098068: __read_rt_trylock <-pin_current_cpu
>     irq/21-spec_-292   [002] ....11. 28719.098068: task_rq_lock <-migrate_disable
>     irq/21-spec_-292   [002] ....11. 28719.098069: _raw_spin_lock_irqsave <-task_rq_lock
>     irq/21-spec_-292   [002] d...11. 28719.098069: preempt_count_add <-_raw_spin_lock_irqsave
>     irq/21-spec_-292   [002] d...21. 28719.098069: _raw_spin_lock <-task_rq_lock
>     irq/21-spec_-292   [002] d...21. 28719.098069: preempt_count_add <-_raw_spin_lock
>     irq/21-spec_-292   [002] d...21. 28719.098069: _raw_spin_unlock_irqrestore <-migrate_disable
>     irq/21-spec_-292   [002] ....21. 28719.098069: preempt_count_sub <-_raw_spin_unlock_irqrestore
>     irq/21-spec_-292   [002] .....11 28719.098069: rt_spin_lock <-__local_bh_disable_ip
>     irq/21-spec_-292   [002] .....11 28719.098069: migrate_disable <-rt_spin_lock
>     irq/21-spec_-292   [002] .....12 28719.098070: rt_spin_lock_slowlock <-__local_bh_disable_ip
>  irq/35-ahci[000-121   [002] .....13 28719.098080: rt_spin_unlock <-test_clear_page_writeback
>  irq/35-ahci[000-121   [002] .....13 28719.098080: migrate_enable <-rt_spin_unlock
>  irq/35-ahci[000-121   [002] .....12 28719.098081: mod_node_page_state <-test_clear_page_writeback
>  irq/35-ahci[000-121   [002] .....12 28719.098081: dec_zone_page_state <-test_clear_page_writeback
>  irq/35-ahci[000-121   [002] .....12 28719.098081: inc_node_page_state <-test_clear_page_writeback
>  irq/35-ahci[000-121   [002] .....12 28719.098081: rt_spin_lock <-ext4_finish_bio
>  irq/35-ahci[000-121   [002] .....12 28719.098081: migrate_disable <-rt_spin_lock
>  irq/35-ahci[000-121   [002] .....13 28719.098081: rt_spin_unlock <-ext4_finish_bio
>  irq/35-ahci[000-121   [002] .....13 28719.098082: migrate_enable <-rt_spin_unlock
>  irq/35-ahci[000-121   [002] .....12 28719.098082: end_page_writeback <-ext4_finish_bio
>  irq/35-ahci[000-121   [002] .....12 28719.098082: test_clear_page_writeback <-end_page_writeback
>  irq/35-ahci[000-121   [002] .....12 28719.098082: page_mapping <-test_clear_page_writeback
>  irq/35-ahci[000-121   [002] .....12 28719.098082: rt_spin_lock <-test_clear_page_writeback
>  irq/35-ahci[000-121   [002] .....12 28719.098082: migrate_disable <-rt_spin_lock
> irq/35-ahci[000-121   [002] ....112 28719.100176: __wake_up_q <-rt_mutex_postunlock

so slightly before this, local_bh_enable() is invoked which drops the
lock in question. This leads to a de-boost and the lock is obtained by
the irq-21 threaded handler.

>  irq/35-ahci[000-121   [002] ....112 28719.100176: __wake_up_q <-rt_mutex_postunlock
>  irq/35-ahci[000-121   [002] ....112 28719.100176: try_to_wake_up <-__wake_up_q
>  irq/35-ahci[000-121   [002] ....112 28719.100176: _raw_spin_lock_irqsave <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d...112 28719.100176: preempt_count_add <-_raw_spin_lock_irqsave
>  irq/35-ahci[000-121   [002] d...212 28719.100176: _raw_spin_lock <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d...212 28719.100176: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] d...312 28719.100177: update_rq_clock <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d...312 28719.100177: update_irq_load_avg <-update_rq_clock
>  irq/35-ahci[000-121   [002] d...312 28719.100177: ttwu_do_activate <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d...312 28719.100177: activate_task <-ttwu_do_activate
>  irq/35-ahci[000-121   [002] d...312 28719.100177: enqueue_task_rt <-activate_task
>  irq/35-ahci[000-121   [002] d...312 28719.100177: dequeue_rt_stack <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d...312 28719.100177: dequeue_top_rt_rq <-dequeue_rt_stack
>  irq/35-ahci[000-121   [002] d...312 28719.100178: cpupri_set <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d...312 28719.100178: update_rt_migration <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d...312 28719.100178: _raw_spin_lock <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d...312 28719.100178: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] d...412 28719.100178: preempt_count_sub <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d...312 28719.100178: enqueue_top_rt_rq <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d...312 28719.100178: intel_pstate_update_util <-enqueue_task_rt
>  irq/35-ahci[000-121   [002] d...312 28719.100178: ttwu_do_wakeup <-try_to_wake_up
>  irq/35-ahci[000-121   [002] d...312 28719.100178: check_preempt_curr <-ttwu_do_wakeup
>  irq/35-ahci[000-121   [002] d...312 28719.100179: check_preempt_curr_rt <-check_preempt_curr
>  irq/35-ahci[000-121   [002] d...312 28719.100179: resched_curr <-check_preempt_curr
>  irq/35-ahci[000-121   [002] dN..312 28719.100179: task_woken_rt <-ttwu_do_wakeup
>  irq/35-ahci[000-121   [002] dN..212 28719.100179: _raw_spin_unlock_irqrestore <-try_to_wake_up
>  irq/35-ahci[000-121   [002] .N..212 28719.100179: preempt_count_sub <-_raw_spin_unlock_irqrestore
>  irq/35-ahci[000-121   [002] .N..112 28719.100179: preempt_count_sub <-rt_mutex_postunlock
>  irq/35-ahci[000-121   [002] dN..112 28719.100179: rcu_note_context_switch <-__schedule
>  irq/35-ahci[000-121   [002] dN..112 28719.100180: rcu_preempt_deferred_qs <-rcu_note_context_switch
>  irq/35-ahci[000-121   [002] dN..112 28719.100180: _raw_spin_lock <-__schedule
>  irq/35-ahci[000-121   [002] dN..112 28719.100180: preempt_count_add <-_raw_spin_lock
>  irq/35-ahci[000-121   [002] dN..212 28719.100180: update_rq_clock <-__schedule
>  irq/35-ahci[000-121   [002] dN..212 28719.100180: pick_next_task_stop <-__schedule
>  irq/35-ahci[000-121   [002] dN..212 28719.100180: pick_next_task_dl <-__schedule
>  irq/35-ahci[000-121   [002] dN..212 28719.100180: pick_next_task_rt <-__schedule
>  irq/35-ahci[000-121   [002] dN..212 28719.100180: update_curr_rt <-pick_next_task_rt
>  irq/35-ahci[000-121   [002] dN..212 28719.100180: put_prev_task_rt <-pick_next_task_rt
>  irq/35-ahci[000-121   [002] dN..212 28719.100181: update_curr_rt <-put_prev_task_rt
>  irq/35-ahci[000-121   [002] dN..212 28719.100181: update_rt_rq_load_avg <-put_prev_task_rt
>  irq/35-ahci[000-121   [002] d...212 28719.100181: enter_lazy_tlb <-__schedule
>     irq/21-spec_-292   [002] d...212 28719.100181: finish_task_switch <-__schedule
>     irq/21-spec_-292   [002] d...212 28719.100182: _raw_spin_unlock_irq <-finish_task_switch
>     irq/21-spec_-292   [002] ....212 28719.100182: preempt_count_sub <-_raw_spin_unlock_irq
>     irq/21-spec_-292   [002] .....12 28719.100182: _raw_spin_lock_irqsave <-rt_spin_lock_slowlock_locked
>     irq/21-spec_-292   [002] d....12 28719.100182: preempt_count_add <-_raw_spin_lock_irqsave
>     irq/21-spec_-292   [002] d...112 28719.100182: _raw_spin_lock <-rt_spin_lock_slowlock_locked
>     irq/21-spec_-292   [002] d...112 28719.100182: preempt_count_add <-_raw_spin_lock
>     irq/21-spec_-292   [002] d...212 28719.100182: preempt_count_sub <-rt_spin_lock_slowlock_locked
>     irq/21-spec_-292   [002] d...112 28719.100183: __try_to_take_rt_mutex <-rt_spin_lock_slowlock_locked
>     irq/21-spec_-292   [002] d...112 28719.100183: _raw_spin_lock <-__try_to_take_rt_mutex
>     irq/21-spec_-292   [002] d...112 28719.100183: preempt_count_add <-_raw_spin_lock
>     irq/21-spec_-292   [002] d...212 28719.100183: preempt_count_sub <-__try_to_take_rt_mutex
>     irq/21-spec_-292   [002] d...112 28719.100183: _raw_spin_lock <-rt_spin_lock_slowlock_locked
>     irq/21-spec_-292   [002] d...112 28719.100183: preempt_count_add <-_raw_spin_lock
>     irq/21-spec_-292   [002] d...212 28719.100183: preempt_count_sub <-rt_spin_lock_slowlock_locked
>     irq/21-spec_-292   [002] d...112 28719.100183: fixup_rt_mutex_waiters.part.6 <-rt_spin_lock_slowlock_locked
>     irq/21-spec_-292   [002] d...112 28719.100184: _raw_spin_unlock_irqrestore <-rt_spin_lock_slowlock
>     irq/21-spec_-292   [002] ....112 28719.100184: preempt_count_sub <-_raw_spin_unlock_irqrestore
>     irq/21-spec_-292   [002] .....12 28719.100184: vInterruptService <-irq_forced_thread_fn

lock is owned, the threaded handler (vInterruptService()) can be
invoked.

>     irq/21-spec_-292   [002] .....12 28719.100185: ktime_get_real_ts64 <-vInterruptService
>     irq/21-spec_-292   [002] .....12 28719.100189: __wake_up <-vInterruptService
>     irq/21-spec_-292   [002] .....12 28719.100189: __wake_up_common_lock <-vInterruptService
> .....
> 
> Now suddenly the spec handler is running again and this time the driver supplied IRQ-routine (vInterruptService) 
> is called, but with the 2 ms of delay. 
> 
> Unfortunately I'm not able to interpret the flags correctly. Especially since the ones described in the header 
> do not match the number of fields in the output...

So this delay is caused by the softirq rework. I didn't think about this
but now that you show the trace it is obvious.

If you use
	request_threaded_irq(, NULL, vInterruptService, …);

instead of
	request_irq(, vInterruptService, …);

then the IRQ will not be force threaded (but threaded on request) and
there will be no local_bh_disable() before the thread is invoked. So no
waiting until the AHCI thread is done.

If vInterruptService() does nothing else than just wake up userland then
you could use swait_event_…() / swake_up_one() instead of the
wait_event_… / wake_up() combo. That would be allowed from the primary
handler and you could avoid the switch to the threaded-handler.

> Martin

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* AW: Long latencies during disk-io
  2019-09-13 13:36         ` Sebastian Andrzej Siewior
@ 2019-09-13 14:09           ` Martin.Wirth
  2019-09-13 15:06             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Martin.Wirth @ 2019-09-13 14:09 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users, austin.linux

Hi Sebastian

> Von: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
> So this delay is caused by the softirq rework. I didn't think about this
> but now that you show the trace it is obvious.
> 
> If you use
> 	request_threaded_irq(, NULL, vInterruptService, …);
> 
> instead of
> 	request_irq(, vInterruptService, …);
> 
> then the IRQ will not be force threaded (but threaded on request) and
> there will be no local_bh_disable() before the thread is invoked. So no
> waiting until the AHCI thread is done.
> 
> If vInterruptService() does nothing else than just wake up userland then
> you could use swait_event_…() / swake_up_one() instead of the
> wait_event_… / wake_up() combo. That would be allowed from the primary
> handler and you could avoid the switch to the threaded-handler.

thanks for the thorough analysis!

I will try the request_threaded_irq(, NULL, vInterruptService, …) solution on 
Monday and report back. Since the driver also allows to poll several cards 
(which I don't use at the moment) I think it is not possible to switch to swait/swake...
And the interrupt line may be shared and if I remember correctly in this case it is also 
not possible to use primary handlers, or am I wrong?  
 
Cheers,
Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Long latencies during disk-io
  2019-09-13 14:09           ` AW: " Martin.Wirth
@ 2019-09-13 15:06             ` Sebastian Andrzej Siewior
  2019-09-16 11:30               ` AW: " Martin.Wirth
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-13 15:06 UTC (permalink / raw)
  To: Martin.Wirth; +Cc: linux-rt-users, austin.linux

On 2019-09-13 14:09:22 [+0000], Martin.Wirth@dlr.de wrote:
> Hi Sebastian
Hi Martin,

> I will try the request_threaded_irq(, NULL, vInterruptService, …) solution on 
> Monday and report back. Since the driver also allows to poll several cards 
> (which I don't use at the moment) I think it is not possible to switch to swait/swake...

If you support the ->poll callback then it is not that simple to use
swait. For tasks that block in ->read()/->write() you could use it, for
->poll() you need the threaded handler to handle poll().

> And the interrupt line may be shared and if I remember correctly in this case it is also 
> not possible to use primary handlers, or am I wrong?  

puh. So you need to specify IRQF_ONESHOT so that the IRQ-core disables
the interrupt line until the threaded handler finishes its work (a
detail I forgot to mention in the previous mail). You are not allowed to
mix the IRQF_ONESHOT flag with multiple handlers (either all should have
them or none). So I think in the force-threaded case you should be fine
but in the non-threaded case you would have trouble.

> Cheers,
> Martin

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* AW: Long latencies during disk-io
  2019-09-13 15:06             ` Sebastian Andrzej Siewior
@ 2019-09-16 11:30               ` Martin.Wirth
  2019-09-17 10:52                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Martin.Wirth @ 2019-09-16 11:30 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users

Hi Sebastian,

> Von: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
> Gesendet: Freitag, 13. September 2019 17:07
> puh. So you need to specify IRQF_ONESHOT so that the IRQ-core disables
> the interrupt line until the threaded handler finishes its work (a
> detail I forgot to mention in the previous mail). You are not allowed to
> mix the IRQF_ONESHOT flag with multiple handlers (either all should have
> them or none). So I think in the force-threaded case you should be fine
> but in the non-threaded case you would have trouble.

I tried the request_threaded_irq(.., vInterruptService, IRQF_SHARED|IRQF_ONESHOT, ...) 
variant and indeed the latency spikes caused by concurrent disk-io go away!!

So I now have a working solution for me and also learned how to separate interrupts
using /proc/irq as a second option to cope with this problem.

But in general I find it not very transparent that with the latest changes softirq work can block
higher priority irq threads for several ms or maybe even longer and nothing can be done 
against it with normal RT control-knobs, but a change to the driver code is required.
Isn't there an automatic way to decide, when it is possible to convert a request_irq to a 
request_threaded_irq? 

Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Long latencies during disk-io
  2019-09-16 11:30               ` AW: " Martin.Wirth
@ 2019-09-17 10:52                 ` Sebastian Andrzej Siewior
  2019-09-17 11:09                   ` AW: " Martin.Wirth
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-17 10:52 UTC (permalink / raw)
  To: Martin.Wirth; +Cc: linux-rt-users

On 2019-09-16 11:30:30 [+0000], Martin.Wirth@dlr.de wrote:
> Hi Sebastian,
Hi Martin,

> But in general I find it not very transparent that with the latest changes softirq work can block
> higher priority irq threads for several ms or maybe even longer and nothing can be done 
> against it with normal RT control-knobs, but a change to the driver code is required.
> Isn't there an automatic way to decide, when it is possible to convert a request_irq to a 
> request_threaded_irq? 

If something is automatically converted then we need to disable the
softirq while invoking the handler. This was also the case before
force-threading and the interrupt thread may assume such behaviour. If
you explicitly request a threaded IRQ then there is no such need because
the handler knows what it does and should disable softirq on its own if
needed.
Due to the softirq rework we can't have multiple softirqs vectors
running in parallel _or_ have multiple local_bh_disable() section which
are executed in parallel on the same CPU. This made the softirq code not
only simpler but also allowed us to remove a little bit duct tape from
places which were expecting this kind of behaviour.
In the long term we need to figure out if we want to go down that road
again and if so we need to figure out how to avoid that duct tape…

> Martin

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* AW: Long latencies during disk-io
  2019-09-17 10:52                 ` Sebastian Andrzej Siewior
@ 2019-09-17 11:09                   ` Martin.Wirth
  2019-09-17 11:20                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Martin.Wirth @ 2019-09-17 11:09 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users

Hi Sebastian,

> Von: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
> Gesendet: Dienstag, 17. September 2019 12:53
> 

> Due to the softirq rework we can't have multiple softirqs vectors
> running in parallel _or_ have multiple local_bh_disable() section which
> are executed in parallel on the same CPU. This made the softirq code not
> only simpler but also allowed us to remove a little bit duct tape from
> places which were expecting this kind of behaviour.

Ok that explains why I haven’t seen this with older RT kernels, 
although the force-threading was always there...

> In the long term we need to figure out if we want to go down that road
> again and if so we need to figure out how to avoid that duct tape…

Maybe not. But this means that a lot of drivers which have no requirement 
to be protected against softirq will have to be converted to make them fit for RT. 
But maybe I'm wrong and most RT applications are triggered by the timer 
and not by some io-card...

Thanks again for all the work!

Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Long latencies during disk-io
  2019-09-17 11:09                   ` AW: " Martin.Wirth
@ 2019-09-17 11:20                     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-17 11:20 UTC (permalink / raw)
  To: Martin.Wirth; +Cc: linux-rt-users

On 2019-09-17 11:09:31 [+0000], Martin.Wirth@dlr.de wrote:
> Hi Sebastian,
Hi Martin,

> Maybe not. But this means that a lot of drivers which have no requirement 
> to be protected against softirq will have to be converted to make them fit for RT. 
> But maybe I'm wrong and most RT applications are triggered by the timer 
> and not by some io-card...

Your argument is valid. If we don't disable softirqs while
force-threading then we need duct-tape in other places. So it is not
ideal. Now we know that this seems to be problem.

> Thanks again for all the work!
> 
> Martin

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-09-17 11:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-03  7:23 Long latencies during disk-io Martin.Wirth
2019-09-05 15:41 ` Sebastian Andrzej Siewior
2019-09-06 13:23   ` AW: " Martin.Wirth
2019-09-06 22:06     ` Austin Schuh
2019-09-10 12:04       ` AW: " Martin.Wirth
2019-09-13 13:36         ` Sebastian Andrzej Siewior
2019-09-13 14:09           ` AW: " Martin.Wirth
2019-09-13 15:06             ` Sebastian Andrzej Siewior
2019-09-16 11:30               ` AW: " Martin.Wirth
2019-09-17 10:52                 ` Sebastian Andrzej Siewior
2019-09-17 11:09                   ` AW: " Martin.Wirth
2019-09-17 11:20                     ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).