All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] sc16is7xx: switch to threaded irq for fixing RT issues
       [not found]       ` <56C41FA5.4010605@prevas.dk>
@ 2016-02-17 21:54           ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2016-02-17 21:54 UTC (permalink / raw)
  To: Sean Nyekjær; +Cc: linux-serial, kubakici, linux-rt-users

[CC: linux-rt-users, can you guys help?]

On Wed, 17 Feb 2016 08:22:13 +0100, Sean Nyekjær wrote:
> On 2016-02-16 22:02, Jakub Kiciński wrote:
> > On Tue, 16 Feb 2016 08:37:11 +0100, Sean Nyekjær wrote:
> >> On 2016-02-15 23:08, Jakub Kicinski wrote:
> >>> On Mon, 15 Feb 2016 08:58:53 +0100, Sean Nyekjaer wrote:
> >>>> Working with RT patches reveals this driver have some spin_lock issues.
> >>>>
> >>>> spin_lock moved from sc16is7xx_reg_proc to protect to call to
> >>>> uart_handle_cts_change as it's the only call that isn't already
> >>>> protected in serial-core.
> >>> Sorry but this does not look fine.  You are removing all async items
> >>> from the driver.  This works for you because in RT code can sleep with
> >>> spin locks taken but for upstream it's not acceptable.  Did you test
> >>> this patch with upstream kernel and lockdep enabled?
> >> I have a 4.1.y with cherry-pick of the commits from tty-testing
> >> regarding sc16is7xx. (exept the new gpio api)
> >> On our custom board we have 3 sc16is750 on a single SPI channel. I'm
> >> using an FTDI chip(connected to a test PC) TX wired to the 3x RX pins on
> >> the sc16is750.
> >> The problem is easy reproducible just by creating data from the PC at
> >> 115200 baud, causes the kernel oops almost immediately.
> >>
> >> I have not seen any issues with a vanilla 4.1.y, only when i'm applying
> >> the RT patch and loading the system.
> > Thanks for providing the details, I think there is a misunderstanding
> > between us.  Allow me to explain again what I meant ;)
> >
> > I do acknowledge that there may be a problem with the driver on RT
> > kernels and I do trust you that you tested your patch on RT kernel
> > with latest version of the driver cherry-picked from mainline and it
> > works there.
> >
> > However, IIUC you are proposing your patch to be included in mainline.
> >
> > What I'm trying to say is that in mainline we need the async works
> > which you remove in your patch because we cannot perform SPI/I2C
> > blocking I/O under a spinlock...  If you compile mainline with your
> > patch you will see a slew of "sleeping in atomic context" warnings.
> >
> > So basically the questions are do you want this patch in mainline and
> > have you tested it on non-RT kernels?  (Please don't be discouraged -
> > there definitely must be a bug somewhere if you're seeing crashes, but
> > I'm afraid the proposed solutions is not good enough...)
> >
> > Thanks
> Yeah the first priority must be for the driver to work on the mainline 
> kernel :-)
> 
> I just wanted to flag a problem with the driver with the RT patches 
> applied. I thought
> RT patch (maybe) revaled that the driver have underlying problem.
> 
> So maybe this patch should be included in the RT patch?
> No i have not tested this with a non-RT kernel, maybe i should try :-)

I've looked at the stack dump you posted in previous email.  It looks
like we deadlock on the on spinlock_irqsave() in queue_kthread_work()
because our interrupt is not threaded!?  I thought all interrupts are
threaded in RT by default.

================================= [92/1901]
[ INFO: inconsistent lock state ]
4.1.15-rt17-00120-g6a753b6-dirty #10 Not tainted
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
mdev/309 [HC1[1]:SC0[0]:HE0:SE1] takes:
  ((&s->kworker)->lock){?.+...}, at: [<80046c50>] 
queue_kthread_work+0x18/0x54
{HARDIRQ-ON-W} state was registered at:
   [<80500028>] rt_spin_lock+0x74/0x88
   [<80046930>] kthread_worker_fn+0x84/0x198
   [<80046820>] kthread+0xd8/0xec
   [<8000f658>] ret_from_fork+0x14/0x3c
irq event stamp: 1362
hardirqs last  enabled at (1361): [<800bc364>] filemap_map_pages+0x2b0/0x424
hardirqs last disabled at (1362): [<800139b4>] __irq_svc+0x34/0x90
softirqs last  enabled at (0): [<80026108>] copy_process.part.2+0x370/0x1650
softirqs last disabled at (0): [<  (null)>]   (null)

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock((&s->kworker)->lock);
   <Interrupt>
     lock((&s->kworker)->lock);

  *** DEADLOCK ***

3 locks held by mdev/309:
  #0:  (&mm->mmap_sem){++++++}, at: [<8001d420>] do_page_fault+0x80/0x368
  #1:  (&mm->page_table_lock){+.+...}, at: [<800e19fc>] 
handle_mm_fault+0x630/0xcd0
  #2:  (rcu_read_lock){......}, at: [<800bc0b4>] filemap_map_pages+0x0/0x424

stack backtrace:
CPU: 0 PID: 309 Comm: mdev Not tainted 4.1.15-rt17-00120-g6a753b6-dirty #10
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[<80016e58>] (unwind_backtrace) from [<80012e48>] (show_stack+0x10/0x14)
[<80012e48>] (show_stack) from [<804fb00c>] (dump_stack+0x7c/0x90)
[<804fb00c>] (dump_stack) from [<804f8c6c>] (print_usage_bug+0x2d8/0x2e8)
[<804f8c6c>] (print_usage_bug) from [<800665ec>] (mark_lock+0x214/0x7a0)
[<800665ec>] (mark_lock) from [<800678dc>] (__lock_acquire+0x778/0x1dd8)
[<800678dc>] (__lock_acquire) from [<800697d4>] (lock_acquire+0x6c/0x8c)
[<800697d4>] (lock_acquire) from [<80500028>] (rt_spin_lock+0x74/0x88)
[<80500028>] (rt_spin_lock) from [<80046c50>] (queue_kthread_work+0x18/0x54)
[<80046c50>] (queue_kthread_work) from [<7f00b260>] 
(sc16is7xx_irq+0x10/0x18 [sc16is7xx])
[<7f00b260>] (sc16is7xx_irq [sc16is7xx]) from [<800734c0>] 
(handle_irq_event_percpu+0x70/0x15c)
[<800734c0>] (handle_irq_event_percpu) from [<800735e8>] 
(handle_irq_event+0x3c/0x5c)
[<800735e8>] (handle_irq_event) from [<80076890>] 
(handle_level_irq+0xc4/0x13c)
[<80076890>] (handle_level_irq) from [<80072b2c>] 
(generic_handle_irq+0x2c/0x3c)
[<80072b2c>] (generic_handle_irq) from [<8025f8e0>] 
(mxc_gpio_irq_handler+0x3c/0x108)
[<8025f8e0>] (mxc_gpio_irq_handler) from [<8025fa2c>] 
(mx3_gpio_irq_handler+0x80/0xcc)
[<8025fa2c>] (mx3_gpio_irq_handler) from [<80072b2c>] 
(generic_handle_irq+0x2c/0x3c)
[<80072b2c>] (generic_handle_irq) from [<80072e1c>] 
(__handle_domain_irq+0x7c/0xe8)
[<80072e1c>] (__handle_domain_irq) from [<800094dc>] 
(gic_handle_irq+0x24/0x5c)
[<800094dc>] (gic_handle_irq) from [<800139c4>] (__irq_svc+0x44/0x90)
Exception stack(0xa9ecdda0 to 0xa9ecdde8)
dda0: 00000001 00000110 00000000 a9f40b00 20070113 00000001 aa811eb8 
80782edf
ddc0: a9ecde84 aaa231a0 aaa232ec aae4b800 00000003 a9ecdde8 80066be0 
800bc368
dde0: 20070113 ffffffff
[<800139c4>] (__irq_svc) from [<800bc368>] (filemap_map_pages+0x2b4/0x424)
[<800bc368>] (filemap_map_pages) from [<800e1b0c>] 
(handle_mm_fault+0x740/0xcd0)
[<800e1b0c>] (handle_mm_fault) from [<8001d604>] (do_page_fault+0x264/0x368)
[<8001d604>] (do_page_fault) from [<800093b4>] (do_PrefetchAbort+0x34/0x9c)
[<800093b4>] (do_PrefetchAbort) from [<80013e9c>] 
(ret_from_exception+0x0/0x24)
Exception stack(0xa9ecdfb0 to 0xa9ecdff8)
dfa0:                                     003f8008 00001000 00079c84 
7ed0bf99
dfc0: 000808f0 ffffffff 7ed0bfc3 003f8008 00000000 00000000 000808f0 
00000001
dfe0: 76e18598 7ed0bc28 00040d1c 76e18598 60070010 ffffffff
random: nonblocking pool is initialized
------------[ cut here ]------------
Kernel BUG at 804fe6d8 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in: gpio_sn65hvs885 ti_ads8688 ad5755 sc16is7xx 
regmap_spi gpio_pca953x
CPU: 0 PID: 215 Comm: kworker/0:2 Not tainted 
4.1.15-rt17-00120-g6a753b6-dirty #10
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [17/1901]
Workqueue: events flush_to_ldisc
task: aa414d00 ti: a9c0c000 task.ti: a9c0c000
PC is at rt_spin_lock_slowlock+0x2c8/0x2e4
LR is at rt_spin_lock_slowlock+0x54/0x2e4
pc : [<804fe6d8>]    lr : [<804fe464>]    psr: 60070193
sp : a9c0dc20  ip : aa414d01  fp : a9edc0c4
r10: 00000000  r9 : 80046be0  r8 : 80782e40
r7 : aa414d00  r6 : 00000000  r5 : 00000001  r4 : a9c0dc38
r3 : aa414d00  r2 : 00000000  r1 : aa414d00  r0 : 00000000
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 39fe804a  DAC: 00000015
Process kworker/0:2 (pid: 215, stack limit = 0xa9c0c210)
Stack: (0xa9c0dc20 to 0xa9c0e000)
dc20: a9c0dce0 aac0c000 a9c0dd64 800697d4 00000001 00000080 a9c0dc38 
800735f4
dc40: 00000000 a9c0dc44 00000000 aad6b868 00000000 00000002 00000001 
804ff9ac
dc60: aa414d00 a9edc0c4 80046c50 00000067 aad96400 80782e40 80046be0 
00000000
dc80: c0ac9018 80500038 00000001 00000001 80f8474c 80748984 7f00b250 
a9edc0c4
dca0: a9edc11c 80046c50 7f00b250 a9efb2c0 00000000 7f00b260 7f00b250 
800734c0
dcc0: 800777d4 00000004 00060002 aad96400 aad96468 a9efb2c0 aad41610 
00000001
dce0: 80f8474c 80748984 c0ac9018 800735e8 aad96400 aad96468 00000002 
80076890
dd00: 800767cc 00000067 00000004 80072b2c 00000004 8025f8e0 d17b30c0 
00000001
dd20: 8073e500 a9fdc200 00000000 aad41610 80751108 aad95a00 00000000 
00000001
dd40: a9c0dda0 aac0c000 c0ac9018 8025fa2c 00000063 00000000 00000063 
80072b2c
dd60: 8073c6d0 80072e1c f400010c 00000056 80748ca0 a9c0dda0 f4000100 
00000008
dd80: a9fe2000 800094dc 80046be0 60070013 ffffffff a9c0ddd4 ffffffff 
800139c4
dda0: a9edc0c4 a9edc268 a9edc108 a9edc108 a9edc268 a9edc108 a9edc0c4 
a9fe38a9
ddc0: ffffffff 00000008 a9fe2000 c0ac9018 a9c0dda8 a9c0dde8 80046c88 
80046be0
dde0: 60070013 ffffffff a9edc0c4 00000001 00000000 80046c88 aa4f8800 
a9edc12c
de00: a9fe2000 802a6914 aa414d00 802a75f8 802a7608 c0ac9000 c0ac9000 
80292ce8
de20: 60070013 a9fe39a1 a9fe38a1 00000008 c0ac9000 00000000 00000008 
80526d20
de40: 55555556 c0ac9000 00000000 a9fe2124 00000000 a9fe2000 aa5722a0 
a9ea5cc0
de60: a9fe3800 aa572280 aa57227c 00000000 00000000 80293120 00000001 
aa572280
de80: a9fe2000 80296374 00000000 80040ee0 aa678d00 aa572280 ab70ce80 
a9c0dec8
dea0: ab710700 00000001 00000000 80040f60 00000001 00000000 80040ee0 
800412cc
dec0: 00000000 00000000 80f87484 00000000 00000000 80663790 80782dca 
ab70ce80
dee0: ab70ced4 a9c0c000 80782dca aa678d18 aa678d00 00000008 ab70ce80 
80041230
df00: 80742a40 aa414d00 aa678d00 aa655c00 00000000 aa678d00 800411e4 
00000000
df20: 00000000 00000000 00000000 80046820 a9c0df9c 00000000 aa414d00 
aa678d00
df40: 00000000 00000000 dead4ead ffffffff ffffffff 80785368 00000000 
00000000
df60: 8066b7cc a9c0df64 a9c0df64 00000000 00000000 dead4ead ffffffff 
ffffffff
df80: 80785368 00000000 00000000 8066b7cc a9c0df90 a9c0df90 a9c0dfac 
aa655c00
dfa0: 80046748 00000000 00000000 8000f658 00000000 00000000 00000000 
00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 3bf5a811 
3bf5ac11
[<804fe6d8>] (rt_spin_lock_slowlock) from [<80500038>] 
(rt_spin_lock+0x84/0x88)
[<80500038>] (rt_spin_lock) from [<80046c50>] (queue_kthread_work+0x18/0x54)
[<80046c50>] (queue_kthread_work) from [<7f00b260>] 
(sc16is7xx_irq+0x10/0x18 [sc16is7xx])
[<7f00b260>] (sc16is7xx_irq [sc16is7xx]) from [<800734c0>] 
(handle_irq_event_percpu+0x70/0x15c)
[<800734c0>] (handle_irq_event_percpu) from [<800735e8>] 
(handle_irq_event+0x3c/0x5c)
[<800735e8>] (handle_irq_event) from [<80076890>] 
(handle_level_irq+0xc4/0x13c)
[<80076890>] (handle_level_irq) from [<80072b2c>] 
(generic_handle_irq+0x2c/0x3c)
[<80072b2c>] (generic_handle_irq) from [<8025f8e0>] 
(mxc_gpio_irq_handler+0x3c/0x108)
[<8025f8e0>] (mxc_gpio_irq_handler) from [<8025fa2c>] 
(mx3_gpio_irq_handler+0x80/0xcc)
[<8025fa2c>] (mx3_gpio_irq_handler) from [<80072b2c>] 
(generic_handle_irq+0x2c/0x3c)
[<80072b2c>] (generic_handle_irq) from [<80072e1c>] 
(__handle_domain_irq+0x7c/0xe8)
[<80072e1c>] (__handle_domain_irq) from [<800094dc>] 
(gic_handle_irq+0x24/0x5c)
[<800094dc>] (gic_handle_irq) from [<800139c4>] (__irq_svc+0x44/0x90)
Exception stack(0xa9c0dda0 to 0xa9c0dde8)
dda0: a9edc0c4 a9edc268 a9edc108 a9edc108 a9edc268 a9edc108 a9edc0c4 
a9fe38a9
ddc0: ffffffff 00000008 a9fe2000 c0ac9018 a9c0dda8 a9c0dde8 80046c88 
80046be0
dde0: 60070013 ffffffff
[<800139c4>] (__irq_svc) from [<80046be0>] (insert_kthread_work+0x28/0x80)
[<80046be0>] (insert_kthread_work) from [<80046c88>] 
(queue_kthread_work+0x50/0x54)
[<80046c88>] (queue_kthread_work) from [<802a6914>] (__uart_start+0x38/0x3c)
[<802a6914>] (__uart_start) from [<802a75f8>] (uart_start+0x24/0x34)
[<802a75f8>] (uart_start) from [<80292ce8>] 
(n_tty_receive_buf_common+0x55c/0x980)
[<80292ce8>] (n_tty_receive_buf_common) from [<80293120>] 
(n_tty_receive_buf2+0x14/0x1c)
[<80293120>] (n_tty_receive_buf2) from [<80296374>] 
(flush_to_ldisc+0xd4/0x188)
[<80296374>] (flush_to_ldisc) from [<80040f60>] 
(process_one_work+0x1a0/0x424)
[<80040f60>] (process_one_work) from [<80041230>] (worker_thread+0x4c/0x4cc)
[<80041230>] (worker_thread) from [<80046820>] (kthread+0xd8/0xec)
[<80046820>] (kthread) from [<8000f658>] (ret_from_fork+0x14/0x3c)
Code: 0a000002 e59d3018 e1530004 0a000000 (e7f001f2)
---[ end trace 0000000000000002 ]---
Kernel panic - not syncing: Fatal exception in interrupt
CPU1: stopping
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D 
4.1.15-rt17-00120-g6a753b6-dirty #10
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[<80016e58>] (unwind_backtrace) from [<80012e48>] (show_stack+0x10/0x14)
[<80012e48>] (show_stack) from [<804fb00c>] (dump_stack+0x7c/0x90)
[<804fb00c>] (dump_stack) from [<80015d4c>] (handle_IPI+0x17c/0x190)
[<80015d4c>] (handle_IPI) from [<80009510>] (gic_handle_irq+0x58/0x5c)
[<80009510>] (gic_handle_irq) from [<800139c4>] (__irq_svc+0x44/0x90)
Exception stack(0xaacadf58 to 0xaacadfa0)
df40:                                                       8036afbc 
aaca2100
df60: 00000001 00000000 ba4c1c99 0000000f d76b24e1 0000000f 00000001 
ab7197e8
df80: 8073a894 00000001 aacac000 aacadfa0 8036afbc 8036afc0 60000013 
ffffffff
[<800139c4>] (__irq_svc) from [<8036afc0>] (cpuidle_enter_state+0xbc/0x1f4)
[<8036afc0>] (cpuidle_enter_state) from [<80063198>] 
(cpu_startup_entry+0x23c/0x394)
[<80063198>] (cpu_startup_entry) from [<100095ac>] (0x100095ac)
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: switch to threaded irq for fixing RT issues
@ 2016-02-17 21:54           ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2016-02-17 21:54 UTC (permalink / raw)
  To: Sean Nyekjær; +Cc: linux-serial, kubakici, linux-rt-users

[CC: linux-rt-users, can you guys help?]

On Wed, 17 Feb 2016 08:22:13 +0100, Sean Nyekjær wrote:
> On 2016-02-16 22:02, Jakub Kiciński wrote:
> > On Tue, 16 Feb 2016 08:37:11 +0100, Sean Nyekjær wrote:
> >> On 2016-02-15 23:08, Jakub Kicinski wrote:
> >>> On Mon, 15 Feb 2016 08:58:53 +0100, Sean Nyekjaer wrote:
> >>>> Working with RT patches reveals this driver have some spin_lock issues.
> >>>>
> >>>> spin_lock moved from sc16is7xx_reg_proc to protect to call to
> >>>> uart_handle_cts_change as it's the only call that isn't already
> >>>> protected in serial-core.
> >>> Sorry but this does not look fine.  You are removing all async items
> >>> from the driver.  This works for you because in RT code can sleep with
> >>> spin locks taken but for upstream it's not acceptable.  Did you test
> >>> this patch with upstream kernel and lockdep enabled?
> >> I have a 4.1.y with cherry-pick of the commits from tty-testing
> >> regarding sc16is7xx. (exept the new gpio api)
> >> On our custom board we have 3 sc16is750 on a single SPI channel. I'm
> >> using an FTDI chip(connected to a test PC) TX wired to the 3x RX pins on
> >> the sc16is750.
> >> The problem is easy reproducible just by creating data from the PC at
> >> 115200 baud, causes the kernel oops almost immediately.
> >>
> >> I have not seen any issues with a vanilla 4.1.y, only when i'm applying
> >> the RT patch and loading the system.
> > Thanks for providing the details, I think there is a misunderstanding
> > between us.  Allow me to explain again what I meant ;)
> >
> > I do acknowledge that there may be a problem with the driver on RT
> > kernels and I do trust you that you tested your patch on RT kernel
> > with latest version of the driver cherry-picked from mainline and it
> > works there.
> >
> > However, IIUC you are proposing your patch to be included in mainline.
> >
> > What I'm trying to say is that in mainline we need the async works
> > which you remove in your patch because we cannot perform SPI/I2C
> > blocking I/O under a spinlock...  If you compile mainline with your
> > patch you will see a slew of "sleeping in atomic context" warnings.
> >
> > So basically the questions are do you want this patch in mainline and
> > have you tested it on non-RT kernels?  (Please don't be discouraged -
> > there definitely must be a bug somewhere if you're seeing crashes, but
> > I'm afraid the proposed solutions is not good enough...)
> >
> > Thanks
> Yeah the first priority must be for the driver to work on the mainline 
> kernel :-)
> 
> I just wanted to flag a problem with the driver with the RT patches 
> applied. I thought
> RT patch (maybe) revaled that the driver have underlying problem.
> 
> So maybe this patch should be included in the RT patch?
> No i have not tested this with a non-RT kernel, maybe i should try :-)

I've looked at the stack dump you posted in previous email.  It looks
like we deadlock on the on spinlock_irqsave() in queue_kthread_work()
because our interrupt is not threaded!?  I thought all interrupts are
threaded in RT by default.

================================= [92/1901]
[ INFO: inconsistent lock state ]
4.1.15-rt17-00120-g6a753b6-dirty #10 Not tainted
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
mdev/309 [HC1[1]:SC0[0]:HE0:SE1] takes:
  ((&s->kworker)->lock){?.+...}, at: [<80046c50>] 
queue_kthread_work+0x18/0x54
{HARDIRQ-ON-W} state was registered at:
   [<80500028>] rt_spin_lock+0x74/0x88
   [<80046930>] kthread_worker_fn+0x84/0x198
   [<80046820>] kthread+0xd8/0xec
   [<8000f658>] ret_from_fork+0x14/0x3c
irq event stamp: 1362
hardirqs last  enabled at (1361): [<800bc364>] filemap_map_pages+0x2b0/0x424
hardirqs last disabled at (1362): [<800139b4>] __irq_svc+0x34/0x90
softirqs last  enabled at (0): [<80026108>] copy_process.part.2+0x370/0x1650
softirqs last disabled at (0): [<  (null)>]   (null)

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock((&s->kworker)->lock);
   <Interrupt>
     lock((&s->kworker)->lock);

  *** DEADLOCK ***

3 locks held by mdev/309:
  #0:  (&mm->mmap_sem){++++++}, at: [<8001d420>] do_page_fault+0x80/0x368
  #1:  (&mm->page_table_lock){+.+...}, at: [<800e19fc>] 
handle_mm_fault+0x630/0xcd0
  #2:  (rcu_read_lock){......}, at: [<800bc0b4>] filemap_map_pages+0x0/0x424

stack backtrace:
CPU: 0 PID: 309 Comm: mdev Not tainted 4.1.15-rt17-00120-g6a753b6-dirty #10
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[<80016e58>] (unwind_backtrace) from [<80012e48>] (show_stack+0x10/0x14)
[<80012e48>] (show_stack) from [<804fb00c>] (dump_stack+0x7c/0x90)
[<804fb00c>] (dump_stack) from [<804f8c6c>] (print_usage_bug+0x2d8/0x2e8)
[<804f8c6c>] (print_usage_bug) from [<800665ec>] (mark_lock+0x214/0x7a0)
[<800665ec>] (mark_lock) from [<800678dc>] (__lock_acquire+0x778/0x1dd8)
[<800678dc>] (__lock_acquire) from [<800697d4>] (lock_acquire+0x6c/0x8c)
[<800697d4>] (lock_acquire) from [<80500028>] (rt_spin_lock+0x74/0x88)
[<80500028>] (rt_spin_lock) from [<80046c50>] (queue_kthread_work+0x18/0x54)
[<80046c50>] (queue_kthread_work) from [<7f00b260>] 
(sc16is7xx_irq+0x10/0x18 [sc16is7xx])
[<7f00b260>] (sc16is7xx_irq [sc16is7xx]) from [<800734c0>] 
(handle_irq_event_percpu+0x70/0x15c)
[<800734c0>] (handle_irq_event_percpu) from [<800735e8>] 
(handle_irq_event+0x3c/0x5c)
[<800735e8>] (handle_irq_event) from [<80076890>] 
(handle_level_irq+0xc4/0x13c)
[<80076890>] (handle_level_irq) from [<80072b2c>] 
(generic_handle_irq+0x2c/0x3c)
[<80072b2c>] (generic_handle_irq) from [<8025f8e0>] 
(mxc_gpio_irq_handler+0x3c/0x108)
[<8025f8e0>] (mxc_gpio_irq_handler) from [<8025fa2c>] 
(mx3_gpio_irq_handler+0x80/0xcc)
[<8025fa2c>] (mx3_gpio_irq_handler) from [<80072b2c>] 
(generic_handle_irq+0x2c/0x3c)
[<80072b2c>] (generic_handle_irq) from [<80072e1c>] 
(__handle_domain_irq+0x7c/0xe8)
[<80072e1c>] (__handle_domain_irq) from [<800094dc>] 
(gic_handle_irq+0x24/0x5c)
[<800094dc>] (gic_handle_irq) from [<800139c4>] (__irq_svc+0x44/0x90)
Exception stack(0xa9ecdda0 to 0xa9ecdde8)
dda0: 00000001 00000110 00000000 a9f40b00 20070113 00000001 aa811eb8 
80782edf
ddc0: a9ecde84 aaa231a0 aaa232ec aae4b800 00000003 a9ecdde8 80066be0 
800bc368
dde0: 20070113 ffffffff
[<800139c4>] (__irq_svc) from [<800bc368>] (filemap_map_pages+0x2b4/0x424)
[<800bc368>] (filemap_map_pages) from [<800e1b0c>] 
(handle_mm_fault+0x740/0xcd0)
[<800e1b0c>] (handle_mm_fault) from [<8001d604>] (do_page_fault+0x264/0x368)
[<8001d604>] (do_page_fault) from [<800093b4>] (do_PrefetchAbort+0x34/0x9c)
[<800093b4>] (do_PrefetchAbort) from [<80013e9c>] 
(ret_from_exception+0x0/0x24)
Exception stack(0xa9ecdfb0 to 0xa9ecdff8)
dfa0:                                     003f8008 00001000 00079c84 
7ed0bf99
dfc0: 000808f0 ffffffff 7ed0bfc3 003f8008 00000000 00000000 000808f0 
00000001
dfe0: 76e18598 7ed0bc28 00040d1c 76e18598 60070010 ffffffff
random: nonblocking pool is initialized
------------[ cut here ]------------
Kernel BUG at 804fe6d8 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in: gpio_sn65hvs885 ti_ads8688 ad5755 sc16is7xx 
regmap_spi gpio_pca953x
CPU: 0 PID: 215 Comm: kworker/0:2 Not tainted 
4.1.15-rt17-00120-g6a753b6-dirty #10
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [17/1901]
Workqueue: events flush_to_ldisc
task: aa414d00 ti: a9c0c000 task.ti: a9c0c000
PC is at rt_spin_lock_slowlock+0x2c8/0x2e4
LR is at rt_spin_lock_slowlock+0x54/0x2e4
pc : [<804fe6d8>]    lr : [<804fe464>]    psr: 60070193
sp : a9c0dc20  ip : aa414d01  fp : a9edc0c4
r10: 00000000  r9 : 80046be0  r8 : 80782e40
r7 : aa414d00  r6 : 00000000  r5 : 00000001  r4 : a9c0dc38
r3 : aa414d00  r2 : 00000000  r1 : aa414d00  r0 : 00000000
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 39fe804a  DAC: 00000015
Process kworker/0:2 (pid: 215, stack limit = 0xa9c0c210)
Stack: (0xa9c0dc20 to 0xa9c0e000)
dc20: a9c0dce0 aac0c000 a9c0dd64 800697d4 00000001 00000080 a9c0dc38 
800735f4
dc40: 00000000 a9c0dc44 00000000 aad6b868 00000000 00000002 00000001 
804ff9ac
dc60: aa414d00 a9edc0c4 80046c50 00000067 aad96400 80782e40 80046be0 
00000000
dc80: c0ac9018 80500038 00000001 00000001 80f8474c 80748984 7f00b250 
a9edc0c4
dca0: a9edc11c 80046c50 7f00b250 a9efb2c0 00000000 7f00b260 7f00b250 
800734c0
dcc0: 800777d4 00000004 00060002 aad96400 aad96468 a9efb2c0 aad41610 
00000001
dce0: 80f8474c 80748984 c0ac9018 800735e8 aad96400 aad96468 00000002 
80076890
dd00: 800767cc 00000067 00000004 80072b2c 00000004 8025f8e0 d17b30c0 
00000001
dd20: 8073e500 a9fdc200 00000000 aad41610 80751108 aad95a00 00000000 
00000001
dd40: a9c0dda0 aac0c000 c0ac9018 8025fa2c 00000063 00000000 00000063 
80072b2c
dd60: 8073c6d0 80072e1c f400010c 00000056 80748ca0 a9c0dda0 f4000100 
00000008
dd80: a9fe2000 800094dc 80046be0 60070013 ffffffff a9c0ddd4 ffffffff 
800139c4
dda0: a9edc0c4 a9edc268 a9edc108 a9edc108 a9edc268 a9edc108 a9edc0c4 
a9fe38a9
ddc0: ffffffff 00000008 a9fe2000 c0ac9018 a9c0dda8 a9c0dde8 80046c88 
80046be0
dde0: 60070013 ffffffff a9edc0c4 00000001 00000000 80046c88 aa4f8800 
a9edc12c
de00: a9fe2000 802a6914 aa414d00 802a75f8 802a7608 c0ac9000 c0ac9000 
80292ce8
de20: 60070013 a9fe39a1 a9fe38a1 00000008 c0ac9000 00000000 00000008 
80526d20
de40: 55555556 c0ac9000 00000000 a9fe2124 00000000 a9fe2000 aa5722a0 
a9ea5cc0
de60: a9fe3800 aa572280 aa57227c 00000000 00000000 80293120 00000001 
aa572280
de80: a9fe2000 80296374 00000000 80040ee0 aa678d00 aa572280 ab70ce80 
a9c0dec8
dea0: ab710700 00000001 00000000 80040f60 00000001 00000000 80040ee0 
800412cc
dec0: 00000000 00000000 80f87484 00000000 00000000 80663790 80782dca 
ab70ce80
dee0: ab70ced4 a9c0c000 80782dca aa678d18 aa678d00 00000008 ab70ce80 
80041230
df00: 80742a40 aa414d00 aa678d00 aa655c00 00000000 aa678d00 800411e4 
00000000
df20: 00000000 00000000 00000000 80046820 a9c0df9c 00000000 aa414d00 
aa678d00
df40: 00000000 00000000 dead4ead ffffffff ffffffff 80785368 00000000 
00000000
df60: 8066b7cc a9c0df64 a9c0df64 00000000 00000000 dead4ead ffffffff 
ffffffff
df80: 80785368 00000000 00000000 8066b7cc a9c0df90 a9c0df90 a9c0dfac 
aa655c00
dfa0: 80046748 00000000 00000000 8000f658 00000000 00000000 00000000 
00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 3bf5a811 
3bf5ac11
[<804fe6d8>] (rt_spin_lock_slowlock) from [<80500038>] 
(rt_spin_lock+0x84/0x88)
[<80500038>] (rt_spin_lock) from [<80046c50>] (queue_kthread_work+0x18/0x54)
[<80046c50>] (queue_kthread_work) from [<7f00b260>] 
(sc16is7xx_irq+0x10/0x18 [sc16is7xx])
[<7f00b260>] (sc16is7xx_irq [sc16is7xx]) from [<800734c0>] 
(handle_irq_event_percpu+0x70/0x15c)
[<800734c0>] (handle_irq_event_percpu) from [<800735e8>] 
(handle_irq_event+0x3c/0x5c)
[<800735e8>] (handle_irq_event) from [<80076890>] 
(handle_level_irq+0xc4/0x13c)
[<80076890>] (handle_level_irq) from [<80072b2c>] 
(generic_handle_irq+0x2c/0x3c)
[<80072b2c>] (generic_handle_irq) from [<8025f8e0>] 
(mxc_gpio_irq_handler+0x3c/0x108)
[<8025f8e0>] (mxc_gpio_irq_handler) from [<8025fa2c>] 
(mx3_gpio_irq_handler+0x80/0xcc)
[<8025fa2c>] (mx3_gpio_irq_handler) from [<80072b2c>] 
(generic_handle_irq+0x2c/0x3c)
[<80072b2c>] (generic_handle_irq) from [<80072e1c>] 
(__handle_domain_irq+0x7c/0xe8)
[<80072e1c>] (__handle_domain_irq) from [<800094dc>] 
(gic_handle_irq+0x24/0x5c)
[<800094dc>] (gic_handle_irq) from [<800139c4>] (__irq_svc+0x44/0x90)
Exception stack(0xa9c0dda0 to 0xa9c0dde8)
dda0: a9edc0c4 a9edc268 a9edc108 a9edc108 a9edc268 a9edc108 a9edc0c4 
a9fe38a9
ddc0: ffffffff 00000008 a9fe2000 c0ac9018 a9c0dda8 a9c0dde8 80046c88 
80046be0
dde0: 60070013 ffffffff
[<800139c4>] (__irq_svc) from [<80046be0>] (insert_kthread_work+0x28/0x80)
[<80046be0>] (insert_kthread_work) from [<80046c88>] 
(queue_kthread_work+0x50/0x54)
[<80046c88>] (queue_kthread_work) from [<802a6914>] (__uart_start+0x38/0x3c)
[<802a6914>] (__uart_start) from [<802a75f8>] (uart_start+0x24/0x34)
[<802a75f8>] (uart_start) from [<80292ce8>] 
(n_tty_receive_buf_common+0x55c/0x980)
[<80292ce8>] (n_tty_receive_buf_common) from [<80293120>] 
(n_tty_receive_buf2+0x14/0x1c)
[<80293120>] (n_tty_receive_buf2) from [<80296374>] 
(flush_to_ldisc+0xd4/0x188)
[<80296374>] (flush_to_ldisc) from [<80040f60>] 
(process_one_work+0x1a0/0x424)
[<80040f60>] (process_one_work) from [<80041230>] (worker_thread+0x4c/0x4cc)
[<80041230>] (worker_thread) from [<80046820>] (kthread+0xd8/0xec)
[<80046820>] (kthread) from [<8000f658>] (ret_from_fork+0x14/0x3c)
Code: 0a000002 e59d3018 e1530004 0a000000 (e7f001f2)
---[ end trace 0000000000000002 ]---
Kernel panic - not syncing: Fatal exception in interrupt
CPU1: stopping
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D 
4.1.15-rt17-00120-g6a753b6-dirty #10
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[<80016e58>] (unwind_backtrace) from [<80012e48>] (show_stack+0x10/0x14)
[<80012e48>] (show_stack) from [<804fb00c>] (dump_stack+0x7c/0x90)
[<804fb00c>] (dump_stack) from [<80015d4c>] (handle_IPI+0x17c/0x190)
[<80015d4c>] (handle_IPI) from [<80009510>] (gic_handle_irq+0x58/0x5c)
[<80009510>] (gic_handle_irq) from [<800139c4>] (__irq_svc+0x44/0x90)
Exception stack(0xaacadf58 to 0xaacadfa0)
df40:                                                       8036afbc 
aaca2100
df60: 00000001 00000000 ba4c1c99 0000000f d76b24e1 0000000f 00000001 
ab7197e8
df80: 8073a894 00000001 aacac000 aacadfa0 8036afbc 8036afc0 60000013 
ffffffff
[<800139c4>] (__irq_svc) from [<8036afc0>] (cpuidle_enter_state+0xbc/0x1f4)
[<8036afc0>] (cpuidle_enter_state) from [<80063198>] 
(cpu_startup_entry+0x23c/0x394)
[<80063198>] (cpu_startup_entry) from [<100095ac>] (0x100095ac)
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: switch to threaded irq for fixing RT issues
  2016-02-17 21:54           ` Jakub Kicinski
  (?)
@ 2016-02-17 23:25           ` Josh Cartwright
  2016-02-18  7:29               ` Sean Nyekjær
  -1 siblings, 1 reply; 11+ messages in thread
From: Josh Cartwright @ 2016-02-17 23:25 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Sean Nyekj?r, linux-serial, kubakici, linux-rt-users

On Wed, Feb 17, 2016 at 09:54:27PM +0000, Jakub Kicinski wrote:
> [CC: linux-rt-users, can you guys help?]
> 
> On Wed, 17 Feb 2016 08:22:13 +0100, Sean Nyekjær wrote:
> > On 2016-02-16 22:02, Jakub Kici??ski wrote:
> > > On Tue, 16 Feb 2016 08:37:11 +0100, Sean Nyekjær wrote:
> > >> On 2016-02-15 23:08, Jakub Kicinski wrote:
> > >>> On Mon, 15 Feb 2016 08:58:53 +0100, Sean Nyekjaer wrote:
> > >>>> Working with RT patches reveals this driver have some spin_lock issues.
> > >>>>
> > >>>> spin_lock moved from sc16is7xx_reg_proc to protect to call to
> > >>>> uart_handle_cts_change as it's the only call that isn't already
> > >>>> protected in serial-core.
> > >>> Sorry but this does not look fine.  You are removing all async items
> > >>> from the driver.  This works for you because in RT code can sleep with
> > >>> spin locks taken but for upstream it's not acceptable.  Did you test
> > >>> this patch with upstream kernel and lockdep enabled?
> > >> I have a 4.1.y with cherry-pick of the commits from tty-testing
> > >> regarding sc16is7xx. (exept the new gpio api)
> > >> On our custom board we have 3 sc16is750 on a single SPI channel. I'm
> > >> using an FTDI chip(connected to a test PC) TX wired to the 3x RX pins on
> > >> the sc16is750.
> > >> The problem is easy reproducible just by creating data from the PC at
> > >> 115200 baud, causes the kernel oops almost immediately.
> > >>
> > >> I have not seen any issues with a vanilla 4.1.y, only when i'm applying
> > >> the RT patch and loading the system.
> > > Thanks for providing the details, I think there is a misunderstanding
> > > between us.  Allow me to explain again what I meant ;)
> > >
> > > I do acknowledge that there may be a problem with the driver on RT
> > > kernels and I do trust you that you tested your patch on RT kernel
> > > with latest version of the driver cherry-picked from mainline and it
> > > works there.
> > >
> > > However, IIUC you are proposing your patch to be included in mainline.
> > >
> > > What I'm trying to say is that in mainline we need the async works
> > > which you remove in your patch because we cannot perform SPI/I2C
> > > blocking I/O under a spinlock...  If you compile mainline with your
> > > patch you will see a slew of "sleeping in atomic context" warnings.
> > >
> > > So basically the questions are do you want this patch in mainline and
> > > have you tested it on non-RT kernels?  (Please don't be discouraged -
> > > there definitely must be a bug somewhere if you're seeing crashes, but
> > > I'm afraid the proposed solutions is not good enough...)
> > >
> > > Thanks
> > Yeah the first priority must be for the driver to work on the mainline 
> > kernel :-)
> > 
> > I just wanted to flag a problem with the driver with the RT patches 
> > applied. I thought
> > RT patch (maybe) revaled that the driver have underlying problem.
> > 
> > So maybe this patch should be included in the RT patch?
> > No i have not tested this with a non-RT kernel, maybe i should try :-)
> 
> I've looked at the stack dump you posted in previous email.  It looks
> like we deadlock on the on spinlock_irqsave() in queue_kthread_work()
> because our interrupt is not threaded!?  I thought all interrupts are
> threaded in RT by default.

I believe the actual problem is that the handler is being registered w/
IRQF_ONESHOT.  I'm not sure that even makes sense in this scenario, but
is perhaps an unintended carry over from request_threaded_irq ->
request_irq transition in 9e6f4ca3 ("sc16is7xx: use kthread_worker for
tx_work and irq").

Sean-

Does this solve your problem?

  Josh

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index e78fa99..1bb5c0e 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1257,7 +1257,7 @@ static int sc16is7xx_probe(struct device *dev,
 
 	/* Setup interrupt */
 	ret = devm_request_irq(dev, irq, sc16is7xx_irq,
-			       IRQF_ONESHOT | flags, dev_name(dev), s);
+			       flags, dev_name(dev), s);
 	if (!ret)
 		return 0;
 
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: switch to threaded irq for fixing RT issues
  2016-02-17 23:25           ` Josh Cartwright
@ 2016-02-18  7:29               ` Sean Nyekjær
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Nyekjær @ 2016-02-18  7:29 UTC (permalink / raw)
  To: Josh Cartwright, Jakub Kicinski; +Cc: linux-serial, kubakici, linux-rt-users



On 2016-02-18 00:25, Josh Cartwright wrote:
> On Wed, Feb 17, 2016 at 09:54:27PM +0000, Jakub Kicinski wrote:
>> [CC: linux-rt-users, can you guys help?]
>>
>> On Wed, 17 Feb 2016 08:22:13 +0100, Sean Nyekjær wrote:
>>> On 2016-02-16 22:02, Jakub Kici??ski wrote:
>>>> On Tue, 16 Feb 2016 08:37:11 +0100, Sean Nyekjær wrote:
>>>>> On 2016-02-15 23:08, Jakub Kicinski wrote:
>>>>>> On Mon, 15 Feb 2016 08:58:53 +0100, Sean Nyekjaer wrote:
>>>>>>> Working with RT patches reveals this driver have some spin_lock issues.
>>>>>>>
>>>>>>> spin_lock moved from sc16is7xx_reg_proc to protect to call to
>>>>>>> uart_handle_cts_change as it's the only call that isn't already
>>>>>>> protected in serial-core.
>>>>>> Sorry but this does not look fine.  You are removing all async items
>>>>>> from the driver.  This works for you because in RT code can sleep with
>>>>>> spin locks taken but for upstream it's not acceptable.  Did you test
>>>>>> this patch with upstream kernel and lockdep enabled?
>>>>> I have a 4.1.y with cherry-pick of the commits from tty-testing
>>>>> regarding sc16is7xx. (exept the new gpio api)
>>>>> On our custom board we have 3 sc16is750 on a single SPI channel. I'm
>>>>> using an FTDI chip(connected to a test PC) TX wired to the 3x RX pins on
>>>>> the sc16is750.
>>>>> The problem is easy reproducible just by creating data from the PC at
>>>>> 115200 baud, causes the kernel oops almost immediately.
>>>>>
>>>>> I have not seen any issues with a vanilla 4.1.y, only when i'm applying
>>>>> the RT patch and loading the system.
>>>> Thanks for providing the details, I think there is a misunderstanding
>>>> between us.  Allow me to explain again what I meant ;)
>>>>
>>>> I do acknowledge that there may be a problem with the driver on RT
>>>> kernels and I do trust you that you tested your patch on RT kernel
>>>> with latest version of the driver cherry-picked from mainline and it
>>>> works there.
>>>>
>>>> However, IIUC you are proposing your patch to be included in mainline.
>>>>
>>>> What I'm trying to say is that in mainline we need the async works
>>>> which you remove in your patch because we cannot perform SPI/I2C
>>>> blocking I/O under a spinlock...  If you compile mainline with your
>>>> patch you will see a slew of "sleeping in atomic context" warnings.
>>>>
>>>> So basically the questions are do you want this patch in mainline and
>>>> have you tested it on non-RT kernels?  (Please don't be discouraged -
>>>> there definitely must be a bug somewhere if you're seeing crashes, but
>>>> I'm afraid the proposed solutions is not good enough...)
>>>>
>>>> Thanks
>>> Yeah the first priority must be for the driver to work on the mainline
>>> kernel :-)
>>>
>>> I just wanted to flag a problem with the driver with the RT patches
>>> applied. I thought
>>> RT patch (maybe) revaled that the driver have underlying problem.
>>>
>>> So maybe this patch should be included in the RT patch?
>>> No i have not tested this with a non-RT kernel, maybe i should try :-)
>> I've looked at the stack dump you posted in previous email.  It looks
>> like we deadlock on the on spinlock_irqsave() in queue_kthread_work()
>> because our interrupt is not threaded!?  I thought all interrupts are
>> threaded in RT by default.
> I believe the actual problem is that the handler is being registered w/
> IRQF_ONESHOT.  I'm not sure that even makes sense in this scenario, but
> is perhaps an unintended carry over from request_threaded_irq ->
> request_irq transition in 9e6f4ca3 ("sc16is7xx: use kthread_worker for
> tx_work and irq").
>
> Sean-
>
> Does this solve your problem?
>
>    Josh
>
> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> index e78fa99..1bb5c0e 100644
> --- a/drivers/tty/serial/sc16is7xx.c
> +++ b/drivers/tty/serial/sc16is7xx.c
> @@ -1257,7 +1257,7 @@ static int sc16is7xx_probe(struct device *dev,
>   
>   	/* Setup interrupt */
>   	ret = devm_request_irq(dev, irq, sc16is7xx_irq,
> -			       IRQF_ONESHOT | flags, dev_name(dev), s);
> +			       flags, dev_name(dev), s);
>   	if (!ret)
>   		return 0;
>   
Wow, nice that you CC'd the RT guys :-)

Josh that was it, it runs without deadlocks now :-D

Thank you.

Jakub, is this fix ok? Who is making this into a PATCH?
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: switch to threaded irq for fixing RT issues
@ 2016-02-18  7:29               ` Sean Nyekjær
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Nyekjær @ 2016-02-18  7:29 UTC (permalink / raw)
  To: Josh Cartwright, Jakub Kicinski; +Cc: linux-serial, kubakici, linux-rt-users



On 2016-02-18 00:25, Josh Cartwright wrote:
> On Wed, Feb 17, 2016 at 09:54:27PM +0000, Jakub Kicinski wrote:
>> [CC: linux-rt-users, can you guys help?]
>>
>> On Wed, 17 Feb 2016 08:22:13 +0100, Sean Nyekjær wrote:
>>> On 2016-02-16 22:02, Jakub Kici??ski wrote:
>>>> On Tue, 16 Feb 2016 08:37:11 +0100, Sean Nyekjær wrote:
>>>>> On 2016-02-15 23:08, Jakub Kicinski wrote:
>>>>>> On Mon, 15 Feb 2016 08:58:53 +0100, Sean Nyekjaer wrote:
>>>>>>> Working with RT patches reveals this driver have some spin_lock issues.
>>>>>>>
>>>>>>> spin_lock moved from sc16is7xx_reg_proc to protect to call to
>>>>>>> uart_handle_cts_change as it's the only call that isn't already
>>>>>>> protected in serial-core.
>>>>>> Sorry but this does not look fine.  You are removing all async items
>>>>>> from the driver.  This works for you because in RT code can sleep with
>>>>>> spin locks taken but for upstream it's not acceptable.  Did you test
>>>>>> this patch with upstream kernel and lockdep enabled?
>>>>> I have a 4.1.y with cherry-pick of the commits from tty-testing
>>>>> regarding sc16is7xx. (exept the new gpio api)
>>>>> On our custom board we have 3 sc16is750 on a single SPI channel. I'm
>>>>> using an FTDI chip(connected to a test PC) TX wired to the 3x RX pins on
>>>>> the sc16is750.
>>>>> The problem is easy reproducible just by creating data from the PC at
>>>>> 115200 baud, causes the kernel oops almost immediately.
>>>>>
>>>>> I have not seen any issues with a vanilla 4.1.y, only when i'm applying
>>>>> the RT patch and loading the system.
>>>> Thanks for providing the details, I think there is a misunderstanding
>>>> between us.  Allow me to explain again what I meant ;)
>>>>
>>>> I do acknowledge that there may be a problem with the driver on RT
>>>> kernels and I do trust you that you tested your patch on RT kernel
>>>> with latest version of the driver cherry-picked from mainline and it
>>>> works there.
>>>>
>>>> However, IIUC you are proposing your patch to be included in mainline.
>>>>
>>>> What I'm trying to say is that in mainline we need the async works
>>>> which you remove in your patch because we cannot perform SPI/I2C
>>>> blocking I/O under a spinlock...  If you compile mainline with your
>>>> patch you will see a slew of "sleeping in atomic context" warnings.
>>>>
>>>> So basically the questions are do you want this patch in mainline and
>>>> have you tested it on non-RT kernels?  (Please don't be discouraged -
>>>> there definitely must be a bug somewhere if you're seeing crashes, but
>>>> I'm afraid the proposed solutions is not good enough...)
>>>>
>>>> Thanks
>>> Yeah the first priority must be for the driver to work on the mainline
>>> kernel :-)
>>>
>>> I just wanted to flag a problem with the driver with the RT patches
>>> applied. I thought
>>> RT patch (maybe) revaled that the driver have underlying problem.
>>>
>>> So maybe this patch should be included in the RT patch?
>>> No i have not tested this with a non-RT kernel, maybe i should try :-)
>> I've looked at the stack dump you posted in previous email.  It looks
>> like we deadlock on the on spinlock_irqsave() in queue_kthread_work()
>> because our interrupt is not threaded!?  I thought all interrupts are
>> threaded in RT by default.
> I believe the actual problem is that the handler is being registered w/
> IRQF_ONESHOT.  I'm not sure that even makes sense in this scenario, but
> is perhaps an unintended carry over from request_threaded_irq ->
> request_irq transition in 9e6f4ca3 ("sc16is7xx: use kthread_worker for
> tx_work and irq").
>
> Sean-
>
> Does this solve your problem?
>
>    Josh
>
> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> index e78fa99..1bb5c0e 100644
> --- a/drivers/tty/serial/sc16is7xx.c
> +++ b/drivers/tty/serial/sc16is7xx.c
> @@ -1257,7 +1257,7 @@ static int sc16is7xx_probe(struct device *dev,
>   
>   	/* Setup interrupt */
>   	ret = devm_request_irq(dev, irq, sc16is7xx_irq,
> -			       IRQF_ONESHOT | flags, dev_name(dev), s);
> +			       flags, dev_name(dev), s);
>   	if (!ret)
>   		return 0;
>   
Wow, nice that you CC'd the RT guys :-)

Josh that was it, it runs without deadlocks now :-D

Thank you.

Jakub, is this fix ok? Who is making this into a PATCH?
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: switch to threaded irq for fixing RT issues
  2016-02-18  7:29               ` Sean Nyekjær
@ 2016-02-18  8:46                 ` Jakub Kicinski
  -1 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2016-02-18  8:46 UTC (permalink / raw)
  To: Sean Nyekjær; +Cc: Josh Cartwright, linux-serial, kubakici, linux-rt-users

On Thu, 18 Feb 2016 08:29:40 +0100, Sean Nyekjær wrote:
> > diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> > index e78fa99..1bb5c0e 100644
> > --- a/drivers/tty/serial/sc16is7xx.c
> > +++ b/drivers/tty/serial/sc16is7xx.c
> > @@ -1257,7 +1257,7 @@ static int sc16is7xx_probe(struct device *dev,
> >   
> >   	/* Setup interrupt */
> >   	ret = devm_request_irq(dev, irq, sc16is7xx_irq,
> > -			       IRQF_ONESHOT | flags, dev_name(dev), s);
> > +			       flags, dev_name(dev), s);
> >   	if (!ret)
> >   		return 0;
> >  
> Wow, nice that you CC'd the RT guys :-)
> 
> Josh that was it, it runs without deadlocks now :-D
> 
> Thank you.
> 
> Jakub, is this fix ok? Who is making this into a PATCH?

Yes, in fact similar patch circulated at some point but the author
flaked out before it was ready for merging.

Josh could you submit officially?  Please CC stable for 4.1+.
Fixes: 9e6f4ca3e567 ("sc16is7xx: use kthread_worker for tx_work and irq")

Thanks a lot!
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: switch to threaded irq for fixing RT issues
@ 2016-02-18  8:46                 ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2016-02-18  8:46 UTC (permalink / raw)
  To: Sean Nyekjær; +Cc: Josh Cartwright, linux-serial, kubakici, linux-rt-users

On Thu, 18 Feb 2016 08:29:40 +0100, Sean Nyekjær wrote:
> > diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> > index e78fa99..1bb5c0e 100644
> > --- a/drivers/tty/serial/sc16is7xx.c
> > +++ b/drivers/tty/serial/sc16is7xx.c
> > @@ -1257,7 +1257,7 @@ static int sc16is7xx_probe(struct device *dev,
> >   
> >   	/* Setup interrupt */
> >   	ret = devm_request_irq(dev, irq, sc16is7xx_irq,
> > -			       IRQF_ONESHOT | flags, dev_name(dev), s);
> > +			       flags, dev_name(dev), s);
> >   	if (!ret)
> >   		return 0;
> >  
> Wow, nice that you CC'd the RT guys :-)
> 
> Josh that was it, it runs without deadlocks now :-D
> 
> Thank you.
> 
> Jakub, is this fix ok? Who is making this into a PATCH?

Yes, in fact similar patch circulated at some point but the author
flaked out before it was ready for merging.

Josh could you submit officially?  Please CC stable for 4.1+.
Fixes: 9e6f4ca3e567 ("sc16is7xx: use kthread_worker for tx_work and irq")

Thanks a lot!
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] sc16is7xx: drop bogus use of IRQF_ONESHOT
  2016-02-18  8:46                 ` Jakub Kicinski
  (?)
@ 2016-02-18 17:26                 ` Josh Cartwright
  2016-03-08 12:43                     ` Sean Nyekjær
  2016-09-09 13:14                   ` Sebastian Andrzej Siewior
  -1 siblings, 2 replies; 11+ messages in thread
From: Josh Cartwright @ 2016-02-18 17:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: linux-serial, linux-rt-users, Sean Nyekjaer, linux-kernel, stable

The use of IRQF_ONESHOT when registering an interrupt handler with
request_irq() is non-sensical.

Not only that, it also prevents the handler from being threaded when it
otherwise should be w/ IRQ_FORCED_THREADING is enabled.  This causes the
following deadlock observed by Sean Nyekjaer on -rt:

Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[..]
   rt_spin_lock_slowlock from (queue_kthread_work+0x18/0x74)
   queue_kthread_work) from (sc16is7xx_irq+0x10/0x18 [sc16is7xx])
   sc16is7xx_irq [sc16is7xx]) from (handle_irq_event_percpu+0x70/0x158)
   handle_irq_event_percpu) from (handle_irq_event+0x68/0xa8)
   handle_irq_event) from (handle_level_irq+0x10c/0x184)
   handle_level_irq) from (generic_handle_irq+0x2c/0x3c)
   generic_handle_irq) from (mxc_gpio_irq_handler+0x3c/0x108)
   mxc_gpio_irq_handler) from (mx3_gpio_irq_handler+0x80/0xcc)
   mx3_gpio_irq_handler) from (generic_handle_irq+0x2c/0x3c)
   generic_handle_irq) from (__handle_domain_irq+0x7c/0xe8)
   __handle_domain_irq) from (gic_handle_irq+0x24/0x5c)
   gic_handle_irq) from (__irq_svc+0x40/0x88)
   (__irq_svc) from (rt_spin_unlock+0x1c/0x68)
   (rt_spin_unlock) from (kthread_worker_fn+0x104/0x17c)
   (kthread_worker_fn) from (kthread+0xd0/0xe8)
   (kthread) from (ret_from_fork+0x14/0x2c)

Reported-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Fixes: 9e6f4ca3e567 ("sc16is7xx: use kthread_worker for tx_work and irq")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Josh Cartwright <joshc@ni.com>
---
 drivers/tty/serial/sc16is7xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index e78fa99..1bb5c0e 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1257,7 +1257,7 @@ static int sc16is7xx_probe(struct device *dev,
 
 	/* Setup interrupt */
 	ret = devm_request_irq(dev, irq, sc16is7xx_irq,
-			       IRQF_ONESHOT | flags, dev_name(dev), s);
+			       flags, dev_name(dev), s);
 	if (!ret)
 		return 0;
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: drop bogus use of IRQF_ONESHOT
  2016-02-18 17:26                 ` [PATCH] sc16is7xx: drop bogus use of IRQF_ONESHOT Josh Cartwright
@ 2016-03-08 12:43                     ` Sean Nyekjær
  2016-09-09 13:14                   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 11+ messages in thread
From: Sean Nyekjær @ 2016-03-08 12:43 UTC (permalink / raw)
  To: Josh Cartwright, Jakub Kicinski
  Cc: linux-serial, linux-rt-users, linux-kernel, stable



On 2016-02-18 18:26, Josh Cartwright wrote:
> The use of IRQF_ONESHOT when registering an interrupt handler with
> request_irq() is non-sensical.
>
> Not only that, it also prevents the handler from being threaded when it
> otherwise should be w/ IRQ_FORCED_THREADING is enabled.  This causes the
> following deadlock observed by Sean Nyekjaer on -rt:
>
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> [..]
>     rt_spin_lock_slowlock from (queue_kthread_work+0x18/0x74)
>     queue_kthread_work) from (sc16is7xx_irq+0x10/0x18 [sc16is7xx])
>     sc16is7xx_irq [sc16is7xx]) from (handle_irq_event_percpu+0x70/0x158)
>     handle_irq_event_percpu) from (handle_irq_event+0x68/0xa8)
>     handle_irq_event) from (handle_level_irq+0x10c/0x184)
>     handle_level_irq) from (generic_handle_irq+0x2c/0x3c)
>     generic_handle_irq) from (mxc_gpio_irq_handler+0x3c/0x108)
>     mxc_gpio_irq_handler) from (mx3_gpio_irq_handler+0x80/0xcc)
>     mx3_gpio_irq_handler) from (generic_handle_irq+0x2c/0x3c)
>     generic_handle_irq) from (__handle_domain_irq+0x7c/0xe8)
>     __handle_domain_irq) from (gic_handle_irq+0x24/0x5c)
>     gic_handle_irq) from (__irq_svc+0x40/0x88)
>     (__irq_svc) from (rt_spin_unlock+0x1c/0x68)
>     (rt_spin_unlock) from (kthread_worker_fn+0x104/0x17c)
>     (kthread_worker_fn) from (kthread+0xd0/0xe8)
>     (kthread) from (ret_from_fork+0x14/0x2c)
>
> Reported-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
> Fixes: 9e6f4ca3e567 ("sc16is7xx: use kthread_worker for tx_work and irq")
> Cc: stable@vger.kernel.org # v4.1+
> Signed-off-by: Josh Cartwright <joshc@ni.com>
> ---
>   drivers/tty/serial/sc16is7xx.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> index e78fa99..1bb5c0e 100644
> --- a/drivers/tty/serial/sc16is7xx.c
> +++ b/drivers/tty/serial/sc16is7xx.c
> @@ -1257,7 +1257,7 @@ static int sc16is7xx_probe(struct device *dev,
>   
>   	/* Setup interrupt */
>   	ret = devm_request_irq(dev, irq, sc16is7xx_irq,
> -			       IRQF_ONESHOT | flags, dev_name(dev), s);
> +			       flags, dev_name(dev), s);
>   	if (!ret)
>   		return 0;
>   
Is PATCH this dropped?
It clearly fixes the irq when running with RT patches :-)

/Sean

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: drop bogus use of IRQF_ONESHOT
@ 2016-03-08 12:43                     ` Sean Nyekjær
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Nyekjær @ 2016-03-08 12:43 UTC (permalink / raw)
  To: Josh Cartwright, Jakub Kicinski
  Cc: linux-serial, linux-rt-users, linux-kernel, stable



On 2016-02-18 18:26, Josh Cartwright wrote:
> The use of IRQF_ONESHOT when registering an interrupt handler with
> request_irq() is non-sensical.
>
> Not only that, it also prevents the handler from being threaded when it
> otherwise should be w/ IRQ_FORCED_THREADING is enabled.  This causes the
> following deadlock observed by Sean Nyekjaer on -rt:
>
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> [..]
>     rt_spin_lock_slowlock from (queue_kthread_work+0x18/0x74)
>     queue_kthread_work) from (sc16is7xx_irq+0x10/0x18 [sc16is7xx])
>     sc16is7xx_irq [sc16is7xx]) from (handle_irq_event_percpu+0x70/0x158)
>     handle_irq_event_percpu) from (handle_irq_event+0x68/0xa8)
>     handle_irq_event) from (handle_level_irq+0x10c/0x184)
>     handle_level_irq) from (generic_handle_irq+0x2c/0x3c)
>     generic_handle_irq) from (mxc_gpio_irq_handler+0x3c/0x108)
>     mxc_gpio_irq_handler) from (mx3_gpio_irq_handler+0x80/0xcc)
>     mx3_gpio_irq_handler) from (generic_handle_irq+0x2c/0x3c)
>     generic_handle_irq) from (__handle_domain_irq+0x7c/0xe8)
>     __handle_domain_irq) from (gic_handle_irq+0x24/0x5c)
>     gic_handle_irq) from (__irq_svc+0x40/0x88)
>     (__irq_svc) from (rt_spin_unlock+0x1c/0x68)
>     (rt_spin_unlock) from (kthread_worker_fn+0x104/0x17c)
>     (kthread_worker_fn) from (kthread+0xd0/0xe8)
>     (kthread) from (ret_from_fork+0x14/0x2c)
>
> Reported-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
> Fixes: 9e6f4ca3e567 ("sc16is7xx: use kthread_worker for tx_work and irq")
> Cc: stable@vger.kernel.org # v4.1+
> Signed-off-by: Josh Cartwright <joshc@ni.com>
> ---
>   drivers/tty/serial/sc16is7xx.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> index e78fa99..1bb5c0e 100644
> --- a/drivers/tty/serial/sc16is7xx.c
> +++ b/drivers/tty/serial/sc16is7xx.c
> @@ -1257,7 +1257,7 @@ static int sc16is7xx_probe(struct device *dev,
>   
>   	/* Setup interrupt */
>   	ret = devm_request_irq(dev, irq, sc16is7xx_irq,
> -			       IRQF_ONESHOT | flags, dev_name(dev), s);
> +			       flags, dev_name(dev), s);
>   	if (!ret)
>   		return 0;
>   
Is PATCH this dropped?
It clearly fixes the irq when running with RT patches :-)

/Sean

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sc16is7xx: drop bogus use of IRQF_ONESHOT
  2016-02-18 17:26                 ` [PATCH] sc16is7xx: drop bogus use of IRQF_ONESHOT Josh Cartwright
  2016-03-08 12:43                     ` Sean Nyekjær
@ 2016-09-09 13:14                   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-09 13:14 UTC (permalink / raw)
  To: Josh Cartwright
  Cc: Jakub Kicinski, linux-serial, linux-rt-users, Sean Nyekjaer,
	linux-kernel

On 2016-02-18 11:26:12 [-0600], Josh Cartwright wrote:
> The use of IRQF_ONESHOT when registering an interrupt handler with
> request_irq() is non-sensical.
> 
> Not only that, it also prevents the handler from being threaded when it
> otherwise should be w/ IRQ_FORCED_THREADING is enabled.  This causes the
> following deadlock observed by Sean Nyekjaer on -rt:

So we this patch 
  http://lkml.iu.edu/hypermail/linux/kernel/1602.2/02637.html

sitting in -RT. Then someone posted
  http://www.spinics.net/lists/linux-rt-users/msg14975.html
which depends on it.
Before I was are of the first patch, I posted
  http://www.spinics.net/lists/linux-rt-users/msg14666.html

and nobody seems to care if anything of this gets merged upstream
where it belongs. Could some please carry this to Greg upstream?

Sebastian

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-09-09 13:14 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1455523133-29895-1-git-send-email-sean.nyekjaer@prevas.dk>
     [not found] ` <20160215220810.11529101@laptop>
     [not found]   ` <56C2D1A7.1030908@prevas.dk>
     [not found]     ` <20160216210250.34c2f07f@laptop>
     [not found]       ` <56C41FA5.4010605@prevas.dk>
2016-02-17 21:54         ` [PATCH] sc16is7xx: switch to threaded irq for fixing RT issues Jakub Kicinski
2016-02-17 21:54           ` Jakub Kicinski
2016-02-17 23:25           ` Josh Cartwright
2016-02-18  7:29             ` Sean Nyekjær
2016-02-18  7:29               ` Sean Nyekjær
2016-02-18  8:46               ` Jakub Kicinski
2016-02-18  8:46                 ` Jakub Kicinski
2016-02-18 17:26                 ` [PATCH] sc16is7xx: drop bogus use of IRQF_ONESHOT Josh Cartwright
2016-03-08 12:43                   ` Sean Nyekjær
2016-03-08 12:43                     ` Sean Nyekjær
2016-09-09 13:14                   ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.