* cec kernel oops with pulse8 usb cec adapter @ 2018-10-20 13:26 Torbjorn Jansson 2018-10-20 21:12 ` Hans Verkuil 0 siblings, 1 reply; 15+ messages in thread From: Torbjorn Jansson @ 2018-10-20 13:26 UTC (permalink / raw) To: Linux Media Mailing List Hello i'm using the pulse8 usb cec adapter to control my tv. i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. this i think matches a kernel oops i'm seeing that i suspect is related to this. i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. also every now and then i see this in dmesg: cec cec0: transmit: failed 05 cec cec0: transmit: failed 06 but that doesn't appear to do any harm as far as i can tell. any idea whats causing the oops? the ops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 Workqueue: events pulse8_irq_work_handler [pulse8_cec] RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] Code: 8d ae b4 07 00 00 49 81 c6 b8 07 00 00 53 e8 4a df c3 d5 48 89 ef 49 89 45 00 e8 4e 84 41 d6 49 8b 1e 49 89 c4 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 RSP: 0018:ffffaa10e3c07d58 EFLAGS: 00010017 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 RDX: 0000000000000001 RSI: 00316245397fa93c RDI: ffff966d31c8d7b4 RBP: ffff966d31c8d7b4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000003 R11: ffffaa10e3c07e28 R12: 0000000000000002 R13: ffffaa10e3c07d88 R14: ffff966d31c8d7b8 R15: 0000000000000073 FS: 0000000000000000(0000) GS:ffff966d3f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 00000009d820a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ir_do_keydown+0x75/0x260 [rc_core] rc_keydown+0x54/0xc0 [rc_core] cec_received_msg_ts+0xaa8/0xaf0 [cec] process_one_work+0x1a1/0x350 worker_thread+0x30/0x380 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x35/0x40 Modules linked in: rc_tt_1500 dvb_usb_dvbsky dvb_usb_v2 uas usb_storage fuse vhost_net vhost tap xt_CHECKSUM iptable_mangle ip6t_REJECT nf_reject_ipv6 tun 8021q garp mrp xt_nat macvlan xfs devlink ebta si2157 si2168 cx25840 cx23885 kvm altera_ci tda18271 joydev ir_rc6_decoder rc_rc6_mce crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore altera_stapl m88ds3103 tveeprom cx2341 mxm_wmi igb crc32c_intel megaraid_sas dca i2c_algo_bit wmi vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio i2c_dev CR2: 0000000000000038 ---[ end trace 6cea307e2666b11a ]--- RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] Code: 8d ae b4 07 00 00 49 81 c6 b8 07 00 00 53 e8 4a df c3 d5 48 89 ef 49 89 45 00 e8 4e 84 41 d6 49 8b 1e 49 89 c4 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 RSP: 0018:ffffaa10e3c07d58 EFLAGS: 00010017 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 RDX: 0000000000000001 RSI: 00316245397fa93c RDI: ffff966d31c8d7b4 RBP: ffff966d31c8d7b4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000003 R11: ffffaa10e3c07e28 R12: 0000000000000002 R13: ffffaa10e3c07d88 R14: ffff966d31c8d7b8 R15: 0000000000000073 FS: 0000000000000000(0000) GS:ffff966d3f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 00000009d820a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: cec kernel oops with pulse8 usb cec adapter 2018-10-20 13:26 cec kernel oops with pulse8 usb cec adapter Torbjorn Jansson @ 2018-10-20 21:12 ` Hans Verkuil 2018-10-22 8:59 ` Sean Young 0 siblings, 1 reply; 15+ messages in thread From: Hans Verkuil @ 2018-10-20 21:12 UTC (permalink / raw) To: Sean Young; +Cc: Torbjorn Jansson, Linux Media Mailing List Hi Sean, Can you take a look at this, it appears to be an RC issue, see my analysis below. On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: > Hello > > i'm using the pulse8 usb cec adapter to control my tv. > i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. > this i think matches a kernel oops i'm seeing that i suspect is related to this. > > i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. > i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. > > > also every now and then i see this in dmesg: > cec cec0: transmit: failed 05 > cec cec0: transmit: failed 06 > but that doesn't appear to do any harm as far as i can tell. > > any idea whats causing the oops? > > the ops: > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > PGD 0 P4D 0 > Oops: 0000 [#1] SMP PTI > CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 > Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > Workqueue: events pulse8_irq_work_handler [pulse8_cec] > RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called from rc_register_device(), except when the protocol is CEC: /* Ensure that the lirc kfifo is setup before we start the thread */ if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { rc = ir_lirc_register(dev); if (rc < 0) goto out_rx; } So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never initialized. Could this be fall-out from the lirc changes you did not too long ago? Regards, Hans > Code: 8d ae b4 07 00 00 49 81 c6 b8 07 00 00 53 e8 4a df c3 d5 48 89 ef 49 89 45 00 e8 4e 84 41 d6 49 8b 1e 49 89 c4 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 > RSP: 0018:ffffaa10e3c07d58 EFLAGS: 00010017 > RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 > RDX: 0000000000000001 RSI: 00316245397fa93c RDI: ffff966d31c8d7b4 > RBP: ffff966d31c8d7b4 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000003 R11: ffffaa10e3c07e28 R12: 0000000000000002 > R13: ffffaa10e3c07d88 R14: ffff966d31c8d7b8 R15: 0000000000000073 > FS: 0000000000000000(0000) GS:ffff966d3f440000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000038 CR3: 00000009d820a003 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > ir_do_keydown+0x75/0x260 [rc_core] > rc_keydown+0x54/0xc0 [rc_core] > cec_received_msg_ts+0xaa8/0xaf0 [cec] > process_one_work+0x1a1/0x350 > worker_thread+0x30/0x380 > ? pwq_unbound_release_workfn+0xd0/0xd0 > kthread+0x112/0x130 > ? kthread_create_worker_on_cpu+0x70/0x70 > ret_from_fork+0x35/0x40 > Modules linked in: rc_tt_1500 dvb_usb_dvbsky dvb_usb_v2 uas usb_storage fuse vhost_net vhost tap xt_CHECKSUM iptable_mangle ip6t_REJECT nf_reject_ipv6 tun 8021q garp mrp xt_nat macvlan xfs devlink ebta > si2157 si2168 cx25840 cx23885 kvm altera_ci tda18271 joydev ir_rc6_decoder rc_rc6_mce crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore altera_stapl m88ds3103 tveeprom cx2341 > mxm_wmi igb crc32c_intel megaraid_sas dca i2c_algo_bit wmi vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio i2c_dev > CR2: 0000000000000038 > ---[ end trace 6cea307e2666b11a ]--- > RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > Code: 8d ae b4 07 00 00 49 81 c6 b8 07 00 00 53 e8 4a df c3 d5 48 89 ef 49 89 45 00 e8 4e 84 41 d6 49 8b 1e 49 89 c4 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 > RSP: 0018:ffffaa10e3c07d58 EFLAGS: 00010017 > RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 > RDX: 0000000000000001 RSI: 00316245397fa93c RDI: ffff966d31c8d7b4 > RBP: ffff966d31c8d7b4 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000003 R11: ffffaa10e3c07e28 R12: 0000000000000002 > R13: ffffaa10e3c07d88 R14: ffff966d31c8d7b8 R15: 0000000000000073 > FS: 0000000000000000(0000) GS:ffff966d3f440000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000038 CR3: 00000009d820a003 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: cec kernel oops with pulse8 usb cec adapter 2018-10-20 21:12 ` Hans Verkuil @ 2018-10-22 8:59 ` Sean Young 2018-10-22 9:44 ` Torbjorn Jansson 2018-11-14 13:00 ` cec kernel oops with pulse8 usb cec adapter Torbjorn Jansson 0 siblings, 2 replies; 15+ messages in thread From: Sean Young @ 2018-10-22 8:59 UTC (permalink / raw) To: Hans Verkuil; +Cc: Torbjorn Jansson, Linux Media Mailing List On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: > Hi Sean, > > Can you take a look at this, it appears to be an RC issue, see my analysis below. > > On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: > > Hello > > > > i'm using the pulse8 usb cec adapter to control my tv. > > i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. > > this i think matches a kernel oops i'm seeing that i suspect is related to this. > > > > i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. > > i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. > > > > > > also every now and then i see this in dmesg: > > cec cec0: transmit: failed 05 > > cec cec0: transmit: failed 06 > > but that doesn't appear to do any harm as far as i can tell. > > > > any idea whats causing the oops? > > > > the ops: > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > > PGD 0 P4D 0 > > Oops: 0000 [#1] SMP PTI > > CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 > > Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > > Workqueue: events pulse8_irq_work_handler [pulse8_cec] > > RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > > Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); > > The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called > from rc_register_device(), except when the protocol is CEC: > > /* Ensure that the lirc kfifo is setup before we start the thread */ > if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { > rc = ir_lirc_register(dev); > if (rc < 0) > goto out_rx; > } > > So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never > initialized. > > Could this be fall-out from the lirc changes you did not too long ago? Yes, this is broken. My bad, sorry. I think this must have been broken since v4.16. I can write a patch but I don't have a patch but I'm on the train to ELCE in Edinburgh now, with no hardware to test on. Sean ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: cec kernel oops with pulse8 usb cec adapter 2018-10-22 8:59 ` Sean Young @ 2018-10-22 9:44 ` Torbjorn Jansson 2018-10-22 10:14 ` Sean Young 2018-11-14 13:00 ` cec kernel oops with pulse8 usb cec adapter Torbjorn Jansson 1 sibling, 1 reply; 15+ messages in thread From: Torbjorn Jansson @ 2018-10-22 9:44 UTC (permalink / raw) To: Sean Young, Hans Verkuil; +Cc: Linux Media Mailing List On 2018-10-22 10:59, Sean Young wrote: > On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: >> Hi Sean, >> >> Can you take a look at this, it appears to be an RC issue, see my analysis below. >> >> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: >>> Hello >>> >>> i'm using the pulse8 usb cec adapter to control my tv. >>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. >>> this i think matches a kernel oops i'm seeing that i suspect is related to this. >>> >>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. >>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. >>> >>> >>> also every now and then i see this in dmesg: >>> cec cec0: transmit: failed 05 >>> cec cec0: transmit: failed 06 >>> but that doesn't appear to do any harm as far as i can tell. >>> >>> any idea whats causing the oops? >>> >>> the ops: >>> >>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 >>> PGD 0 P4D 0 >>> Oops: 0000 [#1] SMP PTI >>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 >>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 >>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] >>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] >> >> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); >> >> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called >> from rc_register_device(), except when the protocol is CEC: >> >> /* Ensure that the lirc kfifo is setup before we start the thread */ >> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { >> rc = ir_lirc_register(dev); >> if (rc < 0) >> goto out_rx; >> } >> >> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never >> initialized. >> >> Could this be fall-out from the lirc changes you did not too long ago? > > Yes, this is broken. My bad, sorry. I think this must have been broken since > v4.16. I can write a patch but I don't have a patch but I'm on the train > to ELCE in Edinburgh now, with no hardware to test on. > > > Sean > the kernel oops have been happening for a while now. yesterday when i checked my logs i can see them at least back a couple of months when i was running 4.17 also my scripts to poll status of my tv and turn it on/off works for a while so it doesn't crash right away. maybe it only crashes when i send cec command to turn on/off tv and only polling for status is no problem. i think i have a separate issue too because i had problems even before the kernel oopses started. but i suspect this is caused by my tv locking up the cec bus because unplugging power to tv for a few minutes (i must wait or it will still be just as broken) and then back used to resolve the cec errors from my scripts. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] media: rc: cec devices do not have a lirc chardev 2018-10-22 9:44 ` Torbjorn Jansson @ 2018-10-22 10:14 ` Sean Young 0 siblings, 0 replies; 15+ messages in thread From: Sean Young @ 2018-10-22 10:14 UTC (permalink / raw) To: Torbjorn Jansson; +Cc: Hans Verkuil, Linux Media Mailing List, stable On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: > On 2018-10-22 10:59, Sean Young wrote: > > On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: > > > Hi Sean, > > > > > > Can you take a look at this, it appears to be an RC issue, see my analysis below. > > > > > > On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: > > > > Hello > > > > > > > > i'm using the pulse8 usb cec adapter to control my tv. > > > > i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. > > > > this i think matches a kernel oops i'm seeing that i suspect is related to this. > > > > > > > > i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. > > > > i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. > > > > > > > > > > > > also every now and then i see this in dmesg: > > > > cec cec0: transmit: failed 05 > > > > cec cec0: transmit: failed 06 > > > > but that doesn't appear to do any harm as far as i can tell. > > > > > > > > any idea whats causing the oops? > > > > > > > > the ops: > > > > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > > > > PGD 0 P4D 0 > > > > Oops: 0000 [#1] SMP PTI > > > > CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 > > > > Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > > > > Workqueue: events pulse8_irq_work_handler [pulse8_cec] > > > > RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > > > > > > Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); > > > > > > The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called > > > from rc_register_device(), except when the protocol is CEC: > > > > > > /* Ensure that the lirc kfifo is setup before we start the thread */ > > > if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { > > > rc = ir_lirc_register(dev); > > > if (rc < 0) > > > goto out_rx; > > > } > > > > > > So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never > > > initialized. > > > > > > Could this be fall-out from the lirc changes you did not too long ago? > > > > Yes, this is broken. My bad, sorry. I think this must have been broken since > > v4.16. I can write a patch but I don't have a patch but I'm on the train > > to ELCE in Edinburgh now, with no hardware to test on. > > > > > > Sean > > > > the kernel oops have been happening for a while now. > yesterday when i checked my logs i can see them at least back a couple of > months when i was running 4.17 > > also my scripts to poll status of my tv and turn it on/off works for a while > so it doesn't crash right away. > maybe it only crashes when i send cec command to turn on/off tv and only > polling for status is no problem. > > > i think i have a separate issue too because i had problems even before the > kernel oopses started. > but i suspect this is caused by my tv locking up the cec bus because > unplugging power to tv for a few minutes (i must wait or it will still be > just as broken) and then back used to resolve the cec errors from my > scripts. Would you be able to test the following patch please? Thanks, Sean --- >From 1b8b20b606b30c0e301c80e18af8d77194269bc1 Mon Sep 17 00:00:00 2001 From: Sean Young <sean@mess.org> Date: Mon, 22 Oct 2018 10:01:50 +0100 Subject: [PATCH] media: rc: cec devices do not have a lirc chardev MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This fixes an oops in ir_lirc_scancode_event(). BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 Workqueue: events pulse8_irq_work_handler [pulse8_cec] RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] Code: 8d ae b4 07 00 00 49 81 c6 b8 07 00 00 53 e8 4a df c3 d5 48 89 ef 49 89 45 00 e8 4e 84 41 d6 49 8b 1e 49 89 c4 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 RSP: 0018:ffffaa10e3c07d58 EFLAGS: 00010017 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 RDX: 0000000000000001 RSI: 00316245397fa93c RDI: ffff966d31c8d7b4 RBP: ffff966d31c8d7b4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000003 R11: ffffaa10e3c07e28 R12: 0000000000000002 R13: ffffaa10e3c07d88 R14: ffff966d31c8d7b8 R15: 0000000000000073 FS: 0000000000000000(0000) GS:ffff966d3f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 00000009d820a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ir_do_keydown+0x75/0x260 [rc_core] rc_keydown+0x54/0xc0 [rc_core] cec_received_msg_ts+0xaa8/0xaf0 [cec] process_one_work+0x1a1/0x350 worker_thread+0x30/0x380 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x35/0x40 Modules linked in: rc_tt_1500 dvb_usb_dvbsky dvb_usb_v2 uas usb_storage fuse vhost_net vhost tap xt_CHECKSUM iptable_mangle ip6t_REJECT nf_reject_ipv6 tun 8021q garp mrp xt_nat macvlan xfs devlink ebta si2157 si2168 cx25840 cx23885 kvm altera_ci tda18271 joydev ir_rc6_decoder rc_rc6_mce crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore altera_stapl m88ds3103 tveeprom cx2341 mxm_wmi igb crc32c_intel megaraid_sas dca i2c_algo_bit wmi vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio i2c_dev CR2: 0000000000000038 Cc: <stable@vger.kernel.org> # v4.16+ Signed-off-by: Sean Young <sean@mess.org> --- drivers/media/rc/rc-main.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c index 552bbe82a160..877978dbd409 100644 --- a/drivers/media/rc/rc-main.c +++ b/drivers/media/rc/rc-main.c @@ -695,7 +695,8 @@ void rc_repeat(struct rc_dev *dev) (dev->last_toggle ? LIRC_SCANCODE_FLAG_TOGGLE : 0) }; - ir_lirc_scancode_event(dev, &sc); + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) + ir_lirc_scancode_event(dev, &sc); spin_lock_irqsave(&dev->keylock, flags); @@ -735,7 +736,8 @@ static void ir_do_keydown(struct rc_dev *dev, enum rc_proto protocol, .keycode = keycode }; - ir_lirc_scancode_event(dev, &sc); + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) + ir_lirc_scancode_event(dev, &sc); if (new_event && dev->keypressed) ir_do_keyup(dev, false); -- 2.17.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH] media: rc: cec devices do not have a lirc chardev @ 2018-10-22 10:14 ` Sean Young 0 siblings, 0 replies; 15+ messages in thread From: Sean Young @ 2018-10-22 10:14 UTC (permalink / raw) To: Torbjorn Jansson; +Cc: Hans Verkuil, Linux Media Mailing List, stable On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: > On 2018-10-22 10:59, Sean Young wrote: > > On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: > > > Hi Sean, > > > > > > Can you take a look at this, it appears to be an RC issue, see my analysis below. > > > > > > On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: > > > > Hello > > > > > > > > i'm using the pulse8 usb cec adapter to control my tv. > > > > i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. > > > > this i think matches a kernel oops i'm seeing that i suspect is related to this. > > > > > > > > i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. > > > > i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. > > > > > > > > > > > > also every now and then i see this in dmesg: > > > > cec cec0: transmit: failed 05 > > > > cec cec0: transmit: failed 06 > > > > but that doesn't appear to do any harm as far as i can tell. > > > > > > > > any idea whats causing the oops? > > > > > > > > the ops: > > > > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > > > > PGD 0 P4D 0 > > > > Oops: 0000 [#1] SMP PTI > > > > CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P���������� OE 4.18.12-200.fc28.x86_64 #1 > > > > Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > > > > Workqueue: events pulse8_irq_work_handler [pulse8_cec] > > > > RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > > > > > > Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); > > > > > > The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called > > > from rc_register_device(), except when the protocol is CEC: > > > > > > /* Ensure that the lirc kfifo is setup before we start the thread */ > > > if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { > > > rc = ir_lirc_register(dev); > > > if (rc < 0) > > > goto out_rx; > > > } > > > > > > So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never > > > initialized. > > > > > > Could this be fall-out from the lirc changes you did not too long ago? > > > > Yes, this is broken. My bad, sorry. I think this must have been broken since > > v4.16. I can write a patch but I don't have a patch but I'm on the train > > to ELCE in Edinburgh now, with no hardware to test on. > > > > > > Sean > > > > the kernel oops have been happening for a while now. > yesterday when i checked my logs i can see them at least back a couple of > months when i was running 4.17 > > also my scripts to poll status of my tv and turn it on/off works for a while > so it doesn't crash right away. > maybe it only crashes when i send cec command to turn on/off tv and only > polling for status is no problem. > > > i think i have a separate issue too because i had problems even before the > kernel oopses started. > but i suspect this is caused by my tv locking up the cec bus because > unplugging power to tv for a few minutes (i must wait or it will still be > just as broken) and then back used to resolve the cec errors from my > scripts. Would you be able to test the following patch please? Thanks, Sean --- >From 1b8b20b606b30c0e301c80e18af8d77194269bc1 Mon Sep 17 00:00:00 2001 From: Sean Young <sean@mess.org> Date: Mon, 22 Oct 2018 10:01:50 +0100 Subject: [PATCH] media: rc: cec devices do not have a lirc chardev MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This fixes an oops in ir_lirc_scancode_event(). BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P���������� OE 4.18.12-200.fc28.x86_64 #1 Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 Workqueue: events pulse8_irq_work_handler [pulse8_cec] RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] Code: 8d ae b4 07 00 00 49 81 c6 b8 07 00 00 53 e8 4a df c3 d5 48 89 ef 49 89 45 00 e8 4e 84 41 d6 49 8b 1e 49 89 c4 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 RSP: 0018:ffffaa10e3c07d58 EFLAGS: 00010017 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 RDX: 0000000000000001 RSI: 00316245397fa93c RDI: ffff966d31c8d7b4 RBP: ffff966d31c8d7b4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000003 R11: ffffaa10e3c07e28 R12: 0000000000000002 R13: ffffaa10e3c07d88 R14: ffff966d31c8d7b8 R15: 0000000000000073 FS:� 0000000000000000(0000) GS:ffff966d3f440000(0000) knlGS:0000000000000000 CS:� 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 00000009d820a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: �ir_do_keydown+0x75/0x260 [rc_core] �rc_keydown+0x54/0xc0 [rc_core] �cec_received_msg_ts+0xaa8/0xaf0 [cec] �process_one_work+0x1a1/0x350 �worker_thread+0x30/0x380 �? pwq_unbound_release_workfn+0xd0/0xd0 �kthread+0x112/0x130 �? kthread_create_worker_on_cpu+0x70/0x70 �ret_from_fork+0x35/0x40 Modules linked in: rc_tt_1500 dvb_usb_dvbsky dvb_usb_v2 uas usb_storage fuse vhost_net vhost tap xt_CHECKSUM iptable_mangle ip6t_REJECT nf_reject_ipv6 tun 8021q garp mrp xt_nat macvlan xfs devlink ebta �si2157 si2168 cx25840 cx23885 kvm altera_ci tda18271 joydev ir_rc6_decoder rc_rc6_mce crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore altera_stapl m88ds3103 tveeprom cx2341 �mxm_wmi igb crc32c_intel megaraid_sas dca i2c_algo_bit wmi vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio i2c_dev CR2: 0000000000000038 Cc: <stable@vger.kernel.org> # v4.16+ Signed-off-by: Sean Young <sean@mess.org> --- drivers/media/rc/rc-main.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c index 552bbe82a160..877978dbd409 100644 --- a/drivers/media/rc/rc-main.c +++ b/drivers/media/rc/rc-main.c @@ -695,7 +695,8 @@ void rc_repeat(struct rc_dev *dev) (dev->last_toggle ? LIRC_SCANCODE_FLAG_TOGGLE : 0) }; - ir_lirc_scancode_event(dev, &sc); + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) + ir_lirc_scancode_event(dev, &sc); spin_lock_irqsave(&dev->keylock, flags); @@ -735,7 +736,8 @@ static void ir_do_keydown(struct rc_dev *dev, enum rc_proto protocol, .keycode = keycode }; - ir_lirc_scancode_event(dev, &sc); + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) + ir_lirc_scancode_event(dev, &sc); if (new_event && dev->keypressed) ir_do_keyup(dev, false); -- 2.17.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] media: rc: cec devices do not have a lirc chardev 2018-10-22 10:14 ` Sean Young (?) @ 2018-10-22 11:30 ` Hans Verkuil 2018-10-22 12:28 ` Sean Young -1 siblings, 1 reply; 15+ messages in thread From: Hans Verkuil @ 2018-10-22 11:30 UTC (permalink / raw) To: Sean Young, Torbjorn Jansson; +Cc: Linux Media Mailing List, stable On 10/22/2018 11:14 AM, Sean Young wrote: > On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: >> On 2018-10-22 10:59, Sean Young wrote: >>> On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: >>>> Hi Sean, >>>> >>>> Can you take a look at this, it appears to be an RC issue, see my analysis below. >>>> >>>> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: >>>>> Hello >>>>> >>>>> i'm using the pulse8 usb cec adapter to control my tv. >>>>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. >>>>> this i think matches a kernel oops i'm seeing that i suspect is related to this. >>>>> >>>>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. >>>>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. >>>>> >>>>> >>>>> also every now and then i see this in dmesg: >>>>> cec cec0: transmit: failed 05 >>>>> cec cec0: transmit: failed 06 >>>>> but that doesn't appear to do any harm as far as i can tell. >>>>> >>>>> any idea whats causing the oops? >>>>> >>>>> the ops: >>>>> >>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 >>>>> PGD 0 P4D 0 >>>>> Oops: 0000 [#1] SMP PTI >>>>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 >>>>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 >>>>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] >>>>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] >>>> >>>> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); >>>> >>>> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called >>>> from rc_register_device(), except when the protocol is CEC: >>>> >>>> /* Ensure that the lirc kfifo is setup before we start the thread */ >>>> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { >>>> rc = ir_lirc_register(dev); >>>> if (rc < 0) >>>> goto out_rx; >>>> } >>>> >>>> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never >>>> initialized. >>>> >>>> Could this be fall-out from the lirc changes you did not too long ago? >>> >>> Yes, this is broken. My bad, sorry. I think this must have been broken since >>> v4.16. I can write a patch but I don't have a patch but I'm on the train >>> to ELCE in Edinburgh now, with no hardware to test on. >>> >>> >>> Sean >>> >> >> the kernel oops have been happening for a while now. >> yesterday when i checked my logs i can see them at least back a couple of >> months when i was running 4.17 >> >> also my scripts to poll status of my tv and turn it on/off works for a while >> so it doesn't crash right away. >> maybe it only crashes when i send cec command to turn on/off tv and only >> polling for status is no problem. >> >> >> i think i have a separate issue too because i had problems even before the >> kernel oopses started. >> but i suspect this is caused by my tv locking up the cec bus because >> unplugging power to tv for a few minutes (i must wait or it will still be >> just as broken) and then back used to resolve the cec errors from my >> scripts. > > > Would you be able to test the following patch please? Sean, I think you should be able to test this with the vivid driver. Load the vivid driver, run: cec-ctl --tv; cec-ctl -d1 --playback Then: cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 That said, I tried this, but it doesn't crash for me, but perhaps I need to run some RC command first... Regards, Hans > > Thanks, > > Sean > --- > From 1b8b20b606b30c0e301c80e18af8d77194269bc1 Mon Sep 17 00:00:00 2001 > From: Sean Young <sean@mess.org> > Date: Mon, 22 Oct 2018 10:01:50 +0100 > Subject: [PATCH] media: rc: cec devices do not have a lirc chardev > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > This fixes an oops in ir_lirc_scancode_event(). > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > PGD 0 P4D 0 > Oops: 0000 [#1] SMP PTI > CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 > Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > Workqueue: events pulse8_irq_work_handler [pulse8_cec] > RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > Code: 8d ae b4 07 00 00 49 81 c6 b8 07 00 00 53 e8 4a df c3 d5 48 89 ef 49 89 45 00 e8 4e 84 41 d6 49 8b 1e 49 89 c4 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 > RSP: 0018:ffffaa10e3c07d58 EFLAGS: 00010017 > RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 > RDX: 0000000000000001 RSI: 00316245397fa93c RDI: ffff966d31c8d7b4 > RBP: ffff966d31c8d7b4 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000003 R11: ffffaa10e3c07e28 R12: 0000000000000002 > R13: ffffaa10e3c07d88 R14: ffff966d31c8d7b8 R15: 0000000000000073 > FS: 0000000000000000(0000) GS:ffff966d3f440000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000038 CR3: 00000009d820a003 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > ir_do_keydown+0x75/0x260 [rc_core] > rc_keydown+0x54/0xc0 [rc_core] > cec_received_msg_ts+0xaa8/0xaf0 [cec] > process_one_work+0x1a1/0x350 > worker_thread+0x30/0x380 > ? pwq_unbound_release_workfn+0xd0/0xd0 > kthread+0x112/0x130 > ? kthread_create_worker_on_cpu+0x70/0x70 > ret_from_fork+0x35/0x40 > Modules linked in: rc_tt_1500 dvb_usb_dvbsky dvb_usb_v2 uas usb_storage fuse vhost_net vhost tap xt_CHECKSUM iptable_mangle ip6t_REJECT nf_reject_ipv6 tun 8021q garp mrp xt_nat macvlan xfs devlink ebta > si2157 si2168 cx25840 cx23885 kvm altera_ci tda18271 joydev ir_rc6_decoder rc_rc6_mce crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore altera_stapl m88ds3103 tveeprom cx2341 > mxm_wmi igb crc32c_intel megaraid_sas dca i2c_algo_bit wmi vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio i2c_dev > CR2: 0000000000000038 > > Cc: <stable@vger.kernel.org> # v4.16+ > Signed-off-by: Sean Young <sean@mess.org> > --- > drivers/media/rc/rc-main.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c > index 552bbe82a160..877978dbd409 100644 > --- a/drivers/media/rc/rc-main.c > +++ b/drivers/media/rc/rc-main.c > @@ -695,7 +695,8 @@ void rc_repeat(struct rc_dev *dev) > (dev->last_toggle ? LIRC_SCANCODE_FLAG_TOGGLE : 0) > }; > > - ir_lirc_scancode_event(dev, &sc); > + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) > + ir_lirc_scancode_event(dev, &sc); > > spin_lock_irqsave(&dev->keylock, flags); > > @@ -735,7 +736,8 @@ static void ir_do_keydown(struct rc_dev *dev, enum rc_proto protocol, > .keycode = keycode > }; > > - ir_lirc_scancode_event(dev, &sc); > + if (dev->allowed_protocols != RC_PROTO_BIT_CEC) > + ir_lirc_scancode_event(dev, &sc); > > if (new_event && dev->keypressed) > ir_do_keyup(dev, false); > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] media: rc: cec devices do not have a lirc chardev 2018-10-22 11:30 ` Hans Verkuil @ 2018-10-22 12:28 ` Sean Young 0 siblings, 0 replies; 15+ messages in thread From: Sean Young @ 2018-10-22 12:28 UTC (permalink / raw) To: Hans Verkuil; +Cc: Torbjorn Jansson, Linux Media Mailing List, stable On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: > On 10/22/2018 11:14 AM, Sean Young wrote: > > On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: > >> On 2018-10-22 10:59, Sean Young wrote: > >>> On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: > >>>> Hi Sean, > >>>> > >>>> Can you take a look at this, it appears to be an RC issue, see my analysis below. > >>>> > >>>> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: > >>>>> Hello > >>>>> > >>>>> i'm using the pulse8 usb cec adapter to control my tv. > >>>>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. > >>>>> this i think matches a kernel oops i'm seeing that i suspect is related to this. > >>>>> > >>>>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. > >>>>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. > >>>>> > >>>>> > >>>>> also every now and then i see this in dmesg: > >>>>> cec cec0: transmit: failed 05 > >>>>> cec cec0: transmit: failed 06 > >>>>> but that doesn't appear to do any harm as far as i can tell. > >>>>> > >>>>> any idea whats causing the oops? > >>>>> > >>>>> the ops: > >>>>> > >>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > >>>>> PGD 0 P4D 0 > >>>>> Oops: 0000 [#1] SMP PTI > >>>>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 > >>>>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > >>>>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] > >>>>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > >>>> > >>>> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); > >>>> > >>>> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called > >>>> from rc_register_device(), except when the protocol is CEC: > >>>> > >>>> /* Ensure that the lirc kfifo is setup before we start the thread */ > >>>> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { > >>>> rc = ir_lirc_register(dev); > >>>> if (rc < 0) > >>>> goto out_rx; > >>>> } > >>>> > >>>> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never > >>>> initialized. > >>>> > >>>> Could this be fall-out from the lirc changes you did not too long ago? > >>> > >>> Yes, this is broken. My bad, sorry. I think this must have been broken since > >>> v4.16. I can write a patch but I don't have a patch but I'm on the train > >>> to ELCE in Edinburgh now, with no hardware to test on. > >>> > >>> > >>> Sean > >>> > >> > >> the kernel oops have been happening for a while now. > >> yesterday when i checked my logs i can see them at least back a couple of > >> months when i was running 4.17 > >> > >> also my scripts to poll status of my tv and turn it on/off works for a while > >> so it doesn't crash right away. > >> maybe it only crashes when i send cec command to turn on/off tv and only > >> polling for status is no problem. > >> > >> > >> i think i have a separate issue too because i had problems even before the > >> kernel oopses started. > >> but i suspect this is caused by my tv locking up the cec bus because > >> unplugging power to tv for a few minutes (i must wait or it will still be > >> just as broken) and then back used to resolve the cec errors from my > >> scripts. > > > > > > Would you be able to test the following patch please? > > Sean, > > I think you should be able to test this with the vivid driver. Load the vivid driver, > run: > > cec-ctl --tv; cec-ctl -d1 --playback > > Then: > > cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 Ah, thanks. That will help with testing/reproducing. > That said, I tried this, but it doesn't crash for me, but perhaps I need to run > some RC command first... Hmm I think those commands should be enough. It probably needs CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. Thanks, Sean ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] media: rc: cec devices do not have a lirc chardev @ 2018-10-22 12:28 ` Sean Young 0 siblings, 0 replies; 15+ messages in thread From: Sean Young @ 2018-10-22 12:28 UTC (permalink / raw) To: Hans Verkuil; +Cc: Torbjorn Jansson, Linux Media Mailing List, stable On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: > On 10/22/2018 11:14 AM, Sean Young wrote: > > On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: > >> On 2018-10-22 10:59, Sean Young wrote: > >>> On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: > >>>> Hi Sean, > >>>> > >>>> Can you take a look at this, it appears to be an RC issue, see my analysis below. > >>>> > >>>> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: > >>>>> Hello > >>>>> > >>>>> i'm using the pulse8 usb cec adapter to control my tv. > >>>>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. > >>>>> this i think matches a kernel oops i'm seeing that i suspect is related to this. > >>>>> > >>>>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. > >>>>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. > >>>>> > >>>>> > >>>>> also every now and then i see this in dmesg: > >>>>> cec cec0: transmit: failed 05 > >>>>> cec cec0: transmit: failed 06 > >>>>> but that doesn't appear to do any harm as far as i can tell. > >>>>> > >>>>> any idea whats causing the oops? > >>>>> > >>>>> the ops: > >>>>> > >>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > >>>>> PGD 0 P4D 0 > >>>>> Oops: 0000 [#1] SMP PTI > >>>>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P���������� OE 4.18.12-200.fc28.x86_64 #1 > >>>>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 > >>>>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] > >>>>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] > >>>> > >>>> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); > >>>> > >>>> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called > >>>> from rc_register_device(), except when the protocol is CEC: > >>>> > >>>> /* Ensure that the lirc kfifo is setup before we start the thread */ > >>>> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { > >>>> rc = ir_lirc_register(dev); > >>>> if (rc < 0) > >>>> goto out_rx; > >>>> } > >>>> > >>>> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never > >>>> initialized. > >>>> > >>>> Could this be fall-out from the lirc changes you did not too long ago? > >>> > >>> Yes, this is broken. My bad, sorry. I think this must have been broken since > >>> v4.16. I can write a patch but I don't have a patch but I'm on the train > >>> to ELCE in Edinburgh now, with no hardware to test on. > >>> > >>> > >>> Sean > >>> > >> > >> the kernel oops have been happening for a while now. > >> yesterday when i checked my logs i can see them at least back a couple of > >> months when i was running 4.17 > >> > >> also my scripts to poll status of my tv and turn it on/off works for a while > >> so it doesn't crash right away. > >> maybe it only crashes when i send cec command to turn on/off tv and only > >> polling for status is no problem. > >> > >> > >> i think i have a separate issue too because i had problems even before the > >> kernel oopses started. > >> but i suspect this is caused by my tv locking up the cec bus because > >> unplugging power to tv for a few minutes (i must wait or it will still be > >> just as broken) and then back used to resolve the cec errors from my > >> scripts. > > > > > > Would you be able to test the following patch please? > > Sean, > > I think you should be able to test this with the vivid driver. Load the vivid driver, > run: > > cec-ctl --tv; cec-ctl -d1 --playback > > Then: > > cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 Ah, thanks. That will help with testing/reproducing. > That said, I tried this, but it doesn't crash for me, but perhaps I need to run > some RC command first... Hmm I think those commands should be enough. It probably needs CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. Thanks, Sean ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] media: rc: cec devices do not have a lirc chardev 2018-10-22 12:28 ` Sean Young (?) @ 2018-10-22 12:44 ` Torbjorn Jansson -1 siblings, 0 replies; 15+ messages in thread From: Torbjorn Jansson @ 2018-10-22 12:44 UTC (permalink / raw) To: Sean Young, Hans Verkuil; +Cc: Linux Media Mailing List, stable On 2018-10-22 14:28, Sean Young wrote: > On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: >> On 10/22/2018 11:14 AM, Sean Young wrote: >>> On Mon, Oct 22, 2018 at 11:44:22AM +0200, Torbjorn Jansson wrote: >>>> On 2018-10-22 10:59, Sean Young wrote: >>>>> On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: >>>>>> Hi Sean, >>>>>> >>>>>> Can you take a look at this, it appears to be an RC issue, see my analysis below. >>>>>> >>>>>> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: >>>>>>> Hello >>>>>>> >>>>>>> i'm using the pulse8 usb cec adapter to control my tv. >>>>>>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. >>>>>>> this i think matches a kernel oops i'm seeing that i suspect is related to this. >>>>>>> >>>>>>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. >>>>>>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. >>>>>>> >>>>>>> >>>>>>> also every now and then i see this in dmesg: >>>>>>> cec cec0: transmit: failed 05 >>>>>>> cec cec0: transmit: failed 06 >>>>>>> but that doesn't appear to do any harm as far as i can tell. >>>>>>> >>>>>>> any idea whats causing the oops? >>>>>>> >>>>>>> the ops: >>>>>>> >>>>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 >>>>>>> PGD 0 P4D 0 >>>>>>> Oops: 0000 [#1] SMP PTI >>>>>>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 >>>>>>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 >>>>>>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] >>>>>>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] >>>>>> >>>>>> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); >>>>>> >>>>>> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called >>>>>> from rc_register_device(), except when the protocol is CEC: >>>>>> >>>>>> /* Ensure that the lirc kfifo is setup before we start the thread */ >>>>>> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { >>>>>> rc = ir_lirc_register(dev); >>>>>> if (rc < 0) >>>>>> goto out_rx; >>>>>> } >>>>>> >>>>>> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never >>>>>> initialized. >>>>>> >>>>>> Could this be fall-out from the lirc changes you did not too long ago? >>>>> >>>>> Yes, this is broken. My bad, sorry. I think this must have been broken since >>>>> v4.16. I can write a patch but I don't have a patch but I'm on the train >>>>> to ELCE in Edinburgh now, with no hardware to test on. >>>>> >>>>> >>>>> Sean >>>>> >>>> >>>> the kernel oops have been happening for a while now. >>>> yesterday when i checked my logs i can see them at least back a couple of >>>> months when i was running 4.17 >>>> >>>> also my scripts to poll status of my tv and turn it on/off works for a while >>>> so it doesn't crash right away. >>>> maybe it only crashes when i send cec command to turn on/off tv and only >>>> polling for status is no problem. >>>> >>>> >>>> i think i have a separate issue too because i had problems even before the >>>> kernel oopses started. >>>> but i suspect this is caused by my tv locking up the cec bus because >>>> unplugging power to tv for a few minutes (i must wait or it will still be >>>> just as broken) and then back used to resolve the cec errors from my >>>> scripts. >>> >>> >>> Would you be able to test the following patch please? >> >> Sean, >> >> I think you should be able to test this with the vivid driver. Load the vivid driver, >> run: >> >> cec-ctl --tv; cec-ctl -d1 --playback >> >> Then: >> >> cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 > > Ah, thanks. That will help with testing/reproducing. > >> That said, I tried this, but it doesn't crash for me, but perhaps I need to run >> some RC command first... > > Hmm I think those commands should be enough. It probably needs > CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. > > Thanks, > > Sean > FYI the commands i run is as follows. getting status, this is run frequently several times per minute (once every 10 or 15 seconds): cec-ctl --to=0 --give-device-power-status then when i want to power on or off this is run: cec-ctl --to=0 --image-view-on or cec-ctl --to=0 --standby it usually takes day or two before i get a kernel oops. but i haven't studied super closely when exactly the oops happens in relation to what commands i sent since by now I'm used to it not working most of the time. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] media: rc: cec devices do not have a lirc chardev 2018-10-22 12:28 ` Sean Young (?) (?) @ 2018-10-22 13:17 ` Sean Young 2018-11-14 12:18 ` Torbjorn Jansson -1 siblings, 1 reply; 15+ messages in thread From: Sean Young @ 2018-10-22 13:17 UTC (permalink / raw) To: Hans Verkuil; +Cc: Torbjorn Jansson, Linux Media Mailing List, stable On Mon, Oct 22, 2018 at 01:28:42PM +0100, Sean Young wrote: > On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: > > On 10/22/2018 11:14 AM, Sean Young wrote: > > > Would you be able to test the following patch please? > > > > Sean, > > > > I think you should be able to test this with the vivid driver. Load the vivid driver, > > run: > > > > cec-ctl --tv; cec-ctl -d1 --playback > > > > Then: > > > > cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 > > Ah, thanks. That will help with testing/reproducing. > > > That said, I tried this, but it doesn't crash for me, but perhaps I need to run > > some RC command first... > > Hmm I think those commands should be enough. It probably needs > CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. Yes, that turned out to work. With CONFIG_DEBUG_SPINLOCK on, it goes bang every time. With the patch, the problem goes away. Without CONFIG_DEBUG_SPINLOCK we're going into undefined behaviour, so Torbjorn you're only seeing the oops occassionally (and which is why it has not been observed or reported before). Thanks, Sean ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] media: rc: cec devices do not have a lirc chardev 2018-10-22 13:17 ` Sean Young @ 2018-11-14 12:18 ` Torbjorn Jansson 0 siblings, 0 replies; 15+ messages in thread From: Torbjorn Jansson @ 2018-11-14 12:18 UTC (permalink / raw) To: Sean Young, Hans Verkuil; +Cc: Linux Media Mailing List, stable On 2018-10-22 15:17, Sean Young wrote: > On Mon, Oct 22, 2018 at 01:28:42PM +0100, Sean Young wrote: >> On Mon, Oct 22, 2018 at 12:30:29PM +0100, Hans Verkuil wrote: >>> On 10/22/2018 11:14 AM, Sean Young wrote: >>>> Would you be able to test the following patch please? >>> >>> Sean, >>> >>> I think you should be able to test this with the vivid driver. Load the vivid driver, >>> run: >>> >>> cec-ctl --tv; cec-ctl -d1 --playback >>> >>> Then: >>> >>> cec-ctl -d1 -t0 --user-control-pressed ui-cmd=F5 >> >> Ah, thanks. That will help with testing/reproducing. >> >>> That said, I tried this, but it doesn't crash for me, but perhaps I need to run >>> some RC command first... >> >> Hmm I think those commands should be enough. It probably needs >> CONFIG_DEBUG_SPINLOCK to detect the uninitialized spinlock. I'm trying it now. > > Yes, that turned out to work. With CONFIG_DEBUG_SPINLOCK on, it goes bang > every time. With the patch, the problem goes away. > > Without CONFIG_DEBUG_SPINLOCK we're going into undefined behaviour, so > Torbjorn you're only seeing the oops occassionally (and which is why it has > not been observed or reported before). > > Thanks, > > Sean > FYI i have tested this patch for a while now and have not seen any more kernel oops. So it appears to be working. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: cec kernel oops with pulse8 usb cec adapter 2018-10-22 8:59 ` Sean Young 2018-10-22 9:44 ` Torbjorn Jansson @ 2018-11-14 13:00 ` Torbjorn Jansson 2018-11-14 13:23 ` Hans Verkuil 1 sibling, 1 reply; 15+ messages in thread From: Torbjorn Jansson @ 2018-11-14 13:00 UTC (permalink / raw) To: Sean Young, Hans Verkuil; +Cc: Linux Media Mailing List On 2018-10-22 10:59, Sean Young wrote: > On Sat, Oct 20, 2018 at 11:12:16PM +0200, Hans Verkuil wrote: >> Hi Sean, >> >> Can you take a look at this, it appears to be an RC issue, see my analysis below. >> >> On 10/20/2018 03:26 PM, Torbjorn Jansson wrote: >>> Hello >>> >>> i'm using the pulse8 usb cec adapter to control my tv. >>> i have a few scripts that poll the power status of my tv and after a while it stops working returning errors when trying to check if tv is on or off. >>> this i think matches a kernel oops i'm seeing that i suspect is related to this. >>> >>> i have sometimes been able to recover from this problem by completely cutting power to my tv and also unplugging the usb cec adapter. >>> i have a feeling that the tv is at least partly to blame for cec-ctl not working but in any case there shouldn't be a kernel oops. >>> >>> >>> also every now and then i see this in dmesg: >>> cec cec0: transmit: failed 05 >>> cec cec0: transmit: failed 06 >>> but that doesn't appear to do any harm as far as i can tell. >>> >>> any idea whats causing the oops? >>> >>> the ops: >>> >>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 >>> PGD 0 P4D 0 >>> Oops: 0000 [#1] SMP PTI >>> CPU: 9 PID: 27687 Comm: kworker/9:2 Tainted: P OE 4.18.12-200.fc28.x86_64 #1 >>> Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 >>> Workqueue: events pulse8_irq_work_handler [pulse8_cec] >>> RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 [rc_core] >> >> Huh. ir_lirc_scancode_event() calls spin_lock_irqsave(&dev->lirc_fh_lock, flags); >> >> The spinlock dev->lirc_fh_lock is initialized in ir_lirc_register(), which is called >> from rc_register_device(), except when the protocol is CEC: >> >> /* Ensure that the lirc kfifo is setup before we start the thread */ >> if (dev->allowed_protocols != RC_PROTO_BIT_CEC) { >> rc = ir_lirc_register(dev); >> if (rc < 0) >> goto out_rx; >> } >> >> So it looks like ir_lirc_scancode_event() fails because dev->lirc_fh_lock was never >> initialized. >> >> Could this be fall-out from the lirc changes you did not too long ago? > > Yes, this is broken. My bad, sorry. I think this must have been broken since > v4.16. I can write a patch but I don't have a patch but I'm on the train > to ELCE in Edinburgh now, with no hardware to test on. > > > Sean since there now is a patch that appears to be working and fixing this problem i'd like to ask for some troubleshooting advice with another cec issue i have that i haven't figured out why it is happening and exactly whats causing it. i'm not sure if it is a hardware issue or a software issue or both. this is what is happening: i have a script that polls the tv for current status by running: cec-ctl --to=0 --give-device-power-status after a fresh reboot this works fine for a while then sometime later it stops working and errors with: - # cec-ctl --to=0 --give-device-power-status Driver Info: Driver Name : pulse8-cec Adapter Name : serio0 Capabilities : 0x0000003f Physical Address Logical Addresses Transmit Passthrough Remote Control Support Monitor All Driver version : 4.18.16 Available Logical Addresses: 1 Physical Address : 1.4.0.0 Logical Address Mask : 0x0800 CEC Version : 2.0 Vendor ID : 0x000c03 Logical Addresses : 1 (Allow RC Passthrough) Logical Address : 11 (Playback Device 3) Primary Device Type : Playback Logical Address Type : Playback All Device Types : Playback RC TV Profile : None Device Features : None Transmit from Playback Device 3 to TV (11 to 0): CEC_MSG_GIVE_DEVICE_POWER_STATUS (0x8f) Sequence: 119437 Tx Timestamp: 865389.535s Tx, Error (1), Max Retries - once this happens i can never get back any status and also running: cec-ctl -M gives me a lot of: Transmitted by Playback Device 3 to TV (11 to 0): CEC_MSG_GIVE_DEVICE_POWER_STATUS (0x8f) Tx, Error (1), Max Retries once for every run of my status checking script. i know polling like this is not the best option and a better way would be to listen for events and then take action when status changes but that's not easily doable with what i need it for. anyway, once i start getting the above errors when i poll it will not give me back any good status any more (everything errors out) also sending commands to tv to turn on or off like: cec-ctl --to=0 --image-view-on or cec-ctl --to=0 --standby doesn't work. BUT if i remove power to tv and wait for standby led to go out completely then power it back on i can use above two commands to turn on/off the tv even when they return errors and i still can't poll for current status. so even with the errors always returned at this stage the on/off commands still gets sent and works. do you think this behavior is a sw or hw issue or both? if i'm not mistaken i could unplug usb cable to pulse8 cec adapter and reinsert to make it work properly again (no errors and correct response like a fresh start) but i'm not 100% sure of this. when i tried it now i get a new kernel oops: - [866129.392139] usb 7-2: USB disconnect, device number 3 [866129.409568] cdc_acm 7-2:1.1: acm_start_wb - usb_submit_urb(write bulk) failed: -19 [866129.409576] cdc_acm 7-2:1.1: acm_start_wb - usb_submit_urb(write bulk) failed: -19 [866129.409635] WARNING: CPU: 10 PID: 1571 at drivers/media/cec/cec-adap.c:1243 cec_adap_unconfigure+0x9c/0xa0 [cec] [866129.409639] Modules linked in: loop fuse 8021q garp mrp xt_CHECKSUM iptable_mangle ip6t_REJECT nf_reject_ipv6 tun xt_nat macvlan ebtable_filter ebtables xfs devlink ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod bonding ip6table_filter ip6_tables nct6775 rc_cec hwmon_vid pulse8_cec cec vfat fat dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua nvidia_drm(POE) nvidia_modeset(POE) intel_rapl snd_hda_codec_hdmi nvidia_uvm(POE) x86_pkg_temp_thermal intel_powerclamp cx25840 coretemp nvidia(POE) joydev rc_tt_1500 sp2 si2157 si2168 snd_hda_codec_realtek [866129.409707] kvm_intel snd_hda_codec_generic snd_hda_intel cx23885 kvm snd_hda_codec altera_ci tda18271 iTCO_wdt iTCO_vendor_support snd_hda_core altera_stapl crct10dif_pclmul tveeprom crc32_pclmul ir_rc6_decoder cx2341x ghash_clmulni_intel videobuf2_dma_sg dvb_usb_dvbsky videobuf2_memops dvb_usb_v2 intel_cstate videobuf2_dvb snd_hwdep videobuf2_v4l2 m88ds3103 snd_seq videobuf2_common intel_uncore rc_rc6_mce cp210x snd_seq_device dvb_core mceusb intel_rapl_perf snd_pcm cdc_acm rc_core v4l2_common videodev ipmi_ssif drm_kms_helper snd_timer snd media drm i2c_mux mei_me i2c_i801 soundcore mei lpc_ich ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq nfsd auth_rpcgss binfmt_misc nfs_acl lockd grace sunrpc dm_thin_pool dm_cache_smq dm_cache dm_persistent_data libcrc32c dm_bio_prison mxm_wmi uas crc32c_intel [866129.409778] igb usb_storage megaraid_sas dca i2c_algo_bit wmi vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio i2c_dev [866129.409795] CPU: 10 PID: 1571 Comm: inputattach Tainted: P OE 4.18.16-200.local.fc28.x86_64 #1 [866129.409798] Hardware name: Supermicro C7X99-OCE-F/C7X99-OCE-F, BIOS 2.1a 06/15/2018 [866129.409804] RIP: 0010:cec_adap_unconfigure+0x9c/0xa0 [cec] [866129.409806] Code: 2c 05 00 00 e8 c5 fa ff ff 48 8d bb 28 04 00 00 31 c9 ba 01 00 00 00 be 01 00 00 00 e8 ad 9f 73 e8 48 89 df 5b e9 a4 fe ff ff <0f> 0b eb 98 0f 1f 44 00 00 41 57 4c 8d bf c8 03 00 00 41 56 41 55 [866129.409859] RSP: 0018:ffff9d02872b7cf0 EFLAGS: 00010286 [866129.409863] RAX: 00000000ffffffff RBX: ffff8a65f158c800 RCX: 0000000000000000 [866129.409866] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8a65f5626b00 [866129.409869] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8a65f0437d88 [866129.409872] R10: 0000000000000000 R11: ffffffffaa9a41ee R12: 000000000000ffff [866129.409875] R13: ffff8a65f158cb80 R14: 0000000000000060 R15: 0000000000000000 [866129.409879] FS: 00007f3bc63e4740(0000) GS:ffff8a65ff480000(0000) knlGS:0000000000000000 [866129.409882] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [866129.409885] CR2: 00007f681a1c4380 CR3: 000000102944e001 CR4: 00000000003606e0 [866129.409889] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [866129.409892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [866129.409894] Call Trace: [866129.409905] __cec_s_phys_addr+0x78/0x220 [cec] [866129.409911] cec_unregister_adapter+0xd3/0x120 [cec] [866129.409917] pulse8_disconnect+0x1a/0x60 [pulse8_cec] [866129.409926] serio_disconnect_driver+0x31/0x40 [866129.409930] serio_driver_remove+0x11/0x20 [866129.409938] device_release_driver_internal+0x180/0x250 [866129.409944] serio_unregister_port+0x1d/0x40 [866129.409950] serport_ldisc_read+0x132/0x190 [866129.409958] ? finish_wait+0x80/0x80 [866129.409966] tty_read+0x94/0x120 [866129.409974] __vfs_read+0x36/0x180 [866129.409979] vfs_read+0x8a/0x140 [866129.409984] ksys_read+0x4f/0xb0 [866129.409992] do_syscall_64+0x5b/0x160 [866129.410000] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [866129.410005] RIP: 0033:0x7f3bc5b8de21 [866129.410007] Code: fe ff ff 50 48 8d 3d 46 b6 09 00 e8 f9 04 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 c1 3b 2d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48 [866129.410060] RSP: 002b:00007ffd9bda4508 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [866129.410064] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f3bc5b8de21 [866129.410066] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 [866129.410069] RBP: 00007ffd9bda453c R08: 00007ffd9bda4490 R09: 0000000000000010 [866129.410072] R10: fffffffffffffac5 R11: 0000000000000246 R12: 00007f3bc63e46c0 [866129.410075] R13: 00007ffd9bda4678 R14: 0000000000000010 R15: 0000000000604e10 [866129.410079] ---[ end trace 5c8dcc19f7a6ee3c ]--- [866129.410219] pulse8-cec serio0: disconnected [866129.410294] cdc_acm 7-2:1.0: failed to set dtr/rts - advice? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: cec kernel oops with pulse8 usb cec adapter 2018-11-14 13:00 ` cec kernel oops with pulse8 usb cec adapter Torbjorn Jansson @ 2018-11-14 13:23 ` Hans Verkuil 2018-11-14 16:22 ` Torbjorn Jansson 0 siblings, 1 reply; 15+ messages in thread From: Hans Verkuil @ 2018-11-14 13:23 UTC (permalink / raw) To: Torbjorn Jansson, Sean Young; +Cc: Linux Media Mailing List On 11/14/18 14:00, Torbjorn Jansson wrote: <snip> > since there now is a patch that appears to be working and fixing this problem > i'd like to ask for some troubleshooting advice with another cec issue i have > that i haven't figured out why it is happening and exactly whats causing it. > > i'm not sure if it is a hardware issue or a software issue or both. > > this is what is happening: > i have a script that polls the tv for current status by running: > cec-ctl --to=0 --give-device-power-status > after a fresh reboot this works fine for a while then sometime later it stops > working and errors with: > - > # cec-ctl --to=0 --give-device-power-status > Driver Info: > Driver Name : pulse8-cec > Adapter Name : serio0 > Capabilities : 0x0000003f > Physical Address > Logical Addresses > Transmit > Passthrough > Remote Control Support > Monitor All > Driver version : 4.18.16 > Available Logical Addresses: 1 > Physical Address : 1.4.0.0 > Logical Address Mask : 0x0800 > CEC Version : 2.0 > Vendor ID : 0x000c03 > Logical Addresses : 1 (Allow RC Passthrough) > > Logical Address : 11 (Playback Device 3) > Primary Device Type : Playback > Logical Address Type : Playback > All Device Types : Playback > RC TV Profile : None > Device Features : > None > > > Transmit from Playback Device 3 to TV (11 to 0): > CEC_MSG_GIVE_DEVICE_POWER_STATUS (0x8f) > Sequence: 119437 Tx Timestamp: 865389.535s > Tx, Error (1), Max Retries > - > > once this happens i can never get back any status and also running: > cec-ctl -M > gives me a lot of: > Transmitted by Playback Device 3 to TV (11 to 0): > CEC_MSG_GIVE_DEVICE_POWER_STATUS (0x8f) > Tx, Error (1), Max Retries > once for every run of my status checking script. > > i know polling like this is not the best option and a better way would be to > listen for events and then take action when status changes but that's not > easily doable with what i need it for. > > anyway, once i start getting the above errors when i poll it will not give me > back any good status any more (everything errors out) > > also sending commands to tv to turn on or off like: > cec-ctl --to=0 --image-view-on > or > cec-ctl --to=0 --standby > doesn't work. > > BUT if i remove power to tv and wait for standby led to go out completely then > power it back on i can use above two commands to turn on/off the tv even when > they return errors and i still can't poll for current status. > > so even with the errors always returned at this stage the on/off commands still > gets sent and works. > > do you think this behavior is a sw or hw issue or both? Does 'cec-ctl --to=0 --give-device-power-status' eventually fail even if the TV is on all the time? Or does it only fail if the TV goes to standby or has been in standby for a very long time? This error ('Tx, Error (1), Max Retries') indicates that the pulse-eight returns transmit errors suggesting that the CEC line is locked (always high or low). > > > if i'm not mistaken i could unplug usb cable to pulse8 cec adapter and reinsert > to make it work properly again (no errors and correct response like a fresh start) > but i'm not 100% sure of this. > when i tried it now i get a new kernel oops: > - > [866129.392139] usb 7-2: USB disconnect, device number 3 > [866129.409568] cdc_acm 7-2:1.1: acm_start_wb - usb_submit_urb(write bulk) > failed: -19 > [866129.409576] cdc_acm 7-2:1.1: acm_start_wb - usb_submit_urb(write bulk) > failed: -19 > [866129.409635] WARNING: CPU: 10 PID: 1571 at drivers/media/cec/cec-adap.c:1243 Not a kernel oops, just a warning. The pulse-eight driver has a small bug that causes this warning, but it does not affect the CEC behavior in any way. I'll post a patch. Regards, Hans ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: cec kernel oops with pulse8 usb cec adapter 2018-11-14 13:23 ` Hans Verkuil @ 2018-11-14 16:22 ` Torbjorn Jansson 0 siblings, 0 replies; 15+ messages in thread From: Torbjorn Jansson @ 2018-11-14 16:22 UTC (permalink / raw) To: Hans Verkuil, Sean Young; +Cc: Linux Media Mailing List On 2018-11-14 14:23, Hans Verkuil wrote: > On 11/14/18 14:00, Torbjorn Jansson wrote: > > <snip> > >> since there now is a patch that appears to be working and fixing this problem >> i'd like to ask for some troubleshooting advice with another cec issue i have >> that i haven't figured out why it is happening and exactly whats causing it. >> >> i'm not sure if it is a hardware issue or a software issue or both. >> >> this is what is happening: >> i have a script that polls the tv for current status by running: >> cec-ctl --to=0 --give-device-power-status >> after a fresh reboot this works fine for a while then sometime later it stops >> working and errors with: >> - >> # cec-ctl --to=0 --give-device-power-status >> Driver Info: >> Driver Name : pulse8-cec >> Adapter Name : serio0 >> Capabilities : 0x0000003f >> Physical Address >> Logical Addresses >> Transmit >> Passthrough >> Remote Control Support >> Monitor All >> Driver version : 4.18.16 >> Available Logical Addresses: 1 >> Physical Address : 1.4.0.0 >> Logical Address Mask : 0x0800 >> CEC Version : 2.0 >> Vendor ID : 0x000c03 >> Logical Addresses : 1 (Allow RC Passthrough) >> >> Logical Address : 11 (Playback Device 3) >> Primary Device Type : Playback >> Logical Address Type : Playback >> All Device Types : Playback >> RC TV Profile : None >> Device Features : >> None >> >> >> Transmit from Playback Device 3 to TV (11 to 0): >> CEC_MSG_GIVE_DEVICE_POWER_STATUS (0x8f) >> Sequence: 119437 Tx Timestamp: 865389.535s >> Tx, Error (1), Max Retries >> - >> >> once this happens i can never get back any status and also running: >> cec-ctl -M >> gives me a lot of: >> Transmitted by Playback Device 3 to TV (11 to 0): >> CEC_MSG_GIVE_DEVICE_POWER_STATUS (0x8f) >> Tx, Error (1), Max Retries >> once for every run of my status checking script. >> >> i know polling like this is not the best option and a better way would be to >> listen for events and then take action when status changes but that's not >> easily doable with what i need it for. >> >> anyway, once i start getting the above errors when i poll it will not give me >> back any good status any more (everything errors out) >> >> also sending commands to tv to turn on or off like: >> cec-ctl --to=0 --image-view-on >> or >> cec-ctl --to=0 --standby >> doesn't work. >> >> BUT if i remove power to tv and wait for standby led to go out completely then >> power it back on i can use above two commands to turn on/off the tv even when >> they return errors and i still can't poll for current status. >> >> so even with the errors always returned at this stage the on/off commands still >> gets sent and works. >> >> do you think this behavior is a sw or hw issue or both? > > Does 'cec-ctl --to=0 --give-device-power-status' eventually fail even if the TV is > on all the time? Or does it only fail if the TV goes to standby or has been in > standby for a very long time? > good question, tv will go to standby by itself after some time and i can't have the tv turned on for a few days in a row (not practical). my gut feeling is that it happens after being in standby for a long time. it may work fine for a day or two then it fails. > This error ('Tx, Error (1), Max Retries') indicates that the pulse-eight returns > transmit errors suggesting that the CEC line is locked (always high or low). > I did some more testing and i was able to somehow trigger the error state by having one cec-ctl -M and one cec-ctl -S going at the same time but this is not consistent (tested several times and was only able to get it to fail like this once) Also i tested to pull hdmi cable from tv when in error state and see if i could reach other devices, tested with cec-ctl -S and it did display a some of my other hdmi devices like amp & bd player. So i guess my tv is to blame for locking up cec. Maybe i'll just schedule something to cut power completely to tv during night for a few hours. >> >> >> if i'm not mistaken i could unplug usb cable to pulse8 cec adapter and reinsert >> to make it work properly again (no errors and correct response like a fresh start) >> but i'm not 100% sure of this. >> when i tried it now i get a new kernel oops: >> - >> [866129.392139] usb 7-2: USB disconnect, device number 3 >> [866129.409568] cdc_acm 7-2:1.1: acm_start_wb - usb_submit_urb(write bulk) >> failed: -19 >> [866129.409576] cdc_acm 7-2:1.1: acm_start_wb - usb_submit_urb(write bulk) >> failed: -19 >> [866129.409635] WARNING: CPU: 10 PID: 1571 at drivers/media/cec/cec-adap.c:1243 > > Not a kernel oops, just a warning. The pulse-eight driver has a small bug > that causes this warning, but it does not affect the CEC behavior in any way. > > I'll post a patch. > Ok so nothing to worry about then. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-11-15 2:26 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-10-20 13:26 cec kernel oops with pulse8 usb cec adapter Torbjorn Jansson 2018-10-20 21:12 ` Hans Verkuil 2018-10-22 8:59 ` Sean Young 2018-10-22 9:44 ` Torbjorn Jansson 2018-10-22 10:14 ` [PATCH] media: rc: cec devices do not have a lirc chardev Sean Young 2018-10-22 10:14 ` Sean Young 2018-10-22 11:30 ` Hans Verkuil 2018-10-22 12:28 ` Sean Young 2018-10-22 12:28 ` Sean Young 2018-10-22 12:44 ` Torbjorn Jansson 2018-10-22 13:17 ` Sean Young 2018-11-14 12:18 ` Torbjorn Jansson 2018-11-14 13:00 ` cec kernel oops with pulse8 usb cec adapter Torbjorn Jansson 2018-11-14 13:23 ` Hans Verkuil 2018-11-14 16:22 ` Torbjorn Jansson
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.