* [REGRESSION/bisected] kernel panic after "rmmod cx23885" by "si2157: implement signal strength stats"
@ 2016-01-08 12:18 Martin Witte
2016-01-09 21:42 ` Ernst Martin Witte
0 siblings, 1 reply; 5+ messages in thread
From: Martin Witte @ 2016-01-08 12:18 UTC (permalink / raw)
To: Linux Media Mailing List
Hi!
First apologies if this bug report is not perfect ... It's my first
time hunting a kernel bug and doing the bisect.
My way to reliably reproduce the issue:
* relevance / use-case:
HTPC going into sleep (currently requires rmmod as isolated here)
when not being used.
* base system:
- Ubuntu 15.10 (arch: amd64)
- PCIe DVB-C card DVBSky T982 (2 DVB-C/T tuners)
- starting point: Ubuntu kernel 4.2.x / x86_64
* run tvheadend 4.x using the DVB-C tuners (from git/master in my
case, but actual release should not be that relevant) ... likely
any other DVB-C application would be sufficient.
* trigger kernel panic:
- stop tvheadend
- rmmod cx23885
Please note: It seems to be relevant that the DVB-C tuners were
in use (and released) before the call to rmmod. During testing,
I had situations where I could successfully do the rmmod if
tvheadend was not yet started after a fresh boot.
The kernel panic happens after the rmmod with a little delay
(less than a second) - in some completely unrelated parts of the
kernel, but reproducible locations, e.g. a stack frame like:
call_timer_fn+0x35/0xf0
__run_timers (inlined)
run_timer_softirq+0x221/0x2d0
__do_softirq+0x0f6/0250
(unable to handle paging request, aka invalid pointer)
or
call_timer_fn+0x35/0xf0
__run_timers (inlined)
run_timer_softirq+0x162/0x2d0
__do_softirq+0x0f6/0250
(null pointer dereference)
The faulting line is a call to NULL or invalid function pointer,
namely line 1178 in media_tree/kernel/time/timer.c
1177 trace_timer_expire_entry(timer);
1178 fn(data);
1179 trace_timer_expire_exit(timer);
fn is a null pointer if called within the "if (irqsafe)" block
or an invalid pointer if called from the respective else branch
(ll. 1284 in media_tree/kernel/time/timer.c)
* Affected:
- v4.2.x
- v4.3.x
- v4.4.rcX
- media-tree/master (after workaround for regression identified
by Matthias Schwarzott on Jan. 6,
"[REGRESSION/bisected] kernel oops in
vb2_core_qbuf")
* Not affected: v4.1.x
* Bisect result on git://github.com/torvalds/linux.git/master
between v4.1 and v4.2
2f1ea29fca781b8e6600f3ece1f2dd98ae276294 is the first bad commit
commit 2f1ea29fca781b8e6600f3ece1f2dd98ae276294
Author: Antti Palosaari <crope@iki.fi>
Date: Sun Sep 7 11:20:34 2014 -0300
[media] si2157: implement signal strength stats
Implement DVBv5 signal strength stats. Returns dBm.
Signed-off-by: Antti Palosaari <crope@iki.fi>
Tested-by: Adam Baker <linux@baker-net.org.uk>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
:040000 040000 1fc70a3d18532f91289e8b581081ee59feefc321 5130e9b011e9c4ba683cd4db3eae8dca67c3ef0e M drivers
* Workaround:
Reverting this patch on my tested kernels (incl. 4.4.rc8) prevents
the kernel panic.
* Guesses (Conclusion would be far too much ;-)
- Memory corruption by some of the media drivers? Without any
personal background in linux kernel development, it's currently
my only idea how those far-apart code locations can affect each
other. If it is memory corruption - is it
* caused by the identified patch? ... or
* caused by some other part of the media drivers but made
"observable" by the identified patch?
- Are there kernel timers created by the media drivers which are
not cleaned up properly upon rmmmod?
...just few ideas (... without any background on kernel dev ;-)
BR and many thx for your kernel development work!
Martin
P.S.: Bisect log:
user@host:/a/b/c$ git bisect log
# bad: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect start 'v4.2' 'v4.1'
# bad: [c11d716218910c3aa2bac1bb641e6086ad649555] Merge tag 'armsoc-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect bad c11d716218910c3aa2bac1bb641e6086ad649555
# good: [8a8c35fadfaf55629a37ef1a8ead1b8fb32581d2] mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
git bisect good 8a8c35fadfaf55629a37ef1a8ead1b8fb32581d2
# good: [14738e03312ff1137109d68bcbf103c738af0f4a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 14738e03312ff1137109d68bcbf103c738af0f4a
# good: [4570a37169d4b44d316f40b2ccc681dc93fedc7b] Merge tag 'sound-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 4570a37169d4b44d316f40b2ccc681dc93fedc7b
# bad: [78aad7f81aa6dfccdb2804ac35db6fc371d265cf] [media] vivid-tpg: precalculate colorspace/xfer_func combinations
git bisect bad 78aad7f81aa6dfccdb2804ac35db6fc371d265cf
# good: [356484cabe44984d2dc66a90bd5e3465ba1f64fb] [media] dw2102: resync fifo when demod locks
git bisect good 356484cabe44984d2dc66a90bd5e3465ba1f64fb
# good: [e4aa18d33c3a05f9ac51a8c8c7863318c807650f] [media] DocBook: Improve the description of the properties API
git bisect good e4aa18d33c3a05f9ac51a8c8c7863318c807650f
# good: [e01dfc01914ab9a078ca8d08287c19c6663b5438] [media] videodev2.h: add COLORSPACE_DEFAULT
git bisect good e01dfc01914ab9a078ca8d08287c19c6663b5438
# good: [dc9ef7d11207a04514ca195f0c9f4d2ac56696e1] [media] DocBook media: rewrite frontend open/close
git bisect good dc9ef7d11207a04514ca195f0c9f4d2ac56696e1
# bad: [5ac417efe66ddd7cd70a98f7f4e32a14ae40a651] [media] sh_vou: avoid going past arrays
git bisect bad 5ac417efe66ddd7cd70a98f7f4e32a14ae40a651
# bad: [171fe6d1270d535eae798e4b5acc9f5d25e6e17e] [media] media: davinci_vpfe: set minimum required buffers to three
git bisect bad 171fe6d1270d535eae798e4b5acc9f5d25e6e17e
# good: [d2b72f6482b9a3c57f036c11786a2489dcc81176] [media] si2168: Implement own I2C adapter locking
git bisect good d2b72f6482b9a3c57f036c11786a2489dcc81176
# bad: [694f9963edd831e4ed6fdbcb7134525cf5715a79] [media] media: davinci_vpfe: clear the output_specs
git bisect bad 694f9963edd831e4ed6fdbcb7134525cf5715a79
# bad: [2f1ea29fca781b8e6600f3ece1f2dd98ae276294] [media] si2157: implement signal strength stats
git bisect bad 2f1ea29fca781b8e6600f3ece1f2dd98ae276294
# first bad commit: [2f1ea29fca781b8e6600f3ece1f2dd98ae276294] [media] si2157: implement signal strength stats
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION/bisected] kernel panic after "rmmod cx23885" by "si2157: implement signal strength stats"
2016-01-08 12:18 [REGRESSION/bisected] kernel panic after "rmmod cx23885" by "si2157: implement signal strength stats" Martin Witte
@ 2016-01-09 21:42 ` Ernst Martin Witte
2016-01-10 3:43 ` Antti Palosaari
0 siblings, 1 reply; 5+ messages in thread
From: Ernst Martin Witte @ 2016-01-09 21:42 UTC (permalink / raw)
To: Martin Witte; +Cc: Linux Media Mailing List
Hi again,
seems that the cause for the kernel panic is a missing call to
cancel_delayed_work_sync in si2157_remove before the call to kfree.
After adding cancel_delayed_work_sync(&dev->stat_work), rmmod does not
trigger the kernel panic any more.
However, very similar issues could be identified also in other modules:
ts2020
af9013
af9033
rtl2830
when looking in drivers/media/tuners and drivers/media/dvb-frontends.
Therefore, the submitted patch set contains fixes also for those
modules. The submitted patch set is:
[PATCH 0/5] [media] cancel_delayed_work_sync before device removal / kfree
I hope these patches completely fix the issue and are ok for inclusion
in the kernel.
BR and thx,
Martin
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION/bisected] kernel panic after "rmmod cx23885" by "si2157: implement signal strength stats"
2016-01-09 21:42 ` Ernst Martin Witte
@ 2016-01-10 3:43 ` Antti Palosaari
2016-01-10 22:04 ` Ernst Martin Witte
0 siblings, 1 reply; 5+ messages in thread
From: Antti Palosaari @ 2016-01-10 3:43 UTC (permalink / raw)
To: emw, Martin Witte; +Cc: Linux Media Mailing List
Hello
Those timers are activated on frontend init() ja de-activated on
sleep(). Removing active driver does not sound good and IMHO it should
not be even possible. I think it should be find out why it is possible
to remove whole driver before it is put to sleep().
Antti
On 01/09/2016 11:42 PM, Ernst Martin Witte wrote:
> Hi again,
>
> seems that the cause for the kernel panic is a missing call to
> cancel_delayed_work_sync in si2157_remove before the call to kfree.
> After adding cancel_delayed_work_sync(&dev->stat_work), rmmod does not
> trigger the kernel panic any more.
>
> However, very similar issues could be identified also in other modules:
>
> ts2020
> af9013
> af9033
> rtl2830
>
> when looking in drivers/media/tuners and drivers/media/dvb-frontends.
>
> Therefore, the submitted patch set contains fixes also for those
> modules. The submitted patch set is:
>
> [PATCH 0/5] [media] cancel_delayed_work_sync before device removal / kfree
>
> I hope these patches completely fix the issue and are ok for inclusion
> in the kernel.
>
> BR and thx,
> Martin
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
http://palosaari.fi/
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION/bisected] kernel panic after "rmmod cx23885" by "si2157: implement signal strength stats"
2016-01-10 3:43 ` Antti Palosaari
@ 2016-01-10 22:04 ` Ernst Martin Witte
2016-01-11 2:34 ` Antti Palosaari
0 siblings, 1 reply; 5+ messages in thread
From: Ernst Martin Witte @ 2016-01-10 22:04 UTC (permalink / raw)
To: Linux Media Mailing List, Antti Palosaari
Hi Antti!
Thx for your quick answer. Since I'm not familiar with linux kernel
driver internals, may I ask few questions in order to better
understand the intended use of ..._delayed_work and the interaction
between drivers like the cx23885 and the tuner modules?
- Is si2157_sleep not related to "suspended" system power state but to
an "unused" state of the tuner physical device?
- Which mechanism should call si2157_sleep when e.g. cx23885 is
rmmod'ed?
- What I understood so far is that we should not need the
cance_delayed_work_sync in ..._remove because ..._sleep should have
been called before. Why is such a cancel_delayed_work call already
in the kernel in rtl2832_remove although this module has the
separate cancel_delayed_work call in rtl23832_sleep?
- If si2157_sleep was not called and the device is considered to be
still active, which mechanism should prevent the call of
si2157_remove or the unloading of module cx23885?
- Is it correct that rmmod cx23885 is legitime if none of its devices
is in use by any app and if those "links" between si2157 and cx23885
are shut down correctly? ... and should only be prevented if
something goes wrong with shutting down those "links"?
- Is the interpretation correct that si2157_remove should be called
when the driver/module si2157 is unloaded? If yes - how does it
happen that it gets called when only an rmmod cx23885 is done?
- What is the intended call sequence of all those
sleep/remove/... callbacks when
a) an applications closes the device (but driver remains loaded, but
presumingly inactive)
b) the one or other module is rmmod'ed (for me personally mainly
cx23885)
... you might see that for me it's mainly unclear how si2157 and
cx23885 are linked in terms of callbacks.
Is there anything I could try (adding debug statements, etc.) / test
in order to get more insight what's going wrong?
BR and many thanks for your help,
Martin
On 10.01.2016 05:43, Antti Palosaari wrote:
> Hello
> Those timers are activated on frontend init() ja de-activated on sleep().
> Removing active driver does not sound good and IMHO it should not be even
> possible. I think it should be find out why it is possible to remove whole
> driver before it is put to sleep().
>
> Antti
>
>
> On 01/09/2016 11:42 PM, Ernst Martin Witte wrote:
> >Hi again,
> >
> >seems that the cause for the kernel panic is a missing call to
> >cancel_delayed_work_sync in si2157_remove before the call to kfree.
> >After adding cancel_delayed_work_sync(&dev->stat_work), rmmod does not
> >trigger the kernel panic any more.
> >
> >However, very similar issues could be identified also in other modules:
> >
> > ts2020
> > af9013
> > af9033
> > rtl2830
> >
> >when looking in drivers/media/tuners and drivers/media/dvb-frontends.
> >
> >Therefore, the submitted patch set contains fixes also for those
> >modules. The submitted patch set is:
> >
> > [PATCH 0/5] [media] cancel_delayed_work_sync before device removal / kfree
> >
> >I hope these patches completely fix the issue and are ok for inclusion
> >in the kernel.
> >
> >BR and thx,
> > Martin
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-media" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> http://palosaari.fi/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION/bisected] kernel panic after "rmmod cx23885" by "si2157: implement signal strength stats"
2016-01-10 22:04 ` Ernst Martin Witte
@ 2016-01-11 2:34 ` Antti Palosaari
0 siblings, 0 replies; 5+ messages in thread
From: Antti Palosaari @ 2016-01-11 2:34 UTC (permalink / raw)
To: emw-linux-media-2016, Linux Media Mailing List
On 01/11/2016 12:04 AM, Ernst Martin Witte wrote:
> Hi Antti!
>
> Thx for your quick answer. Since I'm not familiar with linux kernel
> driver internals, may I ask few questions in order to better
> understand the intended use of ..._delayed_work and the interaction
> between drivers like the cx23885 and the tuner modules?
>
> - Is si2157_sleep not related to "suspended" system power state but to
> an "unused" state of the tuner physical device?
It is physical chip state where is is not used.
> - Which mechanism should call si2157_sleep when e.g. cx23885 is
> rmmod'ed?
It is dvb_frontend. You should not be able to remove cx23885 module
until dvb frontend is stopped. dvb frontend runs thread in order to call
init, sleep etc.
> - What I understood so far is that we should not need the
> cance_delayed_work_sync in ..._remove because ..._sleep should have
> been called before. Why is such a cancel_delayed_work call already
> in the kernel in rtl2832_remove although this module has the
> separate cancel_delayed_work call in rtl23832_sleep?
Look it again. There is 2 different delayed works.
> - If si2157_sleep was not called and the device is considered to be
> still active, which mechanism should prevent the call of
> si2157_remove or the unloading of module cx23885?
cx23885 should ref count si2157. I think dvb frontend ref counts cx23885.
> - Is it correct that rmmod cx23885 is legitime if none of its devices
> is in use by any app and if those "links" between si2157 and cx23885
> are shut down correctly? ... and should only be prevented if
> something goes wrong with shutting down those "links"?
>
> - Is the interpretation correct that si2157_remove should be called
> when the driver/module si2157 is unloaded? If yes - how does it
> happen that it gets called when only an rmmod cx23885 is done?
It should be called. cx23885 module releases si2157, which makes driver
core to call si2157_remove in order to free all resources driver has.
There is 2 different kind of driver ops: one for DVB core and one for
(I2C) driver core. si2157_remove is driver core op and si2157_sleep is
coming from DVB core.
> - What is the intended call sequence of all those
> sleep/remove/... callbacks when
>
> a) an applications closes the device (but driver remains loaded, but
> presumingly inactive)
>
> b) the one or other module is rmmod'ed (for me personally mainly
> cx23885)
Those dvb core callbacks are higher level and ones from struct
i2c_driver are lower level. So in order to shutdown device dvb callbacks
should be called first and after that i2c remove. After I2C remove none
of callbacks are not valid anymore.
>
> ... you might see that for me it's mainly unclear how si2157 and
> cx23885 are linked in terms of callbacks.
>
> Is there anything I could try (adding debug statements, etc.) / test
> in order to get more insight what's going wrong?
Sure, just add manually some printing to see where code is running.
That's how I also do.
You should examine when dvb core calls sleep and test it is not possible
to remove cx23885 module until that. And if sleep is not called, why it
is not called. IIRC there is some module param to prevent dvb frontend
shutdown, but it should not be used (I think it is left there to
workaround some bug).
Why you even want to remove whole module? Usually you only remove
modules manually when you are developing the driver.
regards
Antti
>
> BR and many thanks for your help,
> Martin
>
>
> On 10.01.2016 05:43, Antti Palosaari wrote:
>> Hello
>> Those timers are activated on frontend init() ja de-activated on sleep().
>> Removing active driver does not sound good and IMHO it should not be even
>> possible. I think it should be find out why it is possible to remove whole
>> driver before it is put to sleep().
>>
>> Antti
>>
>>
>> On 01/09/2016 11:42 PM, Ernst Martin Witte wrote:
>>> Hi again,
>>>
>>> seems that the cause for the kernel panic is a missing call to
>>> cancel_delayed_work_sync in si2157_remove before the call to kfree.
>>> After adding cancel_delayed_work_sync(&dev->stat_work), rmmod does not
>>> trigger the kernel panic any more.
>>>
>>> However, very similar issues could be identified also in other modules:
>>>
>>> ts2020
>>> af9013
>>> af9033
>>> rtl2830
>>>
>>> when looking in drivers/media/tuners and drivers/media/dvb-frontends.
>>>
>>> Therefore, the submitted patch set contains fixes also for those
>>> modules. The submitted patch set is:
>>>
>>> [PATCH 0/5] [media] cancel_delayed_work_sync before device removal / kfree
>>>
>>> I hope these patches completely fix the issue and are ok for inclusion
>>> in the kernel.
>>>
>>> BR and thx,
>>> Martin
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-media" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> http://palosaari.fi/
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-media" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
--
http://palosaari.fi/
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-01-11 2:35 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-08 12:18 [REGRESSION/bisected] kernel panic after "rmmod cx23885" by "si2157: implement signal strength stats" Martin Witte
2016-01-09 21:42 ` Ernst Martin Witte
2016-01-10 3:43 ` Antti Palosaari
2016-01-10 22:04 ` Ernst Martin Witte
2016-01-11 2:34 ` Antti Palosaari
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.