* + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree @ 2016-09-20 21:55 akpm 2016-09-21 5:18 ` Sergey Senozhatsky 0 siblings, 1 reply; 9+ messages in thread From: akpm @ 2016-09-20 21:55 UTC (permalink / raw) To: ssantosh, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx, mm-commits The patch titled Subject: softirq: fix tasklet_kill() and its users has been added to the -mm tree. Its filename is softirq-fix-tasklet_kill-and-its-users.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/softirq-fix-tasklet_kill-and-its-users.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/softirq-fix-tasklet_kill-and-its-users.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/SubmitChecklist when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Santosh Shilimkar <ssantosh@kernel.org> Subject: softirq: fix tasklet_kill() and its users Semantically the expectation from the tasklet init/kill API should be as below. tasklet_init() == Init and Enable scheduling tasklet_kill() == Disable scheduling and Destroy The tasklet_init() API exibits the above behavior but tasklet_kill() does not. The tasklet handler can still get scheduled and run even after tasklet_kill(). There are 2, 3 places where drivers are working around this issue by calling tasklet_disable() which will add a usecount and thereby preventing the handlers from being called. tasklet_enable/tasklet_disable is a pair API and they are expected to be used together. Usage of tasklet_disable() *just* to work around tasklet scheduling after a kill is probably not the correct and intended use of the API. We also happen to see similar issues where in the shutdown path the tasklet_handler is getting called even after the tasklet_kill(). We fix this by ensuring that tasklet_kill() does the right thing and thereby ensures that the tasklet handler won't run after tasklet_kil() with a very simple change. The patch fixes the tasklet code and also removes a few driver workarounds. Link: http://lkml.kernel.org/r/1472089950-2724-1-git-send-email-ssantosh@kernel.org Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tadeusz Struk <tadeusz.struk@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Cc: Salvatore Benedetto <salvatore.benedetto@intel.com> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- drivers/crypto/qat/qat_common/adf_isr.c | 1 - drivers/crypto/qat/qat_common/adf_sriov.c | 1 - drivers/crypto/qat/qat_common/adf_vf_isr.c | 2 -- drivers/isdn/gigaset/interface.c | 1 - kernel/softirq.c | 7 ++++--- 5 files changed, 4 insertions(+), 8 deletions(-) diff -puN drivers/crypto/qat/qat_common/adf_isr.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_isr.c --- a/drivers/crypto/qat/qat_common/adf_isr.c~softirq-fix-tasklet_kill-and-its-users +++ a/drivers/crypto/qat/qat_common/adf_isr.c @@ -296,7 +296,6 @@ static void adf_cleanup_bh(struct adf_ac int i; for (i = 0; i < hw_data->num_banks; i++) { - tasklet_disable(&priv_data->banks[i].resp_handler); tasklet_kill(&priv_data->banks[i].resp_handler); } } diff -puN drivers/crypto/qat/qat_common/adf_sriov.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_sriov.c --- a/drivers/crypto/qat/qat_common/adf_sriov.c~softirq-fix-tasklet_kill-and-its-users +++ a/drivers/crypto/qat/qat_common/adf_sriov.c @@ -204,7 +204,6 @@ void adf_disable_sriov(struct adf_accel_ } for (i = 0, vf = accel_dev->pf.vf_info; i < totalvfs; i++, vf++) { - tasklet_disable(&vf->vf2pf_bh_tasklet); tasklet_kill(&vf->vf2pf_bh_tasklet); mutex_destroy(&vf->pf2vf_lock); } diff -puN drivers/crypto/qat/qat_common/adf_vf_isr.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_vf_isr.c --- a/drivers/crypto/qat/qat_common/adf_vf_isr.c~softirq-fix-tasklet_kill-and-its-users +++ a/drivers/crypto/qat/qat_common/adf_vf_isr.c @@ -191,7 +191,6 @@ static int adf_setup_pf2vf_bh(struct adf static void adf_cleanup_pf2vf_bh(struct adf_accel_dev *accel_dev) { - tasklet_disable(&accel_dev->vf.pf2vf_bh_tasklet); tasklet_kill(&accel_dev->vf.pf2vf_bh_tasklet); mutex_destroy(&accel_dev->vf.vf2pf_lock); } @@ -268,7 +267,6 @@ static void adf_cleanup_bh(struct adf_ac { struct adf_etr_data *priv_data = accel_dev->transport; - tasklet_disable(&priv_data->banks[0].resp_handler); tasklet_kill(&priv_data->banks[0].resp_handler); } diff -puN drivers/isdn/gigaset/interface.c~softirq-fix-tasklet_kill-and-its-users drivers/isdn/gigaset/interface.c --- a/drivers/isdn/gigaset/interface.c~softirq-fix-tasklet_kill-and-its-users +++ a/drivers/isdn/gigaset/interface.c @@ -524,7 +524,6 @@ void gigaset_if_free(struct cardstate *c if (!drv->have_tty) return; - tasklet_disable(&cs->if_wake_tasklet); tasklet_kill(&cs->if_wake_tasklet); cs->tty_dev = NULL; tty_unregister_device(drv->tty, cs->minor_index); diff -puN kernel/softirq.c~softirq-fix-tasklet_kill-and-its-users kernel/softirq.c --- a/kernel/softirq.c~softirq-fix-tasklet_kill-and-its-users +++ a/kernel/softirq.c @@ -498,7 +498,7 @@ static void tasklet_action(struct softir list = list->next; if (tasklet_trylock(t)) { - if (!atomic_read(&t->count)) { + if (atomic_read(&t->count) == 1) { if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state)) BUG(); @@ -534,7 +534,7 @@ static void tasklet_hi_action(struct sof list = list->next; if (tasklet_trylock(t)) { - if (!atomic_read(&t->count)) { + if (atomic_read(&t->count) == 1) { if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state)) BUG(); @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct { t->next = NULL; t->state = 0; - atomic_set(&t->count, 0); + atomic_set(&t->count, 1); t->func = func; t->data = data; } @@ -576,6 +576,7 @@ void tasklet_kill(struct tasklet_struct } while (test_bit(TASKLET_STATE_SCHED, &t->state)); } tasklet_unlock_wait(t); + atomic_dec(&t->count); clear_bit(TASKLET_STATE_SCHED, &t->state); } EXPORT_SYMBOL(tasklet_kill); _ Patches currently in -mm which might be from ssantosh@kernel.org are softirq-fix-tasklet_kill-and-its-users.patch ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree 2016-09-20 21:55 + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree akpm @ 2016-09-21 5:18 ` Sergey Senozhatsky 2016-09-21 8:09 ` Sergey Senozhatsky 0 siblings, 1 reply; 9+ messages in thread From: Sergey Senozhatsky @ 2016-09-21 5:18 UTC (permalink / raw) To: akpm, ssantosh Cc: davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx, mm-commits, linux-kernel, sfr, linux-next, sergey.senozhatsky.work, sergey.senozhatsky Hello, On (09/20/16 14:55), akpm@linux-foundation.org wrote: > ------------------------------------------------------ > From: Santosh Shilimkar <ssantosh@kernel.org> > Subject: softirq: fix tasklet_kill() and its users > > Semantically the expectation from the tasklet init/kill API should be as > below. > > tasklet_init() == Init and Enable scheduling > tasklet_kill() == Disable scheduling and Destroy > > The tasklet_init() API exibits the above behavior but tasklet_kill() does > not. The tasklet handler can still get scheduled and run even after > tasklet_kill(). > > There are 2, 3 places where drivers are working around this issue by > calling tasklet_disable() which will add a usecount and thereby preventing > the handlers from being called. > > tasklet_enable/tasklet_disable is a pair API and they are expected to be > used together. Usage of tasklet_disable() *just* to work around tasklet > scheduling after a kill is probably not the correct and intended use of > the API. We also happen to see similar issues where in the shutdown path > the tasklet_handler is getting called even after the tasklet_kill(). > > We fix this by ensuring that tasklet_kill() does the right thing and > thereby ensures that the tasklet handler won't run after tasklet_kil() > with a very simple change. The patch fixes the tasklet code and also > removes a few driver workarounds. this commit makes ksoftirqd/x to burn CPUs on my system. with a small addition of WARN_ON_ONCE(atomic_read(&t->count) < 1) to tasklet_action(), I got [ 0.411895] ------------[ cut here ]------------ [ 0.411956] WARNING: CPU: 0 PID: 6 at kernel/softirq.c:502 tasklet_action+0xb4/0x12e [ 0.412018] Modules linked in: [ 0.412111] CPU: 0 PID: 6 Comm: ksoftirqd/0 Not tainted 4.8.0-rc7-mm1-dbg-00278-g7f4fca2-dirty #58 [ 0.412273] ffff8801330efd30 ffffffff811f073c 0000000000000000 0000000000000000 [ 0.412484] ffff8801330efd70 ffffffff8103d23e 000001f6330efd50 ffffffff81849be0 [ 0.412696] ffffffff81849be8 ffffffff8179cbfc 0000000000000000 0000000000000000 [ 0.412907] Call Trace: [ 0.412961] [<ffffffff811f073c>] dump_stack+0x68/0x92 [ 0.413020] [<ffffffff8103d23e>] __warn+0xb8/0xd3 [ 0.413079] [<ffffffff8103d2bf>] warn_slowpath_null+0x18/0x1a [ 0.413138] [<ffffffff810413d8>] tasklet_action+0xb4/0x12e [ 0.413197] [<ffffffff8104193a>] __do_softirq+0x103/0x20a [ 0.413256] [<ffffffff81041a5b>] run_ksoftirqd+0x1a/0x4b [ 0.413316] [<ffffffff8105a25e>] smpboot_thread_fn+0x20c/0x225 [ 0.413376] [<ffffffff8105a052>] ? sort_range+0x1d/0x1d [ 0.413436] [<ffffffff810578d1>] kthread+0xf4/0xfc [ 0.413494] [<ffffffff810577dd>] ? kthread_create_on_node+0x1df/0x1df [ 0.413555] [<ffffffff81057784>] ? kthread_create_on_node+0x186/0x1df [ 0.413616] [<ffffffff814c0a47>] ret_from_fork+0x27/0x40 [ 0.413735] ---[ end trace d79a87896622b4a4 ]--- -ss > Link: http://lkml.kernel.org/r/1472089950-2724-1-git-send-email-ssantosh@kernel.org > Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Tadeusz Struk <tadeusz.struk@intel.com> > Cc: Herbert Xu <herbert@gondor.apana.org.au> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Paul Bolle <pebolle@tiscali.nl> > Cc: Giovanni Cabiddu <giovanni.cabiddu@intel.com> > Cc: Salvatore Benedetto <salvatore.benedetto@intel.com> > Cc: Karsten Keil <isdn@linux-pingi.de> > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > Cc: Ingo Molnar <mingo@elte.hu> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > drivers/crypto/qat/qat_common/adf_isr.c | 1 - > drivers/crypto/qat/qat_common/adf_sriov.c | 1 - > drivers/crypto/qat/qat_common/adf_vf_isr.c | 2 -- > drivers/isdn/gigaset/interface.c | 1 - > kernel/softirq.c | 7 ++++--- > 5 files changed, 4 insertions(+), 8 deletions(-) > > diff -puN drivers/crypto/qat/qat_common/adf_isr.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_isr.c > --- a/drivers/crypto/qat/qat_common/adf_isr.c~softirq-fix-tasklet_kill-and-its-users > +++ a/drivers/crypto/qat/qat_common/adf_isr.c > @@ -296,7 +296,6 @@ static void adf_cleanup_bh(struct adf_ac > int i; > > for (i = 0; i < hw_data->num_banks; i++) { > - tasklet_disable(&priv_data->banks[i].resp_handler); > tasklet_kill(&priv_data->banks[i].resp_handler); > } > } > diff -puN drivers/crypto/qat/qat_common/adf_sriov.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_sriov.c > --- a/drivers/crypto/qat/qat_common/adf_sriov.c~softirq-fix-tasklet_kill-and-its-users > +++ a/drivers/crypto/qat/qat_common/adf_sriov.c > @@ -204,7 +204,6 @@ void adf_disable_sriov(struct adf_accel_ > } > > for (i = 0, vf = accel_dev->pf.vf_info; i < totalvfs; i++, vf++) { > - tasklet_disable(&vf->vf2pf_bh_tasklet); > tasklet_kill(&vf->vf2pf_bh_tasklet); > mutex_destroy(&vf->pf2vf_lock); > } > diff -puN drivers/crypto/qat/qat_common/adf_vf_isr.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_vf_isr.c > --- a/drivers/crypto/qat/qat_common/adf_vf_isr.c~softirq-fix-tasklet_kill-and-its-users > +++ a/drivers/crypto/qat/qat_common/adf_vf_isr.c > @@ -191,7 +191,6 @@ static int adf_setup_pf2vf_bh(struct adf > > static void adf_cleanup_pf2vf_bh(struct adf_accel_dev *accel_dev) > { > - tasklet_disable(&accel_dev->vf.pf2vf_bh_tasklet); > tasklet_kill(&accel_dev->vf.pf2vf_bh_tasklet); > mutex_destroy(&accel_dev->vf.vf2pf_lock); > } > @@ -268,7 +267,6 @@ static void adf_cleanup_bh(struct adf_ac > { > struct adf_etr_data *priv_data = accel_dev->transport; > > - tasklet_disable(&priv_data->banks[0].resp_handler); > tasklet_kill(&priv_data->banks[0].resp_handler); > } > > diff -puN drivers/isdn/gigaset/interface.c~softirq-fix-tasklet_kill-and-its-users drivers/isdn/gigaset/interface.c > --- a/drivers/isdn/gigaset/interface.c~softirq-fix-tasklet_kill-and-its-users > +++ a/drivers/isdn/gigaset/interface.c > @@ -524,7 +524,6 @@ void gigaset_if_free(struct cardstate *c > if (!drv->have_tty) > return; > > - tasklet_disable(&cs->if_wake_tasklet); > tasklet_kill(&cs->if_wake_tasklet); > cs->tty_dev = NULL; > tty_unregister_device(drv->tty, cs->minor_index); > diff -puN kernel/softirq.c~softirq-fix-tasklet_kill-and-its-users kernel/softirq.c > --- a/kernel/softirq.c~softirq-fix-tasklet_kill-and-its-users > +++ a/kernel/softirq.c > @@ -498,7 +498,7 @@ static void tasklet_action(struct softir > list = list->next; > > if (tasklet_trylock(t)) { > - if (!atomic_read(&t->count)) { > + if (atomic_read(&t->count) == 1) { > if (!test_and_clear_bit(TASKLET_STATE_SCHED, > &t->state)) > BUG(); > @@ -534,7 +534,7 @@ static void tasklet_hi_action(struct sof > list = list->next; > > if (tasklet_trylock(t)) { > - if (!atomic_read(&t->count)) { > + if (atomic_read(&t->count) == 1) { > if (!test_and_clear_bit(TASKLET_STATE_SCHED, > &t->state)) > BUG(); > @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct > { > t->next = NULL; > t->state = 0; > - atomic_set(&t->count, 0); > + atomic_set(&t->count, 1); > t->func = func; > t->data = data; > } > @@ -576,6 +576,7 @@ void tasklet_kill(struct tasklet_struct > } while (test_bit(TASKLET_STATE_SCHED, &t->state)); > } > tasklet_unlock_wait(t); > + atomic_dec(&t->count); > clear_bit(TASKLET_STATE_SCHED, &t->state); > } > EXPORT_SYMBOL(tasklet_kill); > _ > > Patches currently in -mm which might be from ssantosh@kernel.org are > > softirq-fix-tasklet_kill-and-its-users.patch > > -- > To unsubscribe from this list: send the line "unsubscribe mm-commits" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree 2016-09-21 5:18 ` Sergey Senozhatsky @ 2016-09-21 8:09 ` Sergey Senozhatsky 2016-09-21 17:23 ` Santosh Shilimkar 0 siblings, 1 reply; 9+ messages in thread From: Sergey Senozhatsky @ 2016-09-21 8:09 UTC (permalink / raw) To: ssantosh Cc: akpm, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx, mm-commits, linux-kernel, sfr, linux-next, sergey.senozhatsky, sergey.senozhatsky.work didn't look into the issue, but this thing > > tasklet_init() == Init and Enable scheduling [..] > > @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct > > { > > t->next = NULL; > > t->state = 0; > > - atomic_set(&t->count, 0); > > + atomic_set(&t->count, 1); ^^^^^^^^ > > t->func = func; > > t->data = data; > > } seems to be in conflict with #define DECLARE_TASKLET(name, func, data) \ struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data } ^^^^^^^ #define DECLARE_TASKLET_DISABLED(name, func, data) \ struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } ^^^^^^^ as well as with the tasklet_{disable, enable} helpers static inline void tasklet_disable_nosync(struct tasklet_struct *t) { atomic_inc(&t->count); smp_mb__after_atomic(); } static inline void tasklet_disable(struct tasklet_struct *t) { tasklet_disable_nosync(t); tasklet_unlock_wait(t); smp_mb(); } static inline void tasklet_enable(struct tasklet_struct *t) { smp_mb__before_atomic(); atomic_dec(&t->count); } -ss ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree 2016-09-21 8:09 ` Sergey Senozhatsky @ 2016-09-21 17:23 ` Santosh Shilimkar 2016-09-22 0:42 ` Sergey Senozhatsky 0 siblings, 1 reply; 9+ messages in thread From: Santosh Shilimkar @ 2016-09-21 17:23 UTC (permalink / raw) To: Sergey Senozhatsky, ssantosh Cc: akpm, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx, mm-commits, linux-kernel, sfr, linux-next, sergey.senozhatsky [-- Attachment #1: Type: text/plain, Size: 2597 bytes --] On 9/21/2016 1:09 AM, Sergey Senozhatsky wrote: > didn't look into the issue, but this thing > Thanks for reporting Sergey. >>> tasklet_init() == Init and Enable scheduling > [..] >>> @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct >>> { >>> t->next = NULL; >>> t->state = 0; >>> - atomic_set(&t->count, 0); >>> + atomic_set(&t->count, 1); > ^^^^^^^^ >>> t->func = func; >>> t->data = data; >>> } > > seems to be in conflict with > Static helpers also needs to follow the API. > #define DECLARE_TASKLET(name, func, data) \ > struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data } > ^^^^^^^ > > #define DECLARE_TASKLET_DISABLED(name, func, data) \ > struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } > ^^^^^^^ > > > as well as with the tasklet_{disable, enable} helpers > Those are fine since they work like a pair and the use count is always balanced. Am assuming one of the driver in your test is using the DECLARE_TASKLET to init the tasklet and killed by tasklet_kill() which leaves that tasklet to be still scheduled by tasklet action. Can you please try below patch and see if you still see the issue ? Attaching the same, just in case mailer eat the tabs. Regards, Santosh From e3e676e501a59b2a7de6e9f99ec3917c157e9caf Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar <ssantosh@kernel.org> Date: Wed, 21 Sep 2016 09:58:51 -0700 Subject: [PATCH] softirq: fix DECLARE_TASKLET[_DISABLE] macros init state In linux-next, commit 1f5e9c3bc47f ("softirq: fix tasklet_kill() and its users") changed the init state of the tasklet but missed to update the macros. Fix them too. Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> --- include/linux/interrupt.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 72f0721..cabf575 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -521,10 +521,10 @@ struct tasklet_struct }; #define DECLARE_TASKLET(name, func, data) \ -struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data } +struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } #define DECLARE_TASKLET_DISABLED(name, func, data) \ -struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } +struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(2), func, data } enum -- 1.9.1 [-- Attachment #2: 0001-softirq-fix-DECLARE_TASKLET-_DISABLE-macros-init-sta.patch --] [-- Type: text/x-diff, Size: 1323 bytes --] >From e3e676e501a59b2a7de6e9f99ec3917c157e9caf Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar <ssantosh@kernel.org> Date: Wed, 21 Sep 2016 09:58:51 -0700 Subject: [PATCH] softirq: fix DECLARE_TASKLET[_DISABLE] macros init state In linux-next, commit 1f5e9c3bc47f ("softirq: fix tasklet_kill() and its users") changed the init state of the tasklet but missed to update the macros. Fix them too. Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> --- include/linux/interrupt.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 72f0721..cabf575 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -521,10 +521,10 @@ struct tasklet_struct }; #define DECLARE_TASKLET(name, func, data) \ -struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data } +struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } #define DECLARE_TASKLET_DISABLED(name, func, data) \ -struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } +struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(2), func, data } enum -- 1.9.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree 2016-09-21 17:23 ` Santosh Shilimkar @ 2016-09-22 0:42 ` Sergey Senozhatsky 2016-09-22 2:31 ` Santosh Shilimkar 0 siblings, 1 reply; 9+ messages in thread From: Sergey Senozhatsky @ 2016-09-22 0:42 UTC (permalink / raw) To: Santosh Shilimkar Cc: Sergey Senozhatsky, ssantosh, akpm, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx, mm-commits, linux-kernel, sfr, linux-next, sergey.senozhatsky Hello, On (09/21/16 10:23), Santosh Shilimkar wrote: > > > > tasklet_init() == Init and Enable scheduling > > [..] > > > > @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct > > > > { > > > > t->next = NULL; > > > > t->state = 0; > > > > - atomic_set(&t->count, 0); > > > > + atomic_set(&t->count, 1); > > > > ^^^^^^^^ > > > > t->func = func; > > > > t->data = data; > > > > } > > > > seems to be in conflict with > > > Static helpers also needs to follow the API. > > > #define DECLARE_TASKLET(name, func, data) \ > > struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data } > > ^^^^^^^ > > > > #define DECLARE_TASKLET_DISABLED(name, func, data) \ > > struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data } > > ^^^^^^^ > > > > > > > as well as with the tasklet_{disable, enable} helpers > > > Those are fine since they work like a pair and the use count > is always balanced. right, the point was that DECLARE_TASKLET_DISABLED() equals to tasklet_init() and {DECLARE_TASKLET(); tasklet_disable();} equals to tasklet_init() > Am assuming one of the driver in your test is using the DECLARE_TASKLET > to init the tasklet and killed by tasklet_kill() which leaves that > tasklet to be still scheduled by tasklet action. yes, vt does something like this (kbd_bh). > Can you please try below patch and see if you still see the issue ? > Attaching the same, just in case mailer eat the tabs. hm, didn't completely fix it. the vt is now happy, unlike usbnet. and the usbnet case is rather alarming. static inline void tasklet_schedule(struct tasklet_struct *t) { + WARN_ON_ONCE(atomic_read(&t->count) < 1); + if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) __tasklet_schedule(t); } gives me the following backtrace [ 36.937798] [<ffffffffa013ff12>] usbnet_open+0x1f9/0x24f [usbnet] [ 36.937800] [<ffffffff813f7cf7>] __dev_open+0x8c/0xc8 [ 36.937801] [<ffffffff813f7f51>] __dev_change_flags+0xa2/0x13d [ 36.937802] [<ffffffff813f800c>] dev_change_flags+0x20/0x53 [ 36.937803] [<ffffffff814089da>] do_setlink+0x2f6/0xa31 [ 36.937806] [<ffffffff810cfb66>] ? get_page_from_freelist+0x5f3/0x7b2 [ 36.937808] [<ffffffff810f1995>] ? handle_mm_fault+0x82d/0xcc4 [ 36.937809] [<ffffffff81409973>] rtnl_newlink+0x39b/0x705 [ 36.937812] [<ffffffff813f6d2e>] ? netdev_master_upper_dev_get+0xd/0x57 [ 36.937813] [<ffffffff814096e9>] ? rtnl_newlink+0x111/0x705 [ 36.937816] [<ffffffff81030c5f>] ? update_stack_state.constprop.1+0x4c/0x59 [ 36.937818] [<ffffffff81407737>] rtnetlink_rcv_msg+0x16c/0x17b [ 36.937820] [<ffffffff814bf065>] ? mutex_lock_nested+0x31f/0x344 [ 36.937823] [<ffffffff8141c204>] ? netlink_deliver_tap+0x234/0x260 [ 36.937824] [<ffffffff814075cb>] ? __rtnl_unlock+0x5e/0x5e [ 36.937826] [<ffffffff8141f498>] netlink_rcv_skb+0x42/0x83 [ 36.937827] [<ffffffff81407566>] rtnetlink_rcv+0x1e/0x25 [ 36.937828] [<ffffffff8141df8a>] netlink_unicast+0x101/0x18e [ 36.937829] [<ffffffff8141e7ec>] netlink_sendmsg+0x2ef/0x300 [ 36.937832] [<ffffffff812022b7>] ? import_iovec+0x64/0x84 [ 36.937835] [<ffffffff813dc347>] sock_sendmsg+0xf/0x1a [ 36.937836] [<ffffffff813dc55b>] ___sys_sendmsg+0x17f/0x1f8 [ 36.937838] [<ffffffff810752db>] ? __lock_is_held+0x3c/0x57 [ 36.937841] [<ffffffff81207e89>] ? __this_cpu_preempt_check+0x13/0x15 [ 36.937843] [<ffffffff813dd7ad>] __sys_sendmsg+0x40/0x61 [ 36.937844] [<ffffffff813dd7ad>] ? __sys_sendmsg+0x40/0x61 [ 36.937845] [<ffffffff813dd7d7>] SyS_sendmsg+0x9/0xb [ 36.937847] [<ffffffff814c2f6a>] entry_SYSCALL_64_fastpath+0x18/0xad and there are several big problems here. looking at usbnet_probe() int usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod) { .... skb_queue_head_init (&dev->done); skb_queue_head_init(&dev->rxq_pause); dev->bh.func = usbnet_bh; dev->bh.data = (unsigned long) dev; INIT_WORK (&dev->kevent, usbnet_deferred_kevent); .... first, sometimes tasklet initialisation is performed directly, not via tasklet_init(). second, that 't->count == 0' eq 'tasklet_init()' is assumed to be sort of a contract. so a simple kzalloc() works fine, and the patch breaks it. a simple grep in drivers/net/ _next$ git grep tasklet_sched drivers/net/ | awk '{print $1}' | uniq | wc -l 60 _next$ git grep tasklet_init drivers/net/ | awk '{print $1}' | uniq | wc -l 52 and I don't know how many call-sites outside of drivers/net/ do something like this. -ss ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree 2016-09-22 0:42 ` Sergey Senozhatsky @ 2016-09-22 2:31 ` Santosh Shilimkar 2016-09-22 7:05 ` Thomas Gleixner 0 siblings, 1 reply; 9+ messages in thread From: Santosh Shilimkar @ 2016-09-22 2:31 UTC (permalink / raw) To: Sergey Senozhatsky, akpm Cc: ssantosh, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx, mm-commits, linux-kernel, sfr, linux-next, sergey.senozhatsky On 9/21/2016 5:42 PM, Sergey Senozhatsky wrote: > Hello, > > On (09/21/16 10:23), Santosh Shilimkar wrote: [...] >> Am assuming one of the driver in your test is using the DECLARE_TASKLET >> to init the tasklet and killed by tasklet_kill() which leaves that >> tasklet to be still scheduled by tasklet action. > > yes, vt does something like this (kbd_bh). > >> Can you please try below patch and see if you still see the issue ? >> Attaching the same, just in case mailer eat the tabs. > > hm, didn't completely fix it. the vt is now happy, unlike usbnet. Good that vt works now. > and the usbnet case is rather alarming. > > > looking at usbnet_probe() > > int > usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod) > { > .... > skb_queue_head_init (&dev->done); > skb_queue_head_init(&dev->rxq_pause); > dev->bh.func = usbnet_bh; > dev->bh.data = (unsigned long) dev; > INIT_WORK (&dev->kevent, usbnet_deferred_kevent); > .... > > > first, sometimes tasklet initialisation is performed directly, not via > tasklet_init(). > > second, that 't->count == 0' eq 'tasklet_init()' is assumed to be sort of > a contract. so a simple kzalloc() works fine, and the patch breaks it. > Thats really bad that tasklet code is letting users call tasklet_schedule() even without any tasklet_init or DECLARE_TASKLET. > a simple grep in drivers/net/ > > _next$ git grep tasklet_sched drivers/net/ | awk '{print $1}' | uniq | wc -l > 60 > > _next$ git grep tasklet_init drivers/net/ | awk '{print $1}' | uniq | wc -l > 52 > > and I don't know how many call-sites outside of drivers/net/ do something > like this. > There are more :-(. Thanks for helping out Sergey. # git grep tasklet_sched . | awk '{print $1}' | uniq | wc -l 269 # git grep tasklet_init . | awk '{print $1}' | uniq | wc -l 240 Andrew, I requested you to include this patch but now am not sure anymore. Looks like there are almost 30 more users which are directly tweaking 'tasklet_struct' fields and calling other APIs. Hunting them and fixing them probably would be an exercise and also those changes needs those changed drivers to be tested. What do you suggest ? At least this patch needs to be dropped as of now till we can have complete coverage for those bad users. Regards, Santosh ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree 2016-09-22 2:31 ` Santosh Shilimkar @ 2016-09-22 7:05 ` Thomas Gleixner 2016-09-22 16:08 ` Santosh Shilimkar 2016-09-22 23:37 ` Stephen Rothwell 0 siblings, 2 replies; 9+ messages in thread From: Thomas Gleixner @ 2016-09-22 7:05 UTC (permalink / raw) To: Santosh Shilimkar Cc: Sergey Senozhatsky, Andrew Morton, ssantosh, David Miller, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, Peter Zijlstra, salvatore.benedetto, tadeusz.struk, mm-commits, LKML, Stephen Rothwell, linux-next, sergey.senozhatsky, Ingo Molnar B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote: > I requested you to include this patch but now am not sure anymore. > Looks like there are almost 30 more users which are directly > tweaking 'tasklet_struct' fields and calling other APIs. Hunting them > and fixing them probably would be an exercise and also those changes > needs those changed drivers to be tested. > > What do you suggest ? At least this patch needs to be dropped as of now > till we can have complete coverage for those bad users. Yes, it needs to be dropped. Stephen, can you please revert it from next? How to fix this: The only way is to review all tasklet usage sites for creative abuse and then fix them one by one. This needs to be done anyway because those are ticking timebombs even without changes in the core code. I looked at one of the offenders and it's broken today, it's just protected by the extremly low probablity to hit the wreckage case. What you can do to coerce the developers/maintainers of offending code into looking at the mess they created/merged is to implement accessors for the tasklet struct fields and replace the open coded fiddling with them. Once that is done, rename the struct fields to something which is absurd enough to type. But don't worry, you will find people doing that. I catched a few brainwrecks who actually used: irqdesc->core_internal_state__do_not_mess_with_it in their code. Now after having everything converted to accessors, you can add sanity checks into the accessors and emit WARN_ONCE() when they are used in the wrong context. That'll make them look and explain why they think that fiddling in the internals is a good idea. Thanks, tglx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree 2016-09-22 7:05 ` Thomas Gleixner @ 2016-09-22 16:08 ` Santosh Shilimkar 2016-09-22 23:37 ` Stephen Rothwell 1 sibling, 0 replies; 9+ messages in thread From: Santosh Shilimkar @ 2016-09-22 16:08 UTC (permalink / raw) To: Thomas Gleixner Cc: Sergey Senozhatsky, Andrew Morton, ssantosh, David Miller, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, Peter Zijlstra, salvatore.benedetto, tadeusz.struk, mm-commits, LKML, Stephen Rothwell, linux-next, sergey.senozhatsky, Ingo Molnar On 9/22/2016 12:05 AM, Thomas Gleixner wrote: > B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote: >> I requested you to include this patch but now am not sure anymore. >> Looks like there are almost 30 more users which are directly >> tweaking 'tasklet_struct' fields and calling other APIs. Hunting them >> and fixing them probably would be an exercise and also those changes >> needs those changed drivers to be tested. >> >> What do you suggest ? At least this patch needs to be dropped as of now >> till we can have complete coverage for those bad users. > > Yes, it needs to be dropped. Stephen, can you please revert it from next? > > How to fix this: The only way is to review all tasklet usage sites for > creative abuse and then fix them one by one. This needs to be done anyway > because those are ticking timebombs even without changes in the core > code. I looked at one of the offenders and it's broken today, it's just > protected by the extremly low probablity to hit the wreckage case. > > What you can do to coerce the developers/maintainers of offending code into > looking at the mess they created/merged is to implement accessors for the > tasklet struct fields and replace the open coded fiddling with them. > > Once that is done, rename the struct fields to something which is absurd > enough to type. But don't worry, you will find people doing that. I > catched a few brainwrecks who actually used: > > irqdesc->core_internal_state__do_not_mess_with_it > > in their code. > > Now after having everything converted to accessors, you can add sanity > checks into the accessors and emit WARN_ONCE() when they are used in the > wrong context. That'll make them look and explain why they think that > fiddling in the internals is a good idea. > Thanks Thomas for suggestion and looking into it. Sounds a good plan to me to tackle this mess. I will give a try and hopefully with help of those maintainers come up with a series first to fix the existing bad users. As you said, fixing core will be simple after that. Regards, Santosh ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree 2016-09-22 7:05 ` Thomas Gleixner 2016-09-22 16:08 ` Santosh Shilimkar @ 2016-09-22 23:37 ` Stephen Rothwell 1 sibling, 0 replies; 9+ messages in thread From: Stephen Rothwell @ 2016-09-22 23:37 UTC (permalink / raw) To: Thomas Gleixner Cc: Santosh Shilimkar, Sergey Senozhatsky, Andrew Morton, ssantosh, David Miller, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle, Peter Zijlstra, salvatore.benedetto, tadeusz.struk, mm-commits, LKML, linux-next, sergey.senozhatsky, Ingo Molnar, Tony Lindgren Hi Thomas, On Thu, 22 Sep 2016 09:05:23 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > > B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote: > > I requested you to include this patch but now am not sure anymore. > > Looks like there are almost 30 more users which are directly > > tweaking 'tasklet_struct' fields and calling other APIs. Hunting them > > and fixing them probably would be an exercise and also those changes > > needs those changed drivers to be tested. > > > > What do you suggest ? At least this patch needs to be dropped as of now > > till we can have complete coverage for those bad users. > > Yes, it needs to be dropped. Stephen, can you please revert it from next? I will do the revert today. It reverts cleanly, so hopefully there are no side effects. -- Cheers, Stephen Rothwell ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-09-22 23:37 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-09-20 21:55 + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree akpm 2016-09-21 5:18 ` Sergey Senozhatsky 2016-09-21 8:09 ` Sergey Senozhatsky 2016-09-21 17:23 ` Santosh Shilimkar 2016-09-22 0:42 ` Sergey Senozhatsky 2016-09-22 2:31 ` Santosh Shilimkar 2016-09-22 7:05 ` Thomas Gleixner 2016-09-22 16:08 ` Santosh Shilimkar 2016-09-22 23:37 ` Stephen Rothwell
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.