All of lore.kernel.org
 help / color / mirror / Atom feed
* + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
@ 2016-09-20 21:55 akpm
  2016-09-21  5:18 ` Sergey Senozhatsky
  0 siblings, 1 reply; 9+ messages in thread
From: akpm @ 2016-09-20 21:55 UTC (permalink / raw)
  To: ssantosh, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo,
	pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx,
	mm-commits


The patch titled
     Subject: softirq: fix tasklet_kill() and its users
has been added to the -mm tree.  Its filename is
     softirq-fix-tasklet_kill-and-its-users.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/softirq-fix-tasklet_kill-and-its-users.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/softirq-fix-tasklet_kill-and-its-users.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Santosh Shilimkar <ssantosh@kernel.org>
Subject: softirq: fix tasklet_kill() and its users

Semantically the expectation from the tasklet init/kill API should be as
below.

tasklet_init() == Init and Enable scheduling
tasklet_kill() == Disable scheduling and Destroy

The tasklet_init() API exibits the above behavior but tasklet_kill() does
not.  The tasklet handler can still get scheduled and run even after
tasklet_kill().

There are 2, 3 places where drivers are working around this issue by
calling tasklet_disable() which will add a usecount and thereby preventing
the handlers from being called.

tasklet_enable/tasklet_disable is a pair API and they are expected to be
used together.  Usage of tasklet_disable() *just* to work around tasklet
scheduling after a kill is probably not the correct and intended use of
the API.  We also happen to see similar issues where in the shutdown path
the tasklet_handler is getting called even after the tasklet_kill().

We fix this by ensuring that tasklet_kill() does the right thing and
thereby ensures that the tasklet handler won't run after tasklet_kil()
with a very simple change.  The patch fixes the tasklet code and also
removes a few driver workarounds.

Link: http://lkml.kernel.org/r/1472089950-2724-1-git-send-email-ssantosh@kernel.org
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Cc: Salvatore Benedetto <salvatore.benedetto@intel.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/crypto/qat/qat_common/adf_isr.c    |    1 -
 drivers/crypto/qat/qat_common/adf_sriov.c  |    1 -
 drivers/crypto/qat/qat_common/adf_vf_isr.c |    2 --
 drivers/isdn/gigaset/interface.c           |    1 -
 kernel/softirq.c                           |    7 ++++---
 5 files changed, 4 insertions(+), 8 deletions(-)

diff -puN drivers/crypto/qat/qat_common/adf_isr.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_isr.c
--- a/drivers/crypto/qat/qat_common/adf_isr.c~softirq-fix-tasklet_kill-and-its-users
+++ a/drivers/crypto/qat/qat_common/adf_isr.c
@@ -296,7 +296,6 @@ static void adf_cleanup_bh(struct adf_ac
 	int i;
 
 	for (i = 0; i < hw_data->num_banks; i++) {
-		tasklet_disable(&priv_data->banks[i].resp_handler);
 		tasklet_kill(&priv_data->banks[i].resp_handler);
 	}
 }
diff -puN drivers/crypto/qat/qat_common/adf_sriov.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_sriov.c
--- a/drivers/crypto/qat/qat_common/adf_sriov.c~softirq-fix-tasklet_kill-and-its-users
+++ a/drivers/crypto/qat/qat_common/adf_sriov.c
@@ -204,7 +204,6 @@ void adf_disable_sriov(struct adf_accel_
 	}
 
 	for (i = 0, vf = accel_dev->pf.vf_info; i < totalvfs; i++, vf++) {
-		tasklet_disable(&vf->vf2pf_bh_tasklet);
 		tasklet_kill(&vf->vf2pf_bh_tasklet);
 		mutex_destroy(&vf->pf2vf_lock);
 	}
diff -puN drivers/crypto/qat/qat_common/adf_vf_isr.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_vf_isr.c
--- a/drivers/crypto/qat/qat_common/adf_vf_isr.c~softirq-fix-tasklet_kill-and-its-users
+++ a/drivers/crypto/qat/qat_common/adf_vf_isr.c
@@ -191,7 +191,6 @@ static int adf_setup_pf2vf_bh(struct adf
 
 static void adf_cleanup_pf2vf_bh(struct adf_accel_dev *accel_dev)
 {
-	tasklet_disable(&accel_dev->vf.pf2vf_bh_tasklet);
 	tasklet_kill(&accel_dev->vf.pf2vf_bh_tasklet);
 	mutex_destroy(&accel_dev->vf.vf2pf_lock);
 }
@@ -268,7 +267,6 @@ static void adf_cleanup_bh(struct adf_ac
 {
 	struct adf_etr_data *priv_data = accel_dev->transport;
 
-	tasklet_disable(&priv_data->banks[0].resp_handler);
 	tasklet_kill(&priv_data->banks[0].resp_handler);
 }
 
diff -puN drivers/isdn/gigaset/interface.c~softirq-fix-tasklet_kill-and-its-users drivers/isdn/gigaset/interface.c
--- a/drivers/isdn/gigaset/interface.c~softirq-fix-tasklet_kill-and-its-users
+++ a/drivers/isdn/gigaset/interface.c
@@ -524,7 +524,6 @@ void gigaset_if_free(struct cardstate *c
 	if (!drv->have_tty)
 		return;
 
-	tasklet_disable(&cs->if_wake_tasklet);
 	tasklet_kill(&cs->if_wake_tasklet);
 	cs->tty_dev = NULL;
 	tty_unregister_device(drv->tty, cs->minor_index);
diff -puN kernel/softirq.c~softirq-fix-tasklet_kill-and-its-users kernel/softirq.c
--- a/kernel/softirq.c~softirq-fix-tasklet_kill-and-its-users
+++ a/kernel/softirq.c
@@ -498,7 +498,7 @@ static void tasklet_action(struct softir
 		list = list->next;
 
 		if (tasklet_trylock(t)) {
-			if (!atomic_read(&t->count)) {
+			if (atomic_read(&t->count) == 1) {
 				if (!test_and_clear_bit(TASKLET_STATE_SCHED,
 							&t->state))
 					BUG();
@@ -534,7 +534,7 @@ static void tasklet_hi_action(struct sof
 		list = list->next;
 
 		if (tasklet_trylock(t)) {
-			if (!atomic_read(&t->count)) {
+			if (atomic_read(&t->count) == 1) {
 				if (!test_and_clear_bit(TASKLET_STATE_SCHED,
 							&t->state))
 					BUG();
@@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct
 {
 	t->next = NULL;
 	t->state = 0;
-	atomic_set(&t->count, 0);
+	atomic_set(&t->count, 1);
 	t->func = func;
 	t->data = data;
 }
@@ -576,6 +576,7 @@ void tasklet_kill(struct tasklet_struct
 		} while (test_bit(TASKLET_STATE_SCHED, &t->state));
 	}
 	tasklet_unlock_wait(t);
+	atomic_dec(&t->count);
 	clear_bit(TASKLET_STATE_SCHED, &t->state);
 }
 EXPORT_SYMBOL(tasklet_kill);
_

Patches currently in -mm which might be from ssantosh@kernel.org are

softirq-fix-tasklet_kill-and-its-users.patch


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
  2016-09-20 21:55 + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree akpm
@ 2016-09-21  5:18 ` Sergey Senozhatsky
  2016-09-21  8:09   ` Sergey Senozhatsky
  0 siblings, 1 reply; 9+ messages in thread
From: Sergey Senozhatsky @ 2016-09-21  5:18 UTC (permalink / raw)
  To: akpm, ssantosh
  Cc: davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle,
	peterz, salvatore.benedetto, tadeusz.struk, tglx, mm-commits,
	linux-kernel, sfr, linux-next, sergey.senozhatsky.work,
	sergey.senozhatsky

Hello,

On (09/20/16 14:55), akpm@linux-foundation.org wrote:
> ------------------------------------------------------
> From: Santosh Shilimkar <ssantosh@kernel.org>
> Subject: softirq: fix tasklet_kill() and its users
> 
> Semantically the expectation from the tasklet init/kill API should be as
> below.
> 
> tasklet_init() == Init and Enable scheduling
> tasklet_kill() == Disable scheduling and Destroy
> 
> The tasklet_init() API exibits the above behavior but tasklet_kill() does
> not.  The tasklet handler can still get scheduled and run even after
> tasklet_kill().
> 
> There are 2, 3 places where drivers are working around this issue by
> calling tasklet_disable() which will add a usecount and thereby preventing
> the handlers from being called.
> 
> tasklet_enable/tasklet_disable is a pair API and they are expected to be
> used together.  Usage of tasklet_disable() *just* to work around tasklet
> scheduling after a kill is probably not the correct and intended use of
> the API.  We also happen to see similar issues where in the shutdown path
> the tasklet_handler is getting called even after the tasklet_kill().
> 
> We fix this by ensuring that tasklet_kill() does the right thing and
> thereby ensures that the tasklet handler won't run after tasklet_kil()
> with a very simple change.  The patch fixes the tasklet code and also
> removes a few driver workarounds.

this commit makes ksoftirqd/x to burn CPUs on my system. with a small
addition of WARN_ON_ONCE(atomic_read(&t->count) < 1) to tasklet_action(),
I got

[    0.411895] ------------[ cut here ]------------
[    0.411956] WARNING: CPU: 0 PID: 6 at kernel/softirq.c:502 tasklet_action+0xb4/0x12e
[    0.412018] Modules linked in:
[    0.412111] CPU: 0 PID: 6 Comm: ksoftirqd/0 Not tainted 4.8.0-rc7-mm1-dbg-00278-g7f4fca2-dirty #58
[    0.412273]  ffff8801330efd30 ffffffff811f073c 0000000000000000 0000000000000000
[    0.412484]  ffff8801330efd70 ffffffff8103d23e 000001f6330efd50 ffffffff81849be0
[    0.412696]  ffffffff81849be8 ffffffff8179cbfc 0000000000000000 0000000000000000
[    0.412907] Call Trace:
[    0.412961]  [<ffffffff811f073c>] dump_stack+0x68/0x92
[    0.413020]  [<ffffffff8103d23e>] __warn+0xb8/0xd3
[    0.413079]  [<ffffffff8103d2bf>] warn_slowpath_null+0x18/0x1a
[    0.413138]  [<ffffffff810413d8>] tasklet_action+0xb4/0x12e
[    0.413197]  [<ffffffff8104193a>] __do_softirq+0x103/0x20a
[    0.413256]  [<ffffffff81041a5b>] run_ksoftirqd+0x1a/0x4b
[    0.413316]  [<ffffffff8105a25e>] smpboot_thread_fn+0x20c/0x225
[    0.413376]  [<ffffffff8105a052>] ? sort_range+0x1d/0x1d
[    0.413436]  [<ffffffff810578d1>] kthread+0xf4/0xfc
[    0.413494]  [<ffffffff810577dd>] ? kthread_create_on_node+0x1df/0x1df
[    0.413555]  [<ffffffff81057784>] ? kthread_create_on_node+0x186/0x1df
[    0.413616]  [<ffffffff814c0a47>] ret_from_fork+0x27/0x40
[    0.413735] ---[ end trace d79a87896622b4a4 ]---

	-ss

> Link: http://lkml.kernel.org/r/1472089950-2724-1-git-send-email-ssantosh@kernel.org
> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tadeusz Struk <tadeusz.struk@intel.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Paul Bolle <pebolle@tiscali.nl>
> Cc: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> Cc: Salvatore Benedetto <salvatore.benedetto@intel.com>
> Cc: Karsten Keil <isdn@linux-pingi.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  drivers/crypto/qat/qat_common/adf_isr.c    |    1 -
>  drivers/crypto/qat/qat_common/adf_sriov.c  |    1 -
>  drivers/crypto/qat/qat_common/adf_vf_isr.c |    2 --
>  drivers/isdn/gigaset/interface.c           |    1 -
>  kernel/softirq.c                           |    7 ++++---
>  5 files changed, 4 insertions(+), 8 deletions(-)
> 
> diff -puN drivers/crypto/qat/qat_common/adf_isr.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_isr.c
> --- a/drivers/crypto/qat/qat_common/adf_isr.c~softirq-fix-tasklet_kill-and-its-users
> +++ a/drivers/crypto/qat/qat_common/adf_isr.c
> @@ -296,7 +296,6 @@ static void adf_cleanup_bh(struct adf_ac
>  	int i;
>  
>  	for (i = 0; i < hw_data->num_banks; i++) {
> -		tasklet_disable(&priv_data->banks[i].resp_handler);
>  		tasklet_kill(&priv_data->banks[i].resp_handler);
>  	}
>  }
> diff -puN drivers/crypto/qat/qat_common/adf_sriov.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_sriov.c
> --- a/drivers/crypto/qat/qat_common/adf_sriov.c~softirq-fix-tasklet_kill-and-its-users
> +++ a/drivers/crypto/qat/qat_common/adf_sriov.c
> @@ -204,7 +204,6 @@ void adf_disable_sriov(struct adf_accel_
>  	}
>  
>  	for (i = 0, vf = accel_dev->pf.vf_info; i < totalvfs; i++, vf++) {
> -		tasklet_disable(&vf->vf2pf_bh_tasklet);
>  		tasklet_kill(&vf->vf2pf_bh_tasklet);
>  		mutex_destroy(&vf->pf2vf_lock);
>  	}
> diff -puN drivers/crypto/qat/qat_common/adf_vf_isr.c~softirq-fix-tasklet_kill-and-its-users drivers/crypto/qat/qat_common/adf_vf_isr.c
> --- a/drivers/crypto/qat/qat_common/adf_vf_isr.c~softirq-fix-tasklet_kill-and-its-users
> +++ a/drivers/crypto/qat/qat_common/adf_vf_isr.c
> @@ -191,7 +191,6 @@ static int adf_setup_pf2vf_bh(struct adf
>  
>  static void adf_cleanup_pf2vf_bh(struct adf_accel_dev *accel_dev)
>  {
> -	tasklet_disable(&accel_dev->vf.pf2vf_bh_tasklet);
>  	tasklet_kill(&accel_dev->vf.pf2vf_bh_tasklet);
>  	mutex_destroy(&accel_dev->vf.vf2pf_lock);
>  }
> @@ -268,7 +267,6 @@ static void adf_cleanup_bh(struct adf_ac
>  {
>  	struct adf_etr_data *priv_data = accel_dev->transport;
>  
> -	tasklet_disable(&priv_data->banks[0].resp_handler);
>  	tasklet_kill(&priv_data->banks[0].resp_handler);
>  }
>  
> diff -puN drivers/isdn/gigaset/interface.c~softirq-fix-tasklet_kill-and-its-users drivers/isdn/gigaset/interface.c
> --- a/drivers/isdn/gigaset/interface.c~softirq-fix-tasklet_kill-and-its-users
> +++ a/drivers/isdn/gigaset/interface.c
> @@ -524,7 +524,6 @@ void gigaset_if_free(struct cardstate *c
>  	if (!drv->have_tty)
>  		return;
>  
> -	tasklet_disable(&cs->if_wake_tasklet);
>  	tasklet_kill(&cs->if_wake_tasklet);
>  	cs->tty_dev = NULL;
>  	tty_unregister_device(drv->tty, cs->minor_index);
> diff -puN kernel/softirq.c~softirq-fix-tasklet_kill-and-its-users kernel/softirq.c
> --- a/kernel/softirq.c~softirq-fix-tasklet_kill-and-its-users
> +++ a/kernel/softirq.c
> @@ -498,7 +498,7 @@ static void tasklet_action(struct softir
>  		list = list->next;
>  
>  		if (tasklet_trylock(t)) {
> -			if (!atomic_read(&t->count)) {
> +			if (atomic_read(&t->count) == 1) {
>  				if (!test_and_clear_bit(TASKLET_STATE_SCHED,
>  							&t->state))
>  					BUG();
> @@ -534,7 +534,7 @@ static void tasklet_hi_action(struct sof
>  		list = list->next;
>  
>  		if (tasklet_trylock(t)) {
> -			if (!atomic_read(&t->count)) {
> +			if (atomic_read(&t->count) == 1) {
>  				if (!test_and_clear_bit(TASKLET_STATE_SCHED,
>  							&t->state))
>  					BUG();
> @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct
>  {
>  	t->next = NULL;
>  	t->state = 0;
> -	atomic_set(&t->count, 0);
> +	atomic_set(&t->count, 1);
>  	t->func = func;
>  	t->data = data;
>  }
> @@ -576,6 +576,7 @@ void tasklet_kill(struct tasklet_struct
>  		} while (test_bit(TASKLET_STATE_SCHED, &t->state));
>  	}
>  	tasklet_unlock_wait(t);
> +	atomic_dec(&t->count);
>  	clear_bit(TASKLET_STATE_SCHED, &t->state);
>  }
>  EXPORT_SYMBOL(tasklet_kill);
> _
> 
> Patches currently in -mm which might be from ssantosh@kernel.org are
> 
> softirq-fix-tasklet_kill-and-its-users.patch
> 
> --
> To unsubscribe from this list: send the line "unsubscribe mm-commits" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
  2016-09-21  5:18 ` Sergey Senozhatsky
@ 2016-09-21  8:09   ` Sergey Senozhatsky
  2016-09-21 17:23     ` Santosh Shilimkar
  0 siblings, 1 reply; 9+ messages in thread
From: Sergey Senozhatsky @ 2016-09-21  8:09 UTC (permalink / raw)
  To: ssantosh
  Cc: akpm, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo,
	pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx,
	mm-commits, linux-kernel, sfr, linux-next, sergey.senozhatsky,
	sergey.senozhatsky.work

didn't look into the issue, but this thing

> > tasklet_init() == Init and Enable scheduling
[..]
> > @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct
> >  {
> >  	t->next = NULL;
> >  	t->state = 0;
> > -	atomic_set(&t->count, 0);
> > +	atomic_set(&t->count, 1);
			^^^^^^^^
> >  	t->func = func;
> >  	t->data = data;
> >  }

seems to be in conflict with

 #define DECLARE_TASKLET(name, func, data) \
 struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }
						^^^^^^^

 #define DECLARE_TASKLET_DISABLED(name, func, data) \
 struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
						^^^^^^^


as well as with the tasklet_{disable, enable} helpers

static inline void tasklet_disable_nosync(struct tasklet_struct *t)
{
	atomic_inc(&t->count);
	smp_mb__after_atomic();
}

static inline void tasklet_disable(struct tasklet_struct *t)
{
	tasklet_disable_nosync(t);
	tasklet_unlock_wait(t);
	smp_mb();
}

static inline void tasklet_enable(struct tasklet_struct *t)
{
	smp_mb__before_atomic();
	atomic_dec(&t->count);
}

	-ss

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
  2016-09-21  8:09   ` Sergey Senozhatsky
@ 2016-09-21 17:23     ` Santosh Shilimkar
  2016-09-22  0:42       ` Sergey Senozhatsky
  0 siblings, 1 reply; 9+ messages in thread
From: Santosh Shilimkar @ 2016-09-21 17:23 UTC (permalink / raw)
  To: Sergey Senozhatsky, ssantosh
  Cc: akpm, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo,
	pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx,
	mm-commits, linux-kernel, sfr, linux-next, sergey.senozhatsky

[-- Attachment #1: Type: text/plain, Size: 2597 bytes --]

On 9/21/2016 1:09 AM, Sergey Senozhatsky wrote:
> didn't look into the issue, but this thing
>
Thanks for reporting Sergey.

>>> tasklet_init() == Init and Enable scheduling
> [..]
>>> @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct
>>>  {
>>>  	t->next = NULL;
>>>  	t->state = 0;
>>> -	atomic_set(&t->count, 0);
>>> +	atomic_set(&t->count, 1);
>

    			^^^^^^^^
>>>  	t->func = func;
>>>  	t->data = data;
>>>  }
>
> seems to be in conflict with
>
Static helpers also needs to follow the API.

>  #define DECLARE_TASKLET(name, func, data) \
>  struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }
> 						^^^^^^^
>
>  #define DECLARE_TASKLET_DISABLED(name, func, data) \
>  struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
> 						^^^^^^^
>

>
> as well as with the tasklet_{disable, enable} helpers
>
Those are fine since they work like a pair and the use count
is always balanced.

Am assuming one of the driver in your test is using the DECLARE_TASKLET
to init the tasklet and killed by tasklet_kill() which leaves that
tasklet to be still scheduled by tasklet action.

Can you please try below patch and see if you still see the issue ?
Attaching the same, just in case mailer eat the tabs.

Regards,
Santosh

 From e3e676e501a59b2a7de6e9f99ec3917c157e9caf Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <ssantosh@kernel.org>
Date: Wed, 21 Sep 2016 09:58:51 -0700
Subject: [PATCH] softirq: fix DECLARE_TASKLET[_DISABLE] macros init state

In linux-next, commit 1f5e9c3bc47f ("softirq: fix tasklet_kill() and its 
users")
changed the init state of the tasklet but missed to update the macros.

Fix them too.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
---
  include/linux/interrupt.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 72f0721..cabf575 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -521,10 +521,10 @@ struct tasklet_struct
  };

  #define DECLARE_TASKLET(name, func, data) \
-struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }
+struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }

  #define DECLARE_TASKLET_DISABLED(name, func, data) \
-struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
+struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(2), func, data }


  enum
-- 
1.9.1








[-- Attachment #2: 0001-softirq-fix-DECLARE_TASKLET-_DISABLE-macros-init-sta.patch --]
[-- Type: text/x-diff, Size: 1323 bytes --]

>From e3e676e501a59b2a7de6e9f99ec3917c157e9caf Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <ssantosh@kernel.org>
Date: Wed, 21 Sep 2016 09:58:51 -0700
Subject: [PATCH] softirq: fix DECLARE_TASKLET[_DISABLE] macros init state

In linux-next, commit 1f5e9c3bc47f ("softirq: fix tasklet_kill() and its users")
changed the init state of the tasklet but missed to update the macros.

Fix them too.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
---
 include/linux/interrupt.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 72f0721..cabf575 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -521,10 +521,10 @@ struct tasklet_struct
 };
 
 #define DECLARE_TASKLET(name, func, data) \
-struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }
+struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
 
 #define DECLARE_TASKLET_DISABLED(name, func, data) \
-struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
+struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(2), func, data }
 
 
 enum
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
  2016-09-21 17:23     ` Santosh Shilimkar
@ 2016-09-22  0:42       ` Sergey Senozhatsky
  2016-09-22  2:31         ` Santosh Shilimkar
  0 siblings, 1 reply; 9+ messages in thread
From: Sergey Senozhatsky @ 2016-09-22  0:42 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: Sergey Senozhatsky, ssantosh, akpm, davem, giovanni.cabiddu,
	gregkh, herbert, isdn, mingo, pebolle, peterz,
	salvatore.benedetto, tadeusz.struk, tglx, mm-commits,
	linux-kernel, sfr, linux-next, sergey.senozhatsky

Hello,

On (09/21/16 10:23), Santosh Shilimkar wrote:
> > > > tasklet_init() == Init and Enable scheduling
> > [..]
> > > > @@ -559,7 +559,7 @@ void tasklet_init(struct tasklet_struct
> > > >  {
> > > >  	t->next = NULL;
> > > >  	t->state = 0;
> > > > -	atomic_set(&t->count, 0);
> > > > +	atomic_set(&t->count, 1);
> > 
> 
>    			^^^^^^^^
> > > >  	t->func = func;
> > > >  	t->data = data;
> > > >  }
> > 
> > seems to be in conflict with
> > 
> Static helpers also needs to follow the API.
> 
> >  #define DECLARE_TASKLET(name, func, data) \
> >  struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }
> > 						^^^^^^^
> > 
> >  #define DECLARE_TASKLET_DISABLED(name, func, data) \
> >  struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
> > 						^^^^^^^
> > 
> 
> > 
> > as well as with the tasklet_{disable, enable} helpers
> > 
> Those are fine since they work like a pair and the use count
> is always balanced.

right, the point was that
  DECLARE_TASKLET_DISABLED()			equals to tasklet_init()
and
  {DECLARE_TASKLET(); tasklet_disable();}	equals to tasklet_init()

> Am assuming one of the driver in your test is using the DECLARE_TASKLET
> to init the tasklet and killed by tasklet_kill() which leaves that
> tasklet to be still scheduled by tasklet action.

yes, vt does something like this (kbd_bh).

> Can you please try below patch and see if you still see the issue ?
> Attaching the same, just in case mailer eat the tabs.

hm, didn't completely fix it. the vt is now happy, unlike usbnet.
and the usbnet case is rather alarming.

 static inline void tasklet_schedule(struct tasklet_struct *t)
 {
+       WARN_ON_ONCE(atomic_read(&t->count) < 1);
+
        if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
                __tasklet_schedule(t);
 }

gives me the following backtrace

[   36.937798]  [<ffffffffa013ff12>] usbnet_open+0x1f9/0x24f [usbnet]
[   36.937800]  [<ffffffff813f7cf7>] __dev_open+0x8c/0xc8
[   36.937801]  [<ffffffff813f7f51>] __dev_change_flags+0xa2/0x13d
[   36.937802]  [<ffffffff813f800c>] dev_change_flags+0x20/0x53
[   36.937803]  [<ffffffff814089da>] do_setlink+0x2f6/0xa31
[   36.937806]  [<ffffffff810cfb66>] ? get_page_from_freelist+0x5f3/0x7b2
[   36.937808]  [<ffffffff810f1995>] ? handle_mm_fault+0x82d/0xcc4
[   36.937809]  [<ffffffff81409973>] rtnl_newlink+0x39b/0x705
[   36.937812]  [<ffffffff813f6d2e>] ? netdev_master_upper_dev_get+0xd/0x57
[   36.937813]  [<ffffffff814096e9>] ? rtnl_newlink+0x111/0x705
[   36.937816]  [<ffffffff81030c5f>] ? update_stack_state.constprop.1+0x4c/0x59
[   36.937818]  [<ffffffff81407737>] rtnetlink_rcv_msg+0x16c/0x17b
[   36.937820]  [<ffffffff814bf065>] ? mutex_lock_nested+0x31f/0x344
[   36.937823]  [<ffffffff8141c204>] ? netlink_deliver_tap+0x234/0x260
[   36.937824]  [<ffffffff814075cb>] ? __rtnl_unlock+0x5e/0x5e
[   36.937826]  [<ffffffff8141f498>] netlink_rcv_skb+0x42/0x83
[   36.937827]  [<ffffffff81407566>] rtnetlink_rcv+0x1e/0x25
[   36.937828]  [<ffffffff8141df8a>] netlink_unicast+0x101/0x18e
[   36.937829]  [<ffffffff8141e7ec>] netlink_sendmsg+0x2ef/0x300
[   36.937832]  [<ffffffff812022b7>] ? import_iovec+0x64/0x84
[   36.937835]  [<ffffffff813dc347>] sock_sendmsg+0xf/0x1a
[   36.937836]  [<ffffffff813dc55b>] ___sys_sendmsg+0x17f/0x1f8
[   36.937838]  [<ffffffff810752db>] ? __lock_is_held+0x3c/0x57
[   36.937841]  [<ffffffff81207e89>] ? __this_cpu_preempt_check+0x13/0x15
[   36.937843]  [<ffffffff813dd7ad>] __sys_sendmsg+0x40/0x61
[   36.937844]  [<ffffffff813dd7ad>] ? __sys_sendmsg+0x40/0x61
[   36.937845]  [<ffffffff813dd7d7>] SyS_sendmsg+0x9/0xb
[   36.937847]  [<ffffffff814c2f6a>] entry_SYSCALL_64_fastpath+0x18/0xad


and there are several big problems here.


looking at usbnet_probe()

int
usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
{
....
	skb_queue_head_init (&dev->done);
	skb_queue_head_init(&dev->rxq_pause);
	dev->bh.func = usbnet_bh;
	dev->bh.data = (unsigned long) dev;
	INIT_WORK (&dev->kevent, usbnet_deferred_kevent);
....


first, sometimes tasklet initialisation is performed directly, not via
tasklet_init().

second, that 't->count == 0' eq 'tasklet_init()' is assumed to be sort of
a contract. so a simple kzalloc() works fine, and the patch breaks it.



a simple grep in drivers/net/

_next$ git grep tasklet_sched drivers/net/ | awk '{print $1}' | uniq | wc -l
60

_next$ git grep tasklet_init drivers/net/ | awk '{print $1}' | uniq | wc -l
52


and I don't know how many call-sites outside of drivers/net/ do something
like this.

	-ss

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
  2016-09-22  0:42       ` Sergey Senozhatsky
@ 2016-09-22  2:31         ` Santosh Shilimkar
  2016-09-22  7:05           ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Santosh Shilimkar @ 2016-09-22  2:31 UTC (permalink / raw)
  To: Sergey Senozhatsky, akpm
  Cc: ssantosh, davem, giovanni.cabiddu, gregkh, herbert, isdn, mingo,
	pebolle, peterz, salvatore.benedetto, tadeusz.struk, tglx,
	mm-commits, linux-kernel, sfr, linux-next, sergey.senozhatsky

On 9/21/2016 5:42 PM, Sergey Senozhatsky wrote:
> Hello,
>
> On (09/21/16 10:23), Santosh Shilimkar wrote:

[...]

>> Am assuming one of the driver in your test is using the DECLARE_TASKLET
>> to init the tasklet and killed by tasklet_kill() which leaves that
>> tasklet to be still scheduled by tasklet action.
>
> yes, vt does something like this (kbd_bh).
>
>> Can you please try below patch and see if you still see the issue ?
>> Attaching the same, just in case mailer eat the tabs.
>
> hm, didn't completely fix it. the vt is now happy, unlike usbnet.
Good that vt works now.

> and the usbnet case is rather alarming.
>
>
> looking at usbnet_probe()
>
> int
> usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
> {
> ....
> 	skb_queue_head_init (&dev->done);
> 	skb_queue_head_init(&dev->rxq_pause);
> 	dev->bh.func = usbnet_bh;
> 	dev->bh.data = (unsigned long) dev;
> 	INIT_WORK (&dev->kevent, usbnet_deferred_kevent);
> ....
>
>
> first, sometimes tasklet initialisation is performed directly, not via
> tasklet_init().
>
> second, that 't->count == 0' eq 'tasklet_init()' is assumed to be sort of
> a contract. so a simple kzalloc() works fine, and the patch breaks it.
>
Thats really bad that tasklet code is letting users call 
tasklet_schedule() even without any tasklet_init or DECLARE_TASKLET.

> a simple grep in drivers/net/
>
> _next$ git grep tasklet_sched drivers/net/ | awk '{print $1}' | uniq | wc -l
> 60
>
> _next$ git grep tasklet_init drivers/net/ | awk '{print $1}' | uniq | wc -l
> 52
>
> and I don't know how many call-sites outside of drivers/net/ do something
> like this.
>
There are more :-(. Thanks for helping out Sergey.

# git grep tasklet_sched . | awk '{print $1}' | uniq | wc -l
269
# git grep tasklet_init . | awk '{print $1}' | uniq | wc -l
240

Andrew,
I requested you to include this patch but now am not sure anymore.
Looks like there are almost 30 more users which are directly
tweaking 'tasklet_struct' fields and calling other APIs. Hunting them
and fixing them probably would be an exercise and also those changes
needs those changed drivers to be tested.

What do you suggest ? At least this patch needs to be dropped as of now
till we can have complete coverage for those bad users.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
  2016-09-22  2:31         ` Santosh Shilimkar
@ 2016-09-22  7:05           ` Thomas Gleixner
  2016-09-22 16:08             ` Santosh Shilimkar
  2016-09-22 23:37             ` Stephen Rothwell
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2016-09-22  7:05 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: Sergey Senozhatsky, Andrew Morton, ssantosh, David Miller,
	giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle,
	Peter Zijlstra, salvatore.benedetto, tadeusz.struk, mm-commits,
	LKML, Stephen Rothwell, linux-next, sergey.senozhatsky,
	Ingo Molnar

B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote:
> I requested you to include this patch but now am not sure anymore.
> Looks like there are almost 30 more users which are directly
> tweaking 'tasklet_struct' fields and calling other APIs. Hunting them
> and fixing them probably would be an exercise and also those changes
> needs those changed drivers to be tested.
> 
> What do you suggest ? At least this patch needs to be dropped as of now
> till we can have complete coverage for those bad users.

Yes, it needs to be dropped. Stephen, can you please revert it from next?

How to fix this: The only way is to review all tasklet usage sites for
creative abuse and then fix them one by one. This needs to be done anyway
because those are ticking timebombs even without changes in the core
code. I looked at one of the offenders and it's broken today, it's just
protected by the extremly low probablity to hit the wreckage case.

What you can do to coerce the developers/maintainers of offending code into
looking at the mess they created/merged is to implement accessors for the
tasklet struct fields and replace the open coded fiddling with them.

Once that is done, rename the struct fields to something which is absurd
enough to type.  But don't worry, you will find people doing that. I
catched a few brainwrecks who actually used:

 irqdesc->core_internal_state__do_not_mess_with_it

in their code.

Now after having everything converted to accessors, you can add sanity
checks into the accessors and emit WARN_ONCE() when they are used in the
wrong context. That'll make them look and explain why they think that
fiddling in the internals is a good idea.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
  2016-09-22  7:05           ` Thomas Gleixner
@ 2016-09-22 16:08             ` Santosh Shilimkar
  2016-09-22 23:37             ` Stephen Rothwell
  1 sibling, 0 replies; 9+ messages in thread
From: Santosh Shilimkar @ 2016-09-22 16:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sergey Senozhatsky, Andrew Morton, ssantosh, David Miller,
	giovanni.cabiddu, gregkh, herbert, isdn, mingo, pebolle,
	Peter Zijlstra, salvatore.benedetto, tadeusz.struk, mm-commits,
	LKML, Stephen Rothwell, linux-next, sergey.senozhatsky,
	Ingo Molnar

On 9/22/2016 12:05 AM, Thomas Gleixner wrote:
> B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote:
>> I requested you to include this patch but now am not sure anymore.
>> Looks like there are almost 30 more users which are directly
>> tweaking 'tasklet_struct' fields and calling other APIs. Hunting them
>> and fixing them probably would be an exercise and also those changes
>> needs those changed drivers to be tested.
>>
>> What do you suggest ? At least this patch needs to be dropped as of now
>> till we can have complete coverage for those bad users.
>
> Yes, it needs to be dropped. Stephen, can you please revert it from next?
>
> How to fix this: The only way is to review all tasklet usage sites for
> creative abuse and then fix them one by one. This needs to be done anyway
> because those are ticking timebombs even without changes in the core
> code. I looked at one of the offenders and it's broken today, it's just
> protected by the extremly low probablity to hit the wreckage case.
>
> What you can do to coerce the developers/maintainers of offending code into
> looking at the mess they created/merged is to implement accessors for the
> tasklet struct fields and replace the open coded fiddling with them.
>
> Once that is done, rename the struct fields to something which is absurd
> enough to type.  But don't worry, you will find people doing that. I
> catched a few brainwrecks who actually used:
>
>  irqdesc->core_internal_state__do_not_mess_with_it
>
> in their code.
>
> Now after having everything converted to accessors, you can add sanity
> checks into the accessors and emit WARN_ONCE() when they are used in the
> wrong context. That'll make them look and explain why they think that
> fiddling in the internals is a good idea.
>
Thanks Thomas for suggestion and looking into it. Sounds a good plan
to me to tackle this mess. I will give a try and hopefully with help
of those maintainers come up with a series first to fix the
existing bad users. As you said, fixing core will be simple after that.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree
  2016-09-22  7:05           ` Thomas Gleixner
  2016-09-22 16:08             ` Santosh Shilimkar
@ 2016-09-22 23:37             ` Stephen Rothwell
  1 sibling, 0 replies; 9+ messages in thread
From: Stephen Rothwell @ 2016-09-22 23:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Santosh Shilimkar, Sergey Senozhatsky, Andrew Morton, ssantosh,
	David Miller, giovanni.cabiddu, gregkh, herbert, isdn, mingo,
	pebolle, Peter Zijlstra, salvatore.benedetto, tadeusz.struk,
	mm-commits, LKML, linux-next, sergey.senozhatsky, Ingo Molnar,
	Tony Lindgren

Hi Thomas,

On Thu, 22 Sep 2016 09:05:23 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote:
>
> B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote:
> > I requested you to include this patch but now am not sure anymore.
> > Looks like there are almost 30 more users which are directly
> > tweaking 'tasklet_struct' fields and calling other APIs. Hunting them
> > and fixing them probably would be an exercise and also those changes
> > needs those changed drivers to be tested.
> > 
> > What do you suggest ? At least this patch needs to be dropped as of now
> > till we can have complete coverage for those bad users.  
> 
> Yes, it needs to be dropped. Stephen, can you please revert it from next?

I will do the revert today.  It reverts cleanly, so hopefully there are
no side effects.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-09-22 23:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-20 21:55 + softirq-fix-tasklet_kill-and-its-users.patch added to -mm tree akpm
2016-09-21  5:18 ` Sergey Senozhatsky
2016-09-21  8:09   ` Sergey Senozhatsky
2016-09-21 17:23     ` Santosh Shilimkar
2016-09-22  0:42       ` Sergey Senozhatsky
2016-09-22  2:31         ` Santosh Shilimkar
2016-09-22  7:05           ` Thomas Gleixner
2016-09-22 16:08             ` Santosh Shilimkar
2016-09-22 23:37             ` Stephen Rothwell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.