[v2] iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING"

Message ID 20160809001937.GA23658@google.com
State New, archived

Commit Message

Brian Norris Aug. 9, 2016, 12:19 a.m. UTC
When using CONFIG_DEBUG_ATOMIC_SLEEP, the scheduler nicely points out
that we're calling sleeping primitives within the wait_event loop, which
means we might clobber the task state:

[   10.831289] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffc00026b610>]
[   10.845531] ------------[ cut here ]------------
[   10.850161] WARNING: at kernel/sched/core.c:7630
...
[   12.164333] ---[ end trace 45409966a9a76438 ]---
[   12.168942] Call trace:
[   12.171391] [<ffffffc00024ed44>] __might_sleep+0x64/0x90
[   12.176699] [<ffffffc000954774>] mutex_lock_nested+0x50/0x3fc
[   12.182440] [<ffffffc0007b9424>] iio_kfifo_buf_data_available+0x28/0x4c
[   12.189043] [<ffffffc0007b76ac>] iio_buffer_ready+0x60/0xe0
[   12.194608] [<ffffffc0007b7834>] iio_buffer_read_first_n_outer+0x108/0x1a8
[   12.201474] [<ffffffc000370d48>] __vfs_read+0x58/0x114
[   12.206606] [<ffffffc000371740>] vfs_read+0x94/0x118
[   12.211564] [<ffffffc0003720f8>] SyS_read+0x64/0xb4
[   12.216436] [<ffffffc000203cb4>] el0_svc_naked+0x24/0x28

To avoid this, we should (a la https://lwn.net/Articles/628628/) use the
wait_woken() function, which avoids the nested sleeping while still
handling races between waiting / wake-events.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---
v2:
 * Only add/remove to/from the wait queue once per call
 * Restore interruptibility via explicit signal_pending() call
 * Refactor to avoid nested loop

Note that I did not remove the first (semi-redundant) test for
!indio_dev->info, so as to avoid disturbing the error codes used here.

 drivers/iio/industrialio-buffer.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

Comments

Jonathan Cameron Aug. 15, 2016, 3:54 p.m. UTC | #1
On 09/08/16 01:19, Brian Norris wrote:
> When using CONFIG_DEBUG_ATOMIC_SLEEP, the scheduler nicely points out
> that we're calling sleeping primitives within the wait_event loop, which
> means we might clobber the task state:
> 
> [...]
> 
> To avoid this, we should (a la https://lwn.net/Articles/628628/) use the
> wait_woken() function, which avoids the nested sleeping while still
> handling races between waiting / wake-events.
> 
> Signed-off-by: Brian Norris <briannorris@chromium.org>
Looks good to me, but given Lars' involvement in the discussion I'd
like his review before applying this.

Jonathan
Lars-Peter Clausen Aug. 16, 2016, 3:27 p.m. UTC | #2
On 08/15/2016 05:54 PM, Jonathan Cameron wrote:
> On 09/08/16 01:19, Brian Norris wrote:
>> When using CONFIG_DEBUG_ATOMIC_SLEEP, the scheduler nicely points out
>> that we're calling sleeping primitives within the wait_event loop, which
>> means we might clobber the task state:
>> [...]
> Looks good to me, but given Lars' involvement in the discussion I'd
> like his review before applying this.

Looks good. Thanks Brian for fixing this.

Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
Jonathan Cameron Aug. 21, 2016, 11:21 a.m. UTC | #3
On 16/08/16 16:27, Lars-Peter Clausen wrote:
> On 08/15/2016 05:54 PM, Jonathan Cameron wrote:
>> On 09/08/16 01:19, Brian Norris wrote:
>>> When using CONFIG_DEBUG_ATOMIC_SLEEP, the scheduler nicely points out
>>> [...]
>> Looks good to me, but given Lars' involvement in the discussion I'd
>> like his review before applying this.
> 
> Looks good. Thanks Brian for fixing this.
> 
> Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
I've applied this to the fixes-togreg branch of iio.git

For now I haven't marked it for stable, purely because I'm not sure
when the first 'problem' usage was introduced.  I'm happy to explicitly
send a request for stable inclusion if anyone wants to track down
which stable trees this is applicable to.

I'm very low on time today (holiday catch-up) so won't get to
dig into it myself for at least a few weeks.

Jonathan
Lars-Peter Clausen Aug. 21, 2016, 12:26 p.m. UTC | #4
On 08/21/2016 01:21 PM, Jonathan Cameron wrote:
[...]
> I've applied this to the fixes-togreg branch of iio.git
> 
> For now I haven't marked it for stable, purely because I'm not sure
> when the first 'problem' usage was introduced.  I'm happy to explicitly
> send a request for stable inclusion if anyone wants to track down
> which stable trees this is applicable to.
> 
> I'm very low on time today (holiday catch-up) so won't get to
> dig into it myself for at least a few weeks.

The issue has been around for a bit longer, but the wait_woken()
infrastructure has only been available since v3.19.
Jonathan Cameron Aug. 21, 2016, 3:23 p.m. UTC | #5
On 21/08/16 13:26, Lars-Peter Clausen wrote:
> On 08/21/2016 01:21 PM, Jonathan Cameron wrote:
> [...]
>> I've applied this to the fixes-togreg branch of iio.git
>>
>> For now I haven't marked it for stable, purely because I'm not sure
>> when the first 'problem' usage was introduced.  I'm happy to explicitly
>> send a request for stable inclusion if anyone wants to track down
>> which stable trees this is applicable to.
>>
>> I'm very low on time today (holiday catch-up) so won't get to
>> dig into it myself for at least a few weeks.
> 
> The issue has been around for a bit longer, but the wait_woken()
> infrastructure has only been available since v3.19.
> 
I've marked it for stable and added a note about that.
Will do for now!

Jonathan

Patch

diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index 90462fcf5436..49bf9c59f117 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -107,6 +107,7 @@  ssize_t iio_buffer_read_first_n_outer(struct file *filp, char __user *buf,
 {
 	struct iio_dev *indio_dev = filp->private_data;
 	struct iio_buffer *rb = indio_dev->buffer;
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	size_t datum_size;
 	size_t to_wait;
 	int ret;
@@ -131,19 +132,29 @@  ssize_t iio_buffer_read_first_n_outer(struct file *filp, char __user *buf,
 	else
 		to_wait = min_t(size_t, n / datum_size, rb->watermark);
 
+	add_wait_queue(&rb->pollq, &wait);
 	do {
-		ret = wait_event_interruptible(rb->pollq,
-		      iio_buffer_ready(indio_dev, rb, to_wait, n / datum_size));
-		if (ret)
-			return ret;
+		if (!indio_dev->info) {
+			ret = -ENODEV;
+			break;
+		}
 
-		if (!indio_dev->info)
-			return -ENODEV;
+		if (!iio_buffer_ready(indio_dev, rb, to_wait, n / datum_size)) {
+			if (signal_pending(current)) {
+				ret = -ERESTARTSYS;
+				break;
+			}
+
+			wait_woken(&wait, TASK_INTERRUPTIBLE,
+				   MAX_SCHEDULE_TIMEOUT);
+			continue;
+		}
 
 		ret = rb->access->read_first_n(rb, n, buf);
 		if (ret == 0 && (filp->f_flags & O_NONBLOCK))
 			ret = -EAGAIN;
 	 } while (ret == 0);
+	remove_wait_queue(&rb->pollq, &wait);
 
 	return ret;
 }