linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] firmware: fix async/manual firmware loading
@ 2016-10-30 14:50 Yves-Alexis Perez
  2016-10-30 14:50 ` Yves-Alexis Perez
  0 siblings, 1 reply; 15+ messages in thread
From: Yves-Alexis Perez @ 2016-10-30 14:50 UTC (permalink / raw)
  To: linux-kernel

Hi,

when trying to (ab)use the firmware loading interface in a local kernel module
(in order to load the EEPROM content into a PCIE card), I noticed that the
manual firmware loading interface (the one from
/sys/class/firmware/<foo>/{loading,data}) was broken.

After instrumenting the kernel I noticed an issue with the way
wait_for_completion_interruptible_timeout() is called in _request_firmware()
and especially how the return value is handled: it's supposed to be a long, but
here it's silently casted to an int and tested if positive. The initial timeout
seems to be LONG_MAX (since it's a manual firmware loading you're supposed to
have all the time you want to do it), so the return value overflows the int.

Attached patch fixes the problem here, although there might be a better way. I
tested it using lib/test_firmware.c kernel module, with the following patch to
enable manual loading:

diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index a3e8ec3..01d333c 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -105,7 +105,7 @@ static ssize_t trigger_async_request_store(struct device *dev,
        mutex_lock(&test_fw_mutex);
        release_firmware(test_firmware);
        test_firmware = NULL;
-       rc = request_firmware_nowait(THIS_MODULE, 1, name, dev, GFP_KERNEL,
+       rc = request_firmware_nowait(THIS_MODULE, 0, name, dev, GFP_KERNEL,
                                     NULL, trigger_async_request_cb);
        if (rc) {
                pr_info("async load of '%s' failed: %d\n", name, rc);

Then load test_firmware and:

# echo -n test > /sys/devices/virtual/misc/test_firmware/trigger_async_request

In another terminal, do:

# echo -n 1 > /sys/class/firmware/test/loading
# echo -n data > /sys/class/firmware/test/data
# echo -n 0 > /sys/class/firmware/test/loading

Without the patch, the loading fails right after the "echo -n 0", with it the loading succeeds with:

[   96.405171] test_firmware: loaded: 4

Regards,
-- 
Yves-Alexis

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH] firmware: fix async/manual firmware loading
  2016-10-30 14:50 [PATCH] firmware: fix async/manual firmware loading Yves-Alexis Perez
@ 2016-10-30 14:50 ` Yves-Alexis Perez
  2016-10-30 17:28   ` Yves-Alexis Perez
  2016-11-09 20:39   ` Luis R. Rodriguez
  0 siblings, 2 replies; 15+ messages in thread
From: Yves-Alexis Perez @ 2016-10-30 14:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yves-Alexis Perez, Yves-Alexis Perez, Ming Lei,
	Luis R. Rodriguez, Greg Kroah-Hartman, stable

From: Yves-Alexis Perez <corsac@debian.org>

wait_for_completion_interruptible_timeout() return value is either
-ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
or the number of jiffies left until timeout. The return value is stored in
a long, but in _request_firmware_load() it's silently casted to an int,
which can overflow and give a negative value, indicating an error.

Fix this by re-using the timeout variable and only set retval when it's
safe.

Signed-off-by: Yves-Alexis Perez <corsac@corsac.net>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org

---
 drivers/base/firmware_class.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 22d1760..a95e1e5 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -955,13 +955,14 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
 		timeout = MAX_JIFFY_OFFSET;
 	}
 
-	retval = wait_for_completion_interruptible_timeout(&buf->completion,
+	timeout = wait_for_completion_interruptible_timeout(&buf->completion,
 			timeout);
-	if (retval == -ERESTARTSYS || !retval) {
+	if (timeout == -ERESTARTSYS || !timeout) {
+		retval = timeout;
 		mutex_lock(&fw_lock);
 		fw_load_abort(fw_priv);
 		mutex_unlock(&fw_lock);
-	} else if (retval > 0) {
+	} else if (timeout > 0) {
 		retval = 0;
 	}
 
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-10-30 14:50 ` Yves-Alexis Perez
@ 2016-10-30 17:28   ` Yves-Alexis Perez
  2016-11-09 20:36     ` Luis R. Rodriguez
  2016-11-09 20:39   ` Luis R. Rodriguez
  1 sibling, 1 reply; 15+ messages in thread
From: Yves-Alexis Perez @ 2016-10-30 17:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ming Lei, Luis R. Rodriguez, Greg Kroah-Hartman, stable

[-- Attachment #1: Type: text/plain, Size: 2872 bytes --]

On Sun, 2016-10-30 at 15:50 +0100, Yves-Alexis Perez wrote:
> wait_for_completion_interruptible_timeout() return value is either
> -ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
> or the number of jiffies left until timeout. The return value is stored in
> a long, but in _request_firmware_load() it's silently casted to an int,
> which can overflow and give a negative value, indicating an error.
> 
> Fix this by re-using the timeout variable and only set retval when it's
> safe.

I completely messed-up my git send-email (sorry), so the cover letter only
went to LKML and I assume nobody read it. So just in case, I'm pasting it
below because it gives some more explanation about how I tested this:

when trying to (ab)use the firmware loading interface in a local kernel module
(in order to load the EEPROM content into a PCIE card), I noticed that the
manual firmware loading interface (the one from
/sys/class/firmware/<foo>/{loading,data}) was broken.

After instrumenting the kernel I noticed an issue with the way
wait_for_completion_interruptible_timeout() is called in _request_firmware()
and especially how the return value is handled: it's supposed to be a long, but
here it's silently casted to an int and tested if positive. The initial timeout
seems to be LONG_MAX (since it's a manual firmware loading you're supposed to
have all the time you want to do it), so the return value overflows the int.

Attached patch fixes the problem here, although there might be a better way. I
tested it using lib/test_firmware.c kernel module, with the following patch to
enable manual loading:

diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index a3e8ec3..01d333c 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -105,7 +105,7 @@ static ssize_t trigger_async_request_store(struct device *dev,
        mutex_lock(&test_fw_mutex);
        release_firmware(test_firmware);
        test_firmware = NULL;
-       rc = request_firmware_nowait(THIS_MODULE, 1, name, dev, GFP_KERNEL,
+       rc = request_firmware_nowait(THIS_MODULE, 0, name, dev, GFP_KERNEL,
                                     NULL, trigger_async_request_cb);
        if (rc) {
                pr_info("async load of '%s' failed: %d\n", name, rc);

Then load test_firmware and:

# echo -n test > /sys/devices/virtual/misc/test_firmware/trigger_async_request

In another terminal, do:

# echo -n 1 > /sys/class/firmware/test/loading
# echo -n data > /sys/class/firmware/test/data
# echo -n 0 > /sys/class/firmware/test/loading

Without the patch, the loading fails right after the "echo -n 0", with it the loading succeeds with:

[   96.405171] test_firmware: loaded: 4

Regards,
-- 
Yves-Alexis

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-10-30 17:28   ` Yves-Alexis Perez
@ 2016-11-09 20:36     ` Luis R. Rodriguez
  2016-11-09 22:02       ` Luis R. Rodriguez
  0 siblings, 1 reply; 15+ messages in thread
From: Luis R. Rodriguez @ 2016-11-09 20:36 UTC (permalink / raw)
  To: Yves-Alexis Perez
  Cc: linux-kernel, Ming Lei, Luis R. Rodriguez, Greg Kroah-Hartman,
	Johannes Berg, Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby,
	Tom Gundersen, Kay Sievers, Josh Boyer, Dmitry Torokhov,
	Andy Lutomirski, Harald Hoyer, Seth Forshee, Bjorn Andersson,
	Daniel Wagner, stable

On Sun, Oct 30, 2016 at 06:28:58PM +0100, Yves-Alexis Perez wrote:
> On Sun, 2016-10-30 at 15:50 +0100, Yves-Alexis Perez wrote:
> > wait_for_completion_interruptible_timeout() return value is either
> > -ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
> > or the number of jiffies left until timeout. The return value is stored in
> > a long, but in _request_firmware_load() it's silently casted to an int,
> > which can overflow and give a negative value, indicating an error.
> > 
> > Fix this by re-using the timeout variable and only set retval when it's
> > safe.
> 
> I completely messed-up my git send-email (sorry), so the cover letter only
> went to LKML and I assume nobody read it. So just in case, I'm pasting it
> below because it gives some more explanation about how I tested this:

Please include the information below in the commit log.

> when trying to (ab)use the firmware loading interface in a local kernel module
> (in order to load the EEPROM content into a PCIE card), I noticed that the
> manual firmware loading interface (the one from
> /sys/class/firmware/<foo>/{loading,data}) was broken.
> 
> After instrumenting the kernel I noticed an issue with the way
> wait_for_completion_interruptible_timeout() is called in _request_firmware()
> and especially how the return value is handled: it's supposed to be a long, but
> here it's silently casted to an int and tested if positive.

This is certainly an issue and thanks for addressing this. Please Cc:
Daniel Wagner <wagi@monom.org> (and others I've added to this thread)
on a follow up patch with the commit log fixed up a bit more given Daniel
is working on replacing the call:

wait_for_completion_interruptible_timeout() with
swait_event_interruptible_timeout()

I checked and swait_event_interruptible_timeout() uses schedule_timeout(),
and this does not return a negative value, so this issue would not be
present there *but* -- Daniel  you do cast the value, please revise
your code with this consideration as well in your next patch set follow up.

> The initial timeout seems to be LONG_MAX 

No, the default timeout is 60, so 60 * HZ, unless you provided an override,
but the override is only possible via the syfs UMH interface and set via
timeout_store(), if you provide a value then its set on an int loading_timeout
variable and its supposed to represent seconds. simple_strtol() is used to parse the
value set, and note this is returns long, so a cast is done here as well,
the max value which will return a positive cast int you can set is 2147483647
(INT_MAX):

echo 2147483647 > /sys/class/firmware/timeout

Finally if the value provided in sysfs is negative or if the cast yields
a negative (anything greater than 2147483647) value then loading_timeout is
set to 0:

echo 2147483648 > /sys/class/firmware/timeout
cat /sys/class/firmware/timeout
0

In practice firmware_loading_timeout() is what we use to get the timeout
used for the UMH interface. This call will always check if the int value
loading_timeout is negative or 0 and if it is it sets loading_timeout to
MAX_JIFFY_OFFSET. Since timeout_store() forces the value to be 0 or positive
the negative check is never effective, so the only way to get MAX_JIFFY_OFFSET
is to either force a negative value parsed by simple_strtol() (when values
greater than INT_MAX are passed) or just setting the timeout to 0. When this
is done MAX_JIFFY_OFFSET is used and note this is:

#define MAX_JIFFY_OFFSET ((LONG_MAX >> 1)-1)

When we cast MAX_JIFFY_OFFSET to int we get -2, if we just used LONG_MAX
this would be cast to -1, so both are negative. In our current worst case we
end up with -2 set when the UMH timeout is overridden to force a timeout of
0 to be set when either a sysfs timeout value is set to value of 0 or
greater than INT_MAX.

Please amend your commit log with the above.

>(since it's a manual firmware loading you're supposed to
> have all the time you want to do it),

We have learned that this is stupid for a few reasons, some of this is
some drivers are not written properly and can stall certain processes in
the kernel when the firmware is not found. In the worst case kernels
with CONFIG_FW_LOADER_USER_HELPER_FALLBACK enabled will *always* use
the UMH fallback even if on synchronous calls, these days most distributions
fortunately disable CONFIG_FW_LOADER_USER_HELPER_FALLBACK as such only
explicit callers are using the UMH. We are currently working on
compartamentalizing the UMH [0] as it has proven to cause many headaches,
some other issues are still being discussed [1].

[0] https://lkml.kernel.org/r/1476957132-27818-1-git-send-email-wagi@monom.org
[1] https://lkml.kernel.org/r/20161108224726.GD13978@wotan.suse.de

> so the return value overflows the int.

The above explain how and when this can happen.

> Attached patch fixes the problem here, although there might be a better way. I
> tested it using lib/test_firmware.c kernel module, with the following patch to
> enable manual loading:
> 
> diff --git a/lib/test_firmware.c b/lib/test_firmware.c
> index a3e8ec3..01d333c 100644
> --- a/lib/test_firmware.c
> +++ b/lib/test_firmware.c
> @@ -105,7 +105,7 @@ static ssize_t trigger_async_request_store(struct device *dev,
>         mutex_lock(&test_fw_mutex);
>         release_firmware(test_firmware);
>         test_firmware = NULL;
> -       rc = request_firmware_nowait(THIS_MODULE, 1, name, dev, GFP_KERNEL,
> +       rc = request_firmware_nowait(THIS_MODULE, 0, name, dev, GFP_KERNEL,
>                                      NULL, trigger_async_request_cb);
>         if (rc) {
>                 pr_info("async load of '%s' failed: %d\n", name, rc);
> 
> Then load test_firmware and:
> 
> # echo -n test > /sys/devices/virtual/misc/test_firmware/trigger_async_request
> 
> In another terminal, do:
> 
> # echo -n 1 > /sys/class/firmware/test/loading
> # echo -n data > /sys/class/firmware/test/data
> # echo -n 0 > /sys/class/firmware/test/loading
> 
> Without the patch, the loading fails right after the "echo -n 0", with it the loading succeeds with:
> 
> [   96.405171] test_firmware: loaded: 4

Thanks for the fix, please start the commit log explaining what can cause the
issue and immediately the effect, leave the verbose explanation for later,
a kernel maintainer wants to easily get the impact / cases of the issue and
the severity of the issue. In this case the issue can only be caused by
a manual override as root to the timeout for the UMH to force it to 0, and
the only issue is a failed firmware request. Since not many are using the
UMH these days (only explicit callers on drivers that require it on most
distros at least), and since most distributions are not overriding the
timeout I don't think this is necessarily a critical issue for stable, I'll
let Greg decide. I know there was some recent talks over what goes into stable
and I think this is a good example case to consider where the line is grey.

As a maintainer I personally would avoid recommending this for stable, but
others may disagree. Cc'd enough folks so they can cherry pick on their own
kernel if this does not go to stable but they do want it in their kernel.

  Luis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-10-30 14:50 ` Yves-Alexis Perez
  2016-10-30 17:28   ` Yves-Alexis Perez
@ 2016-11-09 20:39   ` Luis R. Rodriguez
  2016-11-10 15:55     ` Greg Kroah-Hartman
  1 sibling, 1 reply; 15+ messages in thread
From: Luis R. Rodriguez @ 2016-11-09 20:39 UTC (permalink / raw)
  To: Yves-Alexis Perez
  Cc: linux-kernel, Yves-Alexis Perez, Ming Lei, Luis R. Rodriguez,
	Greg Kroah-Hartman, Johannes Berg, Jouni Malinen, Kees Cook,
	Jiri Kosina, Jiri Slaby, Tom Gundersen, Kay Sievers, Josh Boyer,
	Dmitry Torokhov, Andy Lutomirski, Harald Hoyer, Seth Forshee,
	Bjorn Andersson, Daniel Wagner, stable

On Sun, Oct 30, 2016 at 03:50:48PM +0100, Yves-Alexis Perez wrote:
> From: Yves-Alexis Perez <corsac@debian.org>
> 
> wait_for_completion_interruptible_timeout() return value is either
> -ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
> or the number of jiffies left until timeout. The return value is stored in
> a long, but in _request_firmware_load() it's silently casted to an int,
> which can overflow and give a negative value, indicating an error.
> 
> Fix this by re-using the timeout variable and only set retval when it's
> safe.

Please amend the commit log as I noted in the previous response, and
resend.

> Signed-off-by: Yves-Alexis Perez <corsac@corsac.net>
> Cc: Ming Lei <ming.lei@canonical.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Other than the commit log you can add on you resend:

Acked-by: Luis R. Rodriguez.

Modulo I don't personally thing this this is sable material but I'll let
Greg decide.

  Luis

> Cc: stable@vger.kernel.org
> 
> ---
>  drivers/base/firmware_class.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index 22d1760..a95e1e5 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -955,13 +955,14 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
>  		timeout = MAX_JIFFY_OFFSET;
>  	}
>  
> -	retval = wait_for_completion_interruptible_timeout(&buf->completion,
> +	timeout = wait_for_completion_interruptible_timeout(&buf->completion,
>  			timeout);
> -	if (retval == -ERESTARTSYS || !retval) {
> +	if (timeout == -ERESTARTSYS || !timeout) {
> +		retval = timeout;
>  		mutex_lock(&fw_lock);
>  		fw_load_abort(fw_priv);
>  		mutex_unlock(&fw_lock);
> -	} else if (retval > 0) {
> +	} else if (timeout > 0) {
>  		retval = 0;
>  	}
>  
> -- 
> 2.10.1
> 
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-09 20:36     ` Luis R. Rodriguez
@ 2016-11-09 22:02       ` Luis R. Rodriguez
  2016-11-10  6:57         ` Yves-Alexis Perez
  0 siblings, 1 reply; 15+ messages in thread
From: Luis R. Rodriguez @ 2016-11-09 22:02 UTC (permalink / raw)
  To: Luis R. Rodriguez, Matt Domsch, Fengguang Wu, Richard Purdie,
	Jacek Anaszewski, linux-leds, Abhay Salunke, David Woodhouse
  Cc: Yves-Alexis Perez, linux-kernel, Ming Lei, Greg Kroah-Hartman,
	Johannes Berg, Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby,
	Tom Gundersen, Kay Sievers, Josh Boyer, Dmitry Torokhov,
	Andy Lutomirski, Harald Hoyer, Seth Forshee, Bjorn Andersson,
	Linus Torvalds, Daniel Wagner, stable

On Wed, Nov 09, 2016 at 09:36:56PM +0100, Luis R. Rodriguez wrote:
> On Sun, Oct 30, 2016 at 06:28:58PM +0100, Yves-Alexis Perez wrote:
> > On Sun, 2016-10-30 at 15:50 +0100, Yves-Alexis Perez wrote:
> > > wait_for_completion_interruptible_timeout() return value is either
> > > -ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
> > > or the number of jiffies left until timeout. The return value is stored in
> > > a long, but in _request_firmware_load() it's silently casted to an int,
> > > which can overflow and give a negative value, indicating an error.
> > > 
> > > Fix this by re-using the timeout variable and only set retval when it's
> > > safe.
> > 
> > I completely messed-up my git send-email (sorry), so the cover letter only
> > went to LKML and I assume nobody read it. So just in case, I'm pasting it
> > below because it gives some more explanation about how I tested this:
> 
> Please include the information below in the commit log.
> 
> > when trying to (ab)use the firmware loading interface in a local kernel module
> > (in order to load the EEPROM content into a PCIE card), I noticed that the
> > manual firmware loading interface (the one from
> > /sys/class/firmware/<foo>/{loading,data}) was broken.
> > 
> > After instrumenting the kernel I noticed an issue with the way
> > wait_for_completion_interruptible_timeout() is called in _request_firmware()
> > and especially how the return value is handled: it's supposed to be a long, but
> > here it's silently casted to an int and tested if positive.
> 
> This is certainly an issue and thanks for addressing this. Please Cc:
> Daniel Wagner <wagi@monom.org> (and others I've added to this thread)
> on a follow up patch with the commit log fixed up a bit more given Daniel
> is working on replacing the call:
> 
> wait_for_completion_interruptible_timeout() with
> swait_event_interruptible_timeout()
> 
> I checked and swait_event_interruptible_timeout() uses schedule_timeout(),
> and this does not return a negative value, so this issue would not be
> present there *but* -- Daniel  you do cast the value, please revise
> your code with this consideration as well in your next patch set follow up.
> 
> > The initial timeout seems to be LONG_MAX 
> 
> No, the default timeout is 60, so 60 * HZ, unless you provided an override,
> but the override is only possible via the syfs UMH interface and set via
> timeout_store(), if you provide a value then its set on an int loading_timeout
> variable and its supposed to represent seconds. simple_strtol() is used to parse the
> value set, and note this is returns long, so a cast is done here as well,
> the max value which will return a positive cast int you can set is 2147483647
> (INT_MAX):
> 
> echo 2147483647 > /sys/class/firmware/timeout
> 
> Finally if the value provided in sysfs is negative or if the cast yields
> a negative (anything greater than 2147483647) value then loading_timeout is
> set to 0:
> 
> echo 2147483648 > /sys/class/firmware/timeout
> cat /sys/class/firmware/timeout
> 0
> 
> In practice firmware_loading_timeout() is what we use to get the timeout
> used for the UMH interface. This call will always check if the int value
> loading_timeout is negative or 0 and if it is it sets loading_timeout to
> MAX_JIFFY_OFFSET. Since timeout_store() forces the value to be 0 or positive
> the negative check is never effective, so the only way to get MAX_JIFFY_OFFSET
> is to either force a negative value parsed by simple_strtol() (when values
> greater than INT_MAX are passed) or just setting the timeout to 0. When this
> is done MAX_JIFFY_OFFSET is used and note this is:
> 
> #define MAX_JIFFY_OFFSET ((LONG_MAX >> 1)-1)
> 
> When we cast MAX_JIFFY_OFFSET to int we get -2, if we just used LONG_MAX
> this would be cast to -1, so both are negative. In our current worst case we
> end up with -2 set when the UMH timeout is overridden to force a timeout of
> 0 to be set when either a sysfs timeout value is set to value of 0 or
> greater than INT_MAX.

Actually... we also set the timeout to MAX_JIFFY_OFFSET when FW_OPT_UEVENT is
not set. This happens *only* when the UMH was explicitly requested on the
async call, when the second argument to request_firmware_nowait() is false.
There are *only* two callers that do this in the kernel. If this is correct *and*
the assessment of the cast from long to int here is correct that should mean
these two callers in the kernel that have always requested the UMH firmware
*fallback* have *always* had a faulty UMH fallback return value...

The two drivers would be:

drivers/firmware/dell_rbu.c
drivers/leds/leds-lp55xx-common.c

Can the maintainers of these two drivers comment on the below..

If this really has been broken for ages and since we are trying to minimize
more use of the UMH I wonder if fixing it would set us back in compartamentalizing
the UMH further. This bug discover and the current situation with the UMH udev
helper (no longer in systemd) makes me inclined to just recommend for us seriously
to rip this out completely. Note that there are two uses for the explicit firmware
UMH fallback -- one where the uevent is set (Johannes presumably this is your use
case that you are interested in) and one where it is not set. You *can* request
firmware with the uevent set (second argument to request_firmware_nowait() is true,
but run in a kernel with CONFIG_FW_LOADER_USER_HELPER_FALLBACK set (not all distros).

Does Android enable CONFIG_FW_LOADER_USER_HELPER_FALLBACK ? Johannes were you relying
on CONFIG_FW_LOADER_USER_HELPER_FALLBACK and the uevent set for your use case you
had in mind ?

Has this really been broken for ages???! Note that the current test driver
never tested it...

  Luis

> Please amend your commit log with the above.
> 
> >(since it's a manual firmware loading you're supposed to
> > have all the time you want to do it),
> 
> We have learned that this is stupid for a few reasons, some of this is
> some drivers are not written properly and can stall certain processes in
> the kernel when the firmware is not found. In the worst case kernels
> with CONFIG_FW_LOADER_USER_HELPER_FALLBACK enabled will *always* use
> the UMH fallback even if on synchronous calls, these days most distributions
> fortunately disable CONFIG_FW_LOADER_USER_HELPER_FALLBACK as such only
> explicit callers are using the UMH. We are currently working on
> compartamentalizing the UMH [0] as it has proven to cause many headaches,
> some other issues are still being discussed [1].
> 
> [0] https://lkml.kernel.org/r/1476957132-27818-1-git-send-email-wagi@monom.org
> [1] https://lkml.kernel.org/r/20161108224726.GD13978@wotan.suse.de
> 
> > so the return value overflows the int.
> 
> The above explain how and when this can happen.
> 
> > Attached patch fixes the problem here, although there might be a better way. I
> > tested it using lib/test_firmware.c kernel module, with the following patch to
> > enable manual loading:
> > 
> > diff --git a/lib/test_firmware.c b/lib/test_firmware.c
> > index a3e8ec3..01d333c 100644
> > --- a/lib/test_firmware.c
> > +++ b/lib/test_firmware.c
> > @@ -105,7 +105,7 @@ static ssize_t trigger_async_request_store(struct device *dev,
> >         mutex_lock(&test_fw_mutex);
> >         release_firmware(test_firmware);
> >         test_firmware = NULL;
> > -       rc = request_firmware_nowait(THIS_MODULE, 1, name, dev, GFP_KERNEL,
> > +       rc = request_firmware_nowait(THIS_MODULE, 0, name, dev, GFP_KERNEL,
> >                                      NULL, trigger_async_request_cb);
> >         if (rc) {
> >                 pr_info("async load of '%s' failed: %d\n", name, rc);
> > 
> > Then load test_firmware and:
> > 
> > # echo -n test > /sys/devices/virtual/misc/test_firmware/trigger_async_request
> > 
> > In another terminal, do:
> > 
> > # echo -n 1 > /sys/class/firmware/test/loading
> > # echo -n data > /sys/class/firmware/test/data
> > # echo -n 0 > /sys/class/firmware/test/loading
> > 
> > Without the patch, the loading fails right after the "echo -n 0", with it the loading succeeds with:
> > 
> > [   96.405171] test_firmware: loaded: 4
> 
> Thanks for the fix, please start the commit log explaining what can cause the
> issue and immediately the effect, leave the verbose explanation for later,
> a kernel maintainer wants to easily get the impact / cases of the issue and
> the severity of the issue. In this case the issue can only be caused by
> a manual override as root to the timeout for the UMH to force it to 0, and
> the only issue is a failed firmware request. Since not many are using the
> UMH these days (only explicit callers on drivers that require it on most
> distros at least), and since most distributions are not overriding the
> timeout I don't think this is necessarily a critical issue for stable, I'll
> let Greg decide. I know there was some recent talks over what goes into stable
> and I think this is a good example case to consider where the line is grey.
> 
> As a maintainer I personally would avoid recommending this for stable, but
> others may disagree. Cc'd enough folks so they can cherry pick on their own
> kernel if this does not go to stable but they do want it in their kernel.
> 
>   Luis
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-09 22:02       ` Luis R. Rodriguez
@ 2016-11-10  6:57         ` Yves-Alexis Perez
  0 siblings, 0 replies; 15+ messages in thread
From: Yves-Alexis Perez @ 2016-11-10  6:57 UTC (permalink / raw)
  To: Luis R. Rodriguez, Matt Domsch, Fengguang Wu, Richard Purdie,
	Jacek Anaszewski, linux-leds, Abhay Salunke, David Woodhouse
  Cc: linux-kernel, Ming Lei, Greg Kroah-Hartman, Johannes Berg,
	Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby, Tom Gundersen,
	Kay Sievers, Josh Boyer, Dmitry Torokhov, Andy Lutomirski,
	Harald Hoyer, Seth Forshee, Bjorn Andersson, Linus Torvalds,
	Daniel Wagner, stable

[-- Attachment #1: Type: text/plain, Size: 1133 bytes --]

On Wed, 2016-11-09 at 23:02 +0100, Luis R. Rodriguez wrote:
> Actually... we also set the timeout to MAX_JIFFY_OFFSET when FW_OPT_UEVENT is
> not set. This happens *only* when the UMH was explicitly requested on the
> async call, when the second argument to request_firmware_nowait() is false.
> There are *only* two callers that do this in the kernel. If this is correct *and*
> the assessment of the cast from long to int here is correct that should mean
> these two callers in the kernel that have always requested the UMH firmware
> *fallback* have *always* had a faulty UMH fallback return value...

Thanks for all your mails. Yes, that was my case here, and I noticed it looked
broken for a long time, I didn't investigate who was using it in the kernel
but I had the feeling it was a bit of an abuse of the infrastructure. I still
reported the bug because the quick fix looked OK but I can understand the will
to actually rethink if and how this part is really needed.

Maybe I need to wait a bit before resubmitting any patch with rephrased commit
log to see where this is going?

Regards,
-- 
Yves-Alexis

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-09 20:39   ` Luis R. Rodriguez
@ 2016-11-10 15:55     ` Greg Kroah-Hartman
  2016-11-10 16:07       ` Luis R. Rodriguez
  0 siblings, 1 reply; 15+ messages in thread
From: Greg Kroah-Hartman @ 2016-11-10 15:55 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Yves-Alexis Perez, linux-kernel, Yves-Alexis Perez, Ming Lei,
	Johannes Berg, Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby,
	Tom Gundersen, Kay Sievers, Josh Boyer, Dmitry Torokhov,
	Andy Lutomirski, Harald Hoyer, Seth Forshee, Bjorn Andersson,
	Daniel Wagner, stable

On Wed, Nov 09, 2016 at 09:39:21PM +0100, Luis R. Rodriguez wrote:
> On Sun, Oct 30, 2016 at 03:50:48PM +0100, Yves-Alexis Perez wrote:
> > From: Yves-Alexis Perez <corsac@debian.org>
> > 
> > wait_for_completion_interruptible_timeout() return value is either
> > -ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
> > or the number of jiffies left until timeout. The return value is stored in
> > a long, but in _request_firmware_load() it's silently casted to an int,
> > which can overflow and give a negative value, indicating an error.
> > 
> > Fix this by re-using the timeout variable and only set retval when it's
> > safe.
> 
> Please amend the commit log as I noted in the previous response, and
> resend.
> 
> > Signed-off-by: Yves-Alexis Perez <corsac@corsac.net>
> > Cc: Ming Lei <ming.lei@canonical.com>
> > Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Other than the commit log you can add on you resend:
> 
> Acked-by: Luis R. Rodriguez.
> 
> Modulo I don't personally thing this this is sable material but I'll let
> Greg decide.

Does it fix a regression?  A reported issue with an older kernel version
that people have hit?  It shouldn't be hard to figure out if a patch
should be in stable or not...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-10 15:55     ` Greg Kroah-Hartman
@ 2016-11-10 16:07       ` Luis R. Rodriguez
  2016-11-10 18:52         ` Bjorn Andersson
  2016-11-10 19:18         ` Greg Kroah-Hartman
  0 siblings, 2 replies; 15+ messages in thread
From: Luis R. Rodriguez @ 2016-11-10 16:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Yves-Alexis Perez, linux-kernel, Yves-Alexis Perez, Ming Lei,
	Johannes Berg, Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby,
	Tom Gundersen, Kay Sievers, Josh Boyer, Dmitry Torokhov,
	Andy Lutomirski, Harald Hoyer, Seth Forshee, Bjorn Andersson,
	Daniel Wagner, 4.2+

On Thu, Nov 10, 2016 at 7:55 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Wed, Nov 09, 2016 at 09:39:21PM +0100, Luis R. Rodriguez wrote:
>> On Sun, Oct 30, 2016 at 03:50:48PM +0100, Yves-Alexis Perez wrote:
>> > From: Yves-Alexis Perez <corsac@debian.org>
>> >
>> > wait_for_completion_interruptible_timeout() return value is either
>> > -ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
>> > or the number of jiffies left until timeout. The return value is stored in
>> > a long, but in _request_firmware_load() it's silently casted to an int,
>> > which can overflow and give a negative value, indicating an error.
>> >
>> > Fix this by re-using the timeout variable and only set retval when it's
>> > safe.
>>
>> Please amend the commit log as I noted in the previous response, and
>> resend.
>>
>> > Signed-off-by: Yves-Alexis Perez <corsac@corsac.net>
>> > Cc: Ming Lei <ming.lei@canonical.com>
>> > Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
>> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>> Other than the commit log you can add on you resend:
>>
>> Acked-by: Luis R. Rodriguez.
>>
>> Modulo I don't personally thing this this is sable material but I'll let
>> Greg decide.
>
> Does it fix a regression?

Not that I am aware of, but if you consider the reported the developer then yes.

> A reported issue with an older kernel version
> that people have hit?

Definitely not.

>  It shouldn't be hard to figure out if a patch should be in stable or not...

Well with the only caveat now that I am suggesting we consider remove
this logic completely as only 2 drivers were using it explicitly
(second argument to request_firmware_nowait() set to false), it seems
they had good reasons for it but ... this has been broken for ages and
we seem to be happy to compartamentalize the UMH further, its unclear
why we would want to expand and "fix" that instead of just removing
crap that never worked. Thoughts?

  Luis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-10 16:07       ` Luis R. Rodriguez
@ 2016-11-10 18:52         ` Bjorn Andersson
  2016-11-10 19:48           ` Luis R. Rodriguez
  2016-11-10 19:18         ` Greg Kroah-Hartman
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Andersson @ 2016-11-10 18:52 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Greg Kroah-Hartman, Yves-Alexis Perez, linux-kernel,
	Yves-Alexis Perez, Ming Lei, Johannes Berg, Jouni Malinen,
	Kees Cook, Jiri Kosina, Jiri Slaby, Tom Gundersen, Kay Sievers,
	Josh Boyer, Dmitry Torokhov, Andy Lutomirski, Harald Hoyer,
	Seth Forshee, Daniel Wagner, 4.2+

On Thu 10 Nov 08:07 PST 2016, Luis R. Rodriguez wrote:

> On Thu, Nov 10, 2016 at 7:55 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Wed, Nov 09, 2016 at 09:39:21PM +0100, Luis R. Rodriguez wrote:
> >> On Sun, Oct 30, 2016 at 03:50:48PM +0100, Yves-Alexis Perez wrote:
> >> > From: Yves-Alexis Perez <corsac@debian.org>
> >> >
> >> > wait_for_completion_interruptible_timeout() return value is either
> >> > -ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
> >> > or the number of jiffies left until timeout. The return value is stored in
> >> > a long, but in _request_firmware_load() it's silently casted to an int,
> >> > which can overflow and give a negative value, indicating an error.
> >> >
> >> > Fix this by re-using the timeout variable and only set retval when it's
> >> > safe.
> >>
> >> Please amend the commit log as I noted in the previous response, and
> >> resend.
> >>
> >> > Signed-off-by: Yves-Alexis Perez <corsac@corsac.net>
> >> > Cc: Ming Lei <ming.lei@canonical.com>
> >> > Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> >> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >>
> >> Other than the commit log you can add on you resend:
> >>
> >> Acked-by: Luis R. Rodriguez.
> >>
> >> Modulo I don't personally thing this this is sable material but I'll let
> >> Greg decide.
> >
> > Does it fix a regression?
> 

Yes

> Not that I am aware of, but if you consider the reported the developer
> then yes.
> 

I haven't verified that this particular use case actually worked before,
but this code works with lower timeout values (e.g. 60 in the fallback
case), so this looks isolated.

The bug was clearly introduced in v4.0 by:

68ff2a00dbf5 "firmware_loader: handle timeout via wait_for_completion_interruptible_timeout()"

So please add a Fixes: and

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

> > A reported issue with an older kernel version
> > that people have hit?
> 
> Definitely not.
> 
> >  It shouldn't be hard to figure out if a patch should be in stable or not...
> 
> Well with the only caveat now that I am suggesting we consider remove
> this logic completely as only 2 drivers were using it explicitly
> (second argument to request_firmware_nowait() set to false), it seems
> they had good reasons for it but ... this has been broken for ages and
> we seem to be happy to compartamentalize the UMH further, its unclear
> why we would want to expand and "fix" that instead of just removing
> crap that never worked. Thoughts?
> 

Please Luis, just stop your crusade on this code. You're grasping at
every straw of opportunity to get this code out of the kernel, but it
has not been broken for ages, it works just fine and it is ABI.

I'm very concerned about your mission to to "compartamentalize" this
code when you're so certain that it's "broken crap".

Regards,
Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-10 16:07       ` Luis R. Rodriguez
  2016-11-10 18:52         ` Bjorn Andersson
@ 2016-11-10 19:18         ` Greg Kroah-Hartman
  2016-11-10 19:49           ` Luis R. Rodriguez
  1 sibling, 1 reply; 15+ messages in thread
From: Greg Kroah-Hartman @ 2016-11-10 19:18 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Yves-Alexis Perez, linux-kernel, Yves-Alexis Perez, Ming Lei,
	Johannes Berg, Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby,
	Tom Gundersen, Kay Sievers, Josh Boyer, Dmitry Torokhov,
	Andy Lutomirski, Harald Hoyer, Seth Forshee, Bjorn Andersson,
	Daniel Wagner, 4.2+

On Thu, Nov 10, 2016 at 08:07:58AM -0800, Luis R. Rodriguez wrote:
> >  It shouldn't be hard to figure out if a patch should be in stable or not...
> 
> Well with the only caveat now that I am suggesting we consider remove
> this logic completely as only 2 drivers were using it explicitly
> (second argument to request_firmware_nowait() set to false), it seems
> they had good reasons for it but ... this has been broken for ages and
> we seem to be happy to compartamentalize the UMH further, its unclear
> why we would want to expand and "fix" that instead of just removing
> crap that never worked. Thoughts?

Why would you want to remove stuff that works and people rely on?  Don't
be foolish, you know we can't do that...

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-10 18:52         ` Bjorn Andersson
@ 2016-11-10 19:48           ` Luis R. Rodriguez
  2016-11-10 21:04             ` Yves-Alexis Perez
  0 siblings, 1 reply; 15+ messages in thread
From: Luis R. Rodriguez @ 2016-11-10 19:48 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Greg Kroah-Hartman, Yves-Alexis Perez, linux-kernel,
	Yves-Alexis Perez, Ming Lei, Johannes Berg, Jouni Malinen,
	Kees Cook, Jiri Kosina, Jiri Slaby, Tom Gundersen, Kay Sievers,
	Josh Boyer, Dmitry Torokhov, Andy Lutomirski, Harald Hoyer,
	Seth Forshee, Daniel Wagner, 4.2+

On Thu, Nov 10, 2016 at 10:52 AM, Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
> On Thu 10 Nov 08:07 PST 2016, Luis R. Rodriguez wrote:
>
>> On Thu, Nov 10, 2016 at 7:55 AM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Wed, Nov 09, 2016 at 09:39:21PM +0100, Luis R. Rodriguez wrote:
>> >> On Sun, Oct 30, 2016 at 03:50:48PM +0100, Yves-Alexis Perez wrote:
>> >> > From: Yves-Alexis Perez <corsac@debian.org>
>> >> >
>> >> > wait_for_completion_interruptible_timeout() return value is either
>> >> > -ERESTARTSYS (in case it was interrupted), 0 (in case the timeout expired)
>> >> > or the number of jiffies left until timeout. The return value is stored in
>> >> > a long, but in _request_firmware_load() it's silently casted to an int,
>> >> > which can overflow and give a negative value, indicating an error.
>> >> >
>> >> > Fix this by re-using the timeout variable and only set retval when it's
>> >> > safe.
>> >>
>> >> Please amend the commit log as I noted in the previous response, and
>> >> resend.
>> >>
>> >> > Signed-off-by: Yves-Alexis Perez <corsac@corsac.net>
>> >> > Cc: Ming Lei <ming.lei@canonical.com>
>> >> > Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
>> >> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> >>
>> >> Other than the commit log you can add on you resend:
>> >>
>> >> Acked-by: Luis R. Rodriguez.
>> >>
>> >> Modulo I don't personally thing this this is sable material but I'll let
>> >> Greg decide.
>> >
>> > Does it fix a regression?
>>
>
> Yes
>
>> Not that I am aware of, but if you consider the reported the developer
>> then yes.
>>
>
> I haven't verified that this particular use case actually worked before,
> but this code works with lower timeout values (e.g. 60 in the fallback
> case), so this looks isolated.

This is true, but as I noted the broken aspect was when the timeout
was set to the max value.

> The bug was clearly introduced in v4.0 by:
>
> 68ff2a00dbf5 "firmware_loader: handle timeout via wait_for_completion_interruptible_timeout()"
>
> So please add a Fixes: and
>
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

This I agree with, thanks for that, and because of this then:

Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>

And because of this do recommend it for stable. I would still prefer
at least a new re-submit with the respected tags and a changed commit
log describing the reason for the fix, how the cast is an issue
exactly, and how this is a regression.

>> > A reported issue with an older kernel version
>> > that people have hit?
>>
>> Definitely not.
>>
>> >  It shouldn't be hard to figure out if a patch should be in stable or not...
>>
>> Well with the only caveat now that I am suggesting we consider remove
>> this logic completely as only 2 drivers were using it explicitly
>> (second argument to request_firmware_nowait() set to false), it seems
>> they had good reasons for it but ... this has been broken for ages and
>> we seem to be happy to compartamentalize the UMH further, its unclear
>> why we would want to expand and "fix" that instead of just removing
>> crap that never worked. Thoughts?
>>
>
> Please Luis, just stop your crusade on this code. You're grasping at
> every straw of opportunity to get this code out of the kernel,

No, I'm pointing out valid issues the code has had historically and
things folks had not realized. I already knew we could not get rid of
it, but if this was *not* a regression and if this was broken always
then clearly it was something worth considering to just remove. But as
you note, its a regression. Thanks for identifying that.

> but it
> has not been broken for ages, it works just fine and it is ABI.

Agreed.

> I'm very concerned about your mission to to "compartamentalize" this
> code when you're so certain that it's "broken crap".

Well the firmware UMH fallback code is craptastic code, use at your own risk.

 Luis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-10 19:18         ` Greg Kroah-Hartman
@ 2016-11-10 19:49           ` Luis R. Rodriguez
  0 siblings, 0 replies; 15+ messages in thread
From: Luis R. Rodriguez @ 2016-11-10 19:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Yves-Alexis Perez, linux-kernel, Yves-Alexis Perez, Ming Lei,
	Johannes Berg, Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby,
	Tom Gundersen, Kay Sievers, Josh Boyer, Dmitry Torokhov,
	Andy Lutomirski, Harald Hoyer, Seth Forshee, Bjorn Andersson,
	Daniel Wagner, 4.2+

On Thu, Nov 10, 2016 at 11:18 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Thu, Nov 10, 2016 at 08:07:58AM -0800, Luis R. Rodriguez wrote:
>> >  It shouldn't be hard to figure out if a patch should be in stable or not...
>>
>> Well with the only caveat now that I am suggesting we consider remove
>> this logic completely as only 2 drivers were using it explicitly
>> (second argument to request_firmware_nowait() set to false), it seems
>> they had good reasons for it but ... this has been broken for ages and
>> we seem to be happy to compartamentalize the UMH further, its unclear
>> why we would want to expand and "fix" that instead of just removing
>> crap that never worked. Thoughts?
>
> Why would you want to remove stuff that works and people rely on?  Don't
> be foolish, you know we can't do that...

I was not advocating removing it if it had worked, I advocated
removing it if it truly was something that never worked.

 Luis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-10 19:48           ` Luis R. Rodriguez
@ 2016-11-10 21:04             ` Yves-Alexis Perez
  2016-11-10 21:22               ` Luis R. Rodriguez
  0 siblings, 1 reply; 15+ messages in thread
From: Yves-Alexis Perez @ 2016-11-10 21:04 UTC (permalink / raw)
  To: Luis R. Rodriguez, Bjorn Andersson
  Cc: Greg Kroah-Hartman, linux-kernel, Ming Lei, Johannes Berg,
	Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby, Tom Gundersen,
	Kay Sievers, Josh Boyer, Dmitry Torokhov, Andy Lutomirski,
	Harald Hoyer, Seth Forshee, Daniel Wagner, 4.2+

On Thu, 2016-11-10 at 11:48 -0800, Luis R. Rodriguez wrote:
> > I haven't verified that this particular use case actually worked before,
> > but this code works with lower timeout values (e.g. 60 in the fallback
> > case), so this looks isolated.
> 
> This is true, but as I noted the broken aspect was when the timeout
> was set to the max value.
> 
> > The bug was clearly introduced in v4.0 by:
> > 
> > 68ff2a00dbf5 "firmware_loader: handle timeout via
> > wait_for_completion_interruptible_timeout()"
> > 
> > So please add a Fixes: and
> > 
> > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> 
> This I agree with, thanks for that, and because of this then:
> 
> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
> 
> And because of this do recommend it for stable. I would still prefer
> at least a new re-submit with the respected tags and a changed commit
> log describing the reason for the fix, how the cast is an issue
> exactly, and how this is a regression.

Hi all,

I'm still slightly confused, but I can certainly re-submit it. I've added the
CC: stable because I first experienced it on a stable box, but indeed it was
during coding a kernel driver so it might not be what's expected in
“regression” here.

I'll try to collect the tag and rewrite the commit log, then re-submit,
hopefully not messing with git send-email this time.

Regards,
-- 
Yves-Alexis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] firmware: fix async/manual firmware loading
  2016-11-10 21:04             ` Yves-Alexis Perez
@ 2016-11-10 21:22               ` Luis R. Rodriguez
  0 siblings, 0 replies; 15+ messages in thread
From: Luis R. Rodriguez @ 2016-11-10 21:22 UTC (permalink / raw)
  To: Yves-Alexis Perez
  Cc: Bjorn Andersson, Greg Kroah-Hartman, linux-kernel, Ming Lei,
	Johannes Berg, Jouni Malinen, Kees Cook, Jiri Kosina, Jiri Slaby,
	Tom Gundersen, Kay Sievers, Josh Boyer, Dmitry Torokhov,
	Andy Lutomirski, Harald Hoyer, Seth Forshee, Daniel Wagner, 4.2+

On Thu, Nov 10, 2016 at 1:04 PM, Yves-Alexis Perez <corsac@corsac.net> wrote:
> On Thu, 2016-11-10 at 11:48 -0800, Luis R. Rodriguez wrote:
>> > I haven't verified that this particular use case actually worked before,
>> > but this code works with lower timeout values (e.g. 60 in the fallback
>> > case), so this looks isolated.
>>
>> This is true, but as I noted the broken aspect was when the timeout
>> was set to the max value.
>>
>> > The bug was clearly introduced in v4.0 by:
>> >
>> > 68ff2a00dbf5 "firmware_loader: handle timeout via
>> > wait_for_completion_interruptible_timeout()"
>> >
>> > So please add a Fixes: and
>> >
>> > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>>
>> This I agree with, thanks for that, and because of this then:
>>
>> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
>>
>> And because of this do recommend it for stable. I would still prefer
>> at least a new re-submit with the respected tags and a changed commit
>> log describing the reason for the fix, how the cast is an issue
>> exactly, and how this is a regression.
>
> Hi all,
>
> I'm still slightly confused, but I can certainly re-submit it. I've added the
> CC: stable because I first experienced it on a stable box, but indeed it was
> during coding a kernel driver so it might not be what's expected in
> “regression” here.

If you look at the code added on commit 68ff2a00dbf5
("firmware_loader: handle timeout via
wait_for_completion_interruptible_timeout()") by Ming you'll see that
the else statement setting the timeout to MAX_JIFFY_OFFSET for the
non-udev event was only added there, so this commit caused the cast to
fail, as such this should be the commit that introduced the issue.

Then

git describe --contains 68ff2a00dbf5
v4.0-rc1~83^2~1

So this was added as of v4.0, so the regression appeared first as of
v4.0. You could try v3.19 or just revert this commit to verify if this
would have fixed the issue you are reporting to confirm this indeed
was the regression, but upon code examination it seems to be the case.

> I'll try to collect the tag and rewrite the commit log, then re-submit,
> hopefully not messing with git send-email this time.

Thanks!

  Luis

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-11-10 21:22 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-30 14:50 [PATCH] firmware: fix async/manual firmware loading Yves-Alexis Perez
2016-10-30 14:50 ` Yves-Alexis Perez
2016-10-30 17:28   ` Yves-Alexis Perez
2016-11-09 20:36     ` Luis R. Rodriguez
2016-11-09 22:02       ` Luis R. Rodriguez
2016-11-10  6:57         ` Yves-Alexis Perez
2016-11-09 20:39   ` Luis R. Rodriguez
2016-11-10 15:55     ` Greg Kroah-Hartman
2016-11-10 16:07       ` Luis R. Rodriguez
2016-11-10 18:52         ` Bjorn Andersson
2016-11-10 19:48           ` Luis R. Rodriguez
2016-11-10 21:04             ` Yves-Alexis Perez
2016-11-10 21:22               ` Luis R. Rodriguez
2016-11-10 19:18         ` Greg Kroah-Hartman
2016-11-10 19:49           ` Luis R. Rodriguez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).