linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]   ` <CALCETrXUrirO-vg3M+MGhn=0gZTwx0phsRDS4TCwWWgNYC6RsA@mail.gmail.com>
@ 2017-05-24 22:38     ` Luis R. Rodriguez
  2017-05-25  4:13       ` Andy Lutomirski
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-05-24 22:38 UTC (permalink / raw)
  To: Andy Lutomirski, Michael Kerrisk (man-pages), linux-api, Peter Zijlstra
  Cc: Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal,
	Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull,
	Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach,
	Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook,
	AKASHI Takahiro, David Howells, Peter Jones, Hans

On Wed, May 24, 2017 at 3:00 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Wed, May 24, 2017 at 2:40 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> From: Martin Fuzzey <mfuzzey@parkeon.com>
>>
>> Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
>> is interrupted") added via 4.0 added support to abort the fallback mechanism
>> when a signal was detected and wait_for_completion_interruptible() returned
>> -ERESTARTSYS. Although the abort was effective we were unfortunately never
>> really propagating this error though and as such userspace could not know
>> why the abort happened.
>
> Can you give a simple example of what's going on and why it matters?
>
> ERESTARTSYS and friends are highly magical, and I'm not convinced that
> allowing _request_firmware_load to return -ERESTARTSYS is actually a
> good idea.  What if there are system calls that can't handle this
> style of restart that start being restarted as a result?

This seems to be a linux-api question, so Cc'ing them and Michael.

For those not familiar it is worth explaining first the user interface.

This describes the fallback mechanism of the Linux firmware API if
direct filesystem lookup fails. Although most distros disable this
stuff today some distros (Android) seem to be always relying on it
today. Other than Android, since most distros have the forced fallback
mechanism off but *enable* requests to require it, only 2 upstream
drivers do this, so on other distros we only have 2 drivers which
typically require the the fallback mechanism.

The exposed user interface here is we enable a knob through sysfs to
enable userspace to write a file as a fallback mechanism then, and we
expect to get woken up today when userspace finds the needed file and
then writes the file to the sysfs knob. We wait for userspace using
swait_event_interruptible_timeout(). While we wait we can get a
-ERESTARTSYS since swait_event_interruptible_timeout() uses
__swait_event_interruptible_timeout() under the hood and this in turn
___swait_event() which can prepare_to_swait_event() which can return
-ERESTARTSYS on signal_pending_state().

The issue discovered was that Android could issue SIGCHLD and the
waiter gets a signal but the reason for the exact reason for the
failure is not propagated. The proposed patch propagates -ERESTARTSYS
when that is returned on signal_pending_state() as we wait.

So linux-api folks please speak up.

 Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-05-24 22:38     ` [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback Luis R. Rodriguez
@ 2017-05-25  4:13       ` Andy Lutomirski
       [not found]         ` <CALCETrU4__YUGk36PN=FbuEf0SBaTrxQQqm4sWs2NrZ+6WN7jA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2017-05-25  4:13 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds

On Wed, May 24, 2017 at 3:38 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Wed, May 24, 2017 at 3:00 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Wed, May 24, 2017 at 2:40 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> From: Martin Fuzzey <mfuzzey@parkeon.com>
>>>
>>> Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
>>> is interrupted") added via 4.0 added support to abort the fallback mechanism
>>> when a signal was detected and wait_for_completion_interruptible() returned
>>> -ERESTARTSYS. Although the abort was effective we were unfortunately never
>>> really propagating this error though and as such userspace could not know
>>> why the abort happened.
>>
>> Can you give a simple example of what's going on and why it matters?
>>
>> ERESTARTSYS and friends are highly magical, and I'm not convinced that
>> allowing _request_firmware_load to return -ERESTARTSYS is actually a
>> good idea.  What if there are system calls that can't handle this
>> style of restart that start being restarted as a result?
>
> This seems to be a linux-api question, so Cc'ing them and Michael.
>
> For those not familiar it is worth explaining first the user interface.
>
> This describes the fallback mechanism of the Linux firmware API if
> direct filesystem lookup fails.

...

> While we wait we can get a
> -ERESTARTSYS since swait_event_interruptible_timeout() uses
> __swait_event_interruptible_timeout() under the hood and this in turn
> ___swait_event() which can prepare_to_swait_event() which can return
> -ERESTARTSYS on signal_pending_state().

This is too much kernel detail and too little ABI detail.

User code does some syscall.  Kernel requests firmware and that
request gets interrupted.  What syscall is this?  read(2)?  open(2)?
Something else?

mutex_lock_interruptible() returns -EINTR if interrupted.  It seems
odd to be that requesting firmware would be different.

>
> The issue discovered was that Android could issue SIGCHLD and the
> waiter gets a signal but the reason for the exact reason for the
> failure is not propagated. The proposed patch propagates -ERESTARTSYS
> when that is returned on signal_pending_state() as we wait.

Maybe SIGCHLD shouldn't interrupt firmware loading?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]         ` <CALCETrU4__YUGk36PN=FbuEf0SBaTrxQQqm4sWs2NrZ+6WN7jA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-25  8:28           ` Fuzzey, Martin
  2017-05-26 11:09             ` Eric W. Biederman
       [not found]             ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 2 replies; 34+ messages in thread
From: Fuzzey, Martin @ 2017-05-25  8:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds

On 25 May 2017 at 06:13, Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>
>>> Can you give a simple example of what's going on and why it matters?
>>>


Here is the use case in which I ran into this problem.

I have a driver which does request_firmware() when a write() is done
to a sysfs file.

The write() was being done by an android init script (with the init
interpreter "write" command).
init, of course, forks lots of processes and some of the children die.

So the scenario was the following:

1) Android init calls write() on the sysfs file
2) The sysfs .store() callback registered by a driver is called
3) The driver calls request_firmware()
4) request_firmware() sends the firmware load request to userspace and
calls wait_for_completion_interruptible()
5) A child dies and raises SIGCHLD
6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal
7) request_firmware() [before this patch] translated that to -EAGAIN
8) The driver (in my case) ignored this [because the firmware was not
critical - it was for checking if a microcontroler was up to date]
(but it could have returned it to userspace, same problem)

The point being that, due to a signal (SIGCHLD) which has nothing to
do with the firmware loading process, the firmware load was not done.
Also EAGAIN is the same error used if the load request times out so it
was impossible to distinguish the two cases.

ERESTARTSYS is an internal error and is not returned to userspace.
Instead it is handled by the linux syscall machinery which, after
processing the signal either restarts (transpently to userspace) the
syscall or returns EINTR to userspace (depending if the signal handler
users SA_RESTART - see man 7 signal)


With this patch here is what happens:

1) Android init calls write() on the sysfs file
2) The sysfs .store() callback registered by a driver is called
3) The driver calls request_firmware()
4) request_firmware() sends the firmware load request to userspace and
calls wait_for_completion_interruptible()
5) A child dies and raises SIGCHLD
6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal
7) request_firmware() [with this patch] returns -ERESTARTSYS
8) The driver returns -ERSTARTSYS from its sysfs .store method
9) The system call machinery invokes the signal handler
10) The signal handler does its stuff
11) Because SA_RESTART was set the system call is restarted (calling
the sysfs .store) and we try it all again from step 2

Note that, on the the userspace side  write() is only called once (the
restart is transparent to userspace which is oblivious to all this)
The kernel side write() (which calls .store() is called multiple times
(so that code does need to know about this)


>>> ERESTARTSYS and friends are highly magical, and I'm not convinced that
>>> allowing _request_firmware_load to return -ERESTARTSYS is actually a
>>> good idea.  What if there are system calls that can't handle this
>>> style of restart that start being restarted as a result?
>>

If the caller is unable to restart (for example if the driver's
.store() callback had already done lots of stuff that couldn't be
undone) it is free to translate -ERSTARTSYS to -EINTR before
returning.
But request_frimware() can't know about that.


>>> Maybe SIGCHLD shouldn't interrupt firmware loading?

I don't think there's a way of doing that without disabling all
signals (ie using the non interruptible wait variants).
It used to be that way (which is why I only ran into this after
updating from an ancient 3.16 kernel to a slightly less ancient 4.4)
But there are valid reasons for wanting to be able to interrupt
firmware loading (like being able to kill the userspace helper)

Hope this explains it better,

Regads,

Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-05-25  8:28           ` Fuzzey, Martin
@ 2017-05-26 11:09             ` Eric W. Biederman
       [not found]               ` <87fufr3mdy.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
       [not found]             ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: Eric W. Biederman @ 2017-05-26 11:09 UTC (permalink / raw)
  To: Fuzzey, Martin
  Cc: Andy Lutomirski, Luis R. Rodriguez, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo

"Fuzzey, Martin" <mfuzzey@parkeon.com> writes:

> On 25 May 2017 at 06:13, Andy Lutomirski <luto@kernel.org> wrote:
>>>>
>>>> Can you give a simple example of what's going on and why it matters?
>>>>
>
>
> Here is the use case in which I ran into this problem.
>
> I have a driver which does request_firmware() when a write() is done
> to a sysfs file.
>
> The write() was being done by an android init script (with the init
> interpreter "write" command).
> init, of course, forks lots of processes and some of the children die.
>
> So the scenario was the following:
>
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()
> 5) A child dies and raises SIGCHLD
> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal
> 7) request_firmware() [before this patch] translated that to -EAGAIN
> 8) The driver (in my case) ignored this [because the firmware was not
> critical - it was for checking if a microcontroler was up to date]
> (but it could have returned it to userspace, same problem)
>
> The point being that, due to a signal (SIGCHLD) which has nothing to
> do with the firmware loading process, the firmware load was not done.
> Also EAGAIN is the same error used if the load request times out so it
> was impossible to distinguish the two cases.
>
> ERESTARTSYS is an internal error and is not returned to userspace.
> Instead it is handled by the linux syscall machinery which, after
> processing the signal either restarts (transpently to userspace) the
> syscall or returns EINTR to userspace (depending if the signal handler
> users SA_RESTART - see man 7 signal)
>
>
> With this patch here is what happens:
>
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()
> 5) A child dies and raises SIGCHLD
> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal
> 7) request_firmware() [with this patch] returns -ERESTARTSYS
> 8) The driver returns -ERSTARTSYS from its sysfs .store method
> 9) The system call machinery invokes the signal handler
> 10) The signal handler does its stuff
> 11) Because SA_RESTART was set the system call is restarted (calling
> the sysfs .store) and we try it all again from step 2
>
> Note that, on the the userspace side  write() is only called once (the
> restart is transparent to userspace which is oblivious to all this)
> The kernel side write() (which calls .store() is called multiple times
> (so that code does need to know about this)
>
>
>>>> ERESTARTSYS and friends are highly magical, and I'm not convinced that
>>>> allowing _request_firmware_load to return -ERESTARTSYS is actually a
>>>> good idea.  What if there are system calls that can't handle this
>>>> style of restart that start being restarted as a result?
>>>
>
> If the caller is unable to restart (for example if the driver's
> .store() callback had already done lots of stuff that couldn't be
> undone) it is free to translate -ERSTARTSYS to -EINTR before
> returning.
> But request_frimware() can't know about that.
>
>
>>>> Maybe SIGCHLD shouldn't interrupt firmware loading?
>
> I don't think there's a way of doing that without disabling all
> signals (ie using the non interruptible wait variants).
> It used to be that way (which is why I only ran into this after
> updating from an ancient 3.16 kernel to a slightly less ancient 4.4)
> But there are valid reasons for wanting to be able to interrupt
> firmware loading (like being able to kill the userspace helper)

Perhaps simply using a killable wait and not a fully interruptible
wait would be better?

It sounds like the code really is not prepared for an truly
interruptible wait here.

Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]             ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-26 19:40               ` Luis R. Rodriguez
       [not found]                 ` <20170526194001.GR8951-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
  2017-06-07 17:08               ` Luis R. Rodriguez
  1 sibling, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-05-26 19:40 UTC (permalink / raw)
  To: Fuzzey, Martin, Nicolas, Dmitry Torokhov, Kees Cook,
	Michael Kerrisk (man-pages)
  Cc: Andy Lutomirski, Luis R. Rodriguez, Linux API, Peter Zijlstra,
	Greg KH, Daniel Wagner, David Woodhouse,
	jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Luca Coelho, Emmanuel Grumbach,
	Kalle Valo, Linus Torvalds, AK

On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
> On 25 May 2017 at 06:13, Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >>>
> >>> Can you give a simple example of what's going on and why it matters?
> >>>
> 
> 
> Here is the use case in which I ran into this problem.
> 
> I have a driver which does request_firmware() when a write() is done
> to a sysfs file.
> 
> The write() was being done by an android init script (with the init
> interpreter "write" command).
> init, of course, forks lots of processes and some of the children die.
> 
> So the scenario was the following:
> 
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()

No no, this is not how the fallback system works.

Please have a look at the latest documentation revamp [0], in particular
the fallback mechanism [1], hopefully this clarifies other details you might
like to know about this first part.

Please note I *highly* encourage relying on the uevent fallback mechanism,
given the custom fallback mechanism has its own quirks and from my own review
I'd be pretty wary about using it.

[0] https://www.kernel.org/doc/html/latest/driver-api/firmware/index.html
[1] https://www.kernel.org/doc/html/latest/driver-api/firmware/fallback-mechanisms.html

Here is the correct summary:

1) Driver issues request_firmware()
2) firmware_class looks for the firmware on the local filesystem, if it
   does not find it and the fallback mechanism is enabled
   firmware_class issues uevent for the device that requested the firmware
3) Userspace is expected to be monitoring for these uevents and based on
   them should only *then* try a write() on the respective sysfs file.
4) The kernel actually uses swait_event_interruptible_timeout() as of
   v4.10 due to commit 5b029624948d6 ("firmware: do not use fw_lock for
   fw_state protection"), before it used wait_for_completion_interruptible_timeout().
   It uses this to *wait* for the sysfs write to complete, so can be
   woken up by it.

Please note we just uncovered another issue on firmware_class reported by
Nicolas when multiple cards are used and a signal of completion on sucesss or
error is not issued [2] for which John Ewalt has suggested a fix for [3].
I just reviewed this yesterday and determiend the swake_up()/swake_up_all()
fix is one atomic fix for 5b029624948d6 which moved us to swait given otherwise
we won't wake up all waiters on the same file. The wait stuff is used for
other means not well documented than the sysfs interface: if multiple 
requests come in for the same file we batch up these requests as one and
wait for a signal so we can rely on the same file fetch to write to the
pending request. Unfortunately I have discovered yesterday we have quite a
bit of fail paths that never sent the swake_up_all() (this would be
a completion on kernels older than v4.10) on *many* error paths. Note
though there is only one wait used and it uses:

	swait_event_interruptible_timeout()	>= v4.10
	wait_for_completion_interruptible_timeout() older

Since this is interruptible, and has a timeout it does mean if we have
a reboot the wait will be killed.

Since this wait also has a timeout it means we will have an end if
userspace never does anything.

5) Two sysfs files are exposed:

        /sys/$DEVPATH/loading
		echo 1 > loading # starts load
		echo 0 > loading # indicates load is done
				 # asking to load data to driver
		echo -1 > loading # indicates an error occured

For detail see firmware_loading_store(), note we check for fw_state_is_aborted()
before proceeding.

        /sys/$DEVPATH/data

For details see firmware_data_read(), note it checks for fw_state_is_done()
early on before proceeding. fw_state_is_done() includes an abort check:

static inline bool __fw_state_is_done(enum fw_status status)                    
{                                                                               
        return status == FW_STATUS_DONE || status == FW_STATUS_ABORTED;         
}   

[2] https://bugzilla.kernel.org/show_bug.cgi?id=195477
[3] https://bugzilla.kernel.org/attachment.cgi?id=256493&action=diff&collapsed=&headers=1&format=raw

> 5) A child dies and raises SIGCHLD

What child? The process doing the write() ?

If we have userspace doing a write() on the sysfs interface then
firmware_loading_store() gets kicked off, it should actually try to complete,
unless the singal was processed early in which case the write would bail early
on fw_state_is_done() check.

Otherwise I suppose the kernel can send a signal pending to the swait but
its not clear under what conditions exactly.

> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal

That's for kernels older than v4.10, on >= v4.10 that would be
swait_event_interruptible_timeout() and I confirm -ERESTARTSYS can
be sent as a return value.

> 7) request_firmware() [before this patch] translated that to -EAGAIN

Right, this is today's default error if we fail on the wait. This error would
be returned to the *driver*. This is returned on _request_firmware_load() and
sent back to the request_firmware_*() call.

Martin seems to be arguing -ERESTARTSYS should be sent back instead given
it is what the wait returned after all.

> 8) The driver (in my case) ignored this [because the firmware was not
> critical - it was for checking if a microcontroler was up to date]
> (but it could have returned it to userspace, same problem)

I agree that using -EAGAIN for a signal is incorrect and we should propagate
something more reasonable to the driver, but also given that firmware loading
is typically a bad idea on init but is acceptable on probe, it could mean a
driver propagates this to userspace.

Another issue to consider here is drivers using request_firmware_nowait() may
not get an actual error passed down given the lack of semantics of the
firmware_class. This will be fixed later on the driver data API but for now
it means async callbacks can only infer what might have happened. The callback
must be revised in this light to ensure some sanity check is done of some sort.

So to say that the driver ignored it is saying that the sync call was used
for sure and we know that it ignored -EAGAIN. Is that the case ?

If not what driver was used that ignored -EAGAIN? If it was an async callback
no error value is passed and what you are dealing with is more of an issue
with lack of semantics in the callback very likely and the inability to
deal with error respectively.

So we have three separate issues here:

1) A driver's possible lack of propagation errors
2) The actual error returned by firmware_class and semantics on async callback
3) What errors we should send on syfs files if a file if interrupted or error

The gruseome TLDR details below.

=========================================================

1) A driver's possible lack of propagation errors

Given 2) drivers should take care to be pedantic about how they deal with
errors. In particular if an async callback is used, a signal received
may mean the callback can only check for NULL, and it can only infer if
an error occurred. If the driver deals with daisy chaining files it might
mean it could skip over files without treating fatal errors accordingly.

2) The actual error returned by firmware_class and semantics on async callback

The async callbacks only get NULL on the firmware passed if an error was
detected, it cannot know if a specific error actually occurred. Currently only
sync calls can get the actual -EAGAIN (and if we stich this the respective
-ERESTARTSYS). This will be fixed with the driver data API, as it allows
callbacks to get the actual error passed.

Also, since async callbacks can be used on probe, but we don't want to delay
boot and since init + probe() are batched together we tend to not want to wait
on probe, so this is why many drivers returns right away after an async
firmware call. Even though we have async probe now (and drivers can prefer
this) that still would mean userspace does not get the actual error value. Its
up to drivers to also be able to take down the driver on error if an async call
for fw is used and it fails. The actual error on an async call will only be
available in the future, right now only NULL is passed.

Care with the above lack of current semantics must be taken into account

A user API question still stands as to if we want to send -ERESTARTSYS to
drivers to potentially send back to userspace on driver load given this
can still be sent for sync calls today and we are sending -EAGAIN.

3) What errors we should send on syfs files if a file if interrupted or error

We could capture an error happened on any of the sysfs files by checking
the firwmware status. For instance fw_state_is_aborted() checks on
firmware_loading_store(); fw_state_is_done() on firmware_data_read();
or fw_state_is_done() check on firmware_data_write().

Right now the errors we pass vary. As an example firmware_data_write()
can fail with -ENODEV if for some reason a signal was sent and we
bailed early. That -EPERM on !capable(CAP_SYS_RAWIO) should probably
be changed to -EACCESS. The firmware_data_read() sends -ENODEV() on
error. firmware_loading_store() sends the size of the buffer passed
to us on error -- that seems like an issue.

> The point being that, due to a signal (SIGCHLD) which has nothing to
> do with the firmware loading process, the firmware load was not done.

Who gets SIGCHLD and how does this exactly get to swait_event_interruptible_timeout()
or wait_for_completion_interruptible_timeout() exactly ?

> Also EAGAIN is the same error used if the load request times out so it
> was impossible to distinguish the two cases.

I agree this is an issue.

> ERESTARTSYS is an internal error and is not returned to userspace.
> Instead it is handled by the linux syscall machinery which,

Not accurate, we just never pass it on the firmware_class.c on
swait_event_interruptible_timeout() error. In fact we mask all
errors to -EAGAIN, and if the timeout ran out we just send
-ETIMEDOUT. If the request was aborted early the error -ENOENT
is sent.

> after
> processing the signal either restarts (transpently to userspace) the
> syscall or returns EINTR to userspace (depending if the signal handler
> users SA_RESTART - see man 7 signal)
> 
> 
> With this patch here is what happens:
> 
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()
> 5) A child dies and raises SIGCHLD
> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal
> 7) request_firmware() [with this patch] returns -ERESTARTSYS
> 8) The driver returns -ERSTARTSYS from its sysfs .store method

If you're talking about a custom driver of sorts that triggered
a request_firmware() call (note sync) then yes your earlier description
is accurate and in this case as well the driver specific sysfs interface
can end up returning the same error.

If this file is custom then its up to you to decide what you want
for error on that file, but for the firmware_class.c I do agree on
returning an agreed upon error so drivers can Do The Right Thing (TM).

> 9) The system call machinery invokes the signal handler
> 10) The signal handler does its stuff
> 11) Because SA_RESTART was set the system call is restarted (calling
> the sysfs .store) and we try it all again from step 2

OK so this seems to reveal an internal working of some android
loader of some sort. Thanks.

> Note that, on the the userspace side  write() is only called once

The write is to the custom driver trigger which calls request_firmware() ?

> (the
> restart is transparent to userspace which is oblivious to all this)
> The kernel side write() (which calls .store() is called multiple times
> (so that code does need to know about this)

Even if the signal is captured and -ERSTARTSYS is sent down to a custom
sysfs file trigger which called request_firmware(), if -ERSTARTSYS
triggers userspace to try again the old sysfs loader for the fallback
mechanism will not be available anymore, so actually I'd be surprised
if the retry designed by android works here. It *would* have to
retry the whole thing again, starting from the custom trigger.

> >>> ERESTARTSYS and friends are highly magical, and I'm not convinced that
> >>> allowing _request_firmware_load to return -ERESTARTSYS is actually a
> >>> good idea.  What if there are system calls that can't handle this
> >>> style of restart that start being restarted as a result?

Given the above explanation I see the concern now. In our case I believe a
valid concern would be sending -ERESTARTSYS to the system call finit_module()
given request_firmware() could be called on probe and if a stupid driver did a
sync call on probe it could potentially send -ERESTARTSYS down.

This could mean we get a loop on finit_module() if a signal is detected
on firmware_request() on every call. Is that fine? Is it expected ?

> If the caller is unable to restart (for example if the driver's
> .store() callback had already done lots of stuff that couldn't be
> undone) it is free to translate -ERSTARTSYS to -EINTR before
> returning.
> But request_frimware() can't know about that.

So you seem to be suggesting the driver's should decide to mask or
not -ERSTARTSYS.

Do we send -ERSTARTSYS on any other paths on finit_module() ?

> >>> Maybe SIGCHLD shouldn't interrupt firmware loading?
> 
> I don't think there's a way of doing that without disabling all
> signals (ie using the non interruptible wait variants).

It could be an option if we deem such API is needed.

> It used to be that way (which is why I only ran into this after
> updating from an ancient 3.16 kernel to a slightly less ancient 4.4)

Indeed and back then I think we only ever returned -ENOMEM on error !

> But there are valid reasons for wanting to be able to interrupt
> firmware loading (like being able to kill the userspace helper)

Let's call it the fallback mechanism, as we should. OK so if the
fallback helper is called I fail to see how this is detrimental,
we'd just timeout, no ?

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]               ` <87fufr3mdy.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-05-26 19:46                 ` Luis R. Rodriguez
  2017-05-26 21:26                   ` Dmitry Torokhov
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-05-26 19:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Fuzzey, Martin, Andy Lutomirski, Luis R. Rodriguez,
	Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho

On Fri, May 26, 2017 at 06:09:29AM -0500, Eric W. Biederman wrote:
> "Fuzzey, Martin" <mfuzzey-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org> writes:
> >>>> Maybe SIGCHLD shouldn't interrupt firmware loading?
> >
> > I don't think there's a way of doing that without disabling all
> > signals (ie using the non interruptible wait variants).
> > It used to be that way (which is why I only ran into this after
> > updating from an ancient 3.16 kernel to a slightly less ancient 4.4)
> > But there are valid reasons for wanting to be able to interrupt
> > firmware loading (like being able to kill the userspace helper)
> 
> Perhaps simply using a killable wait and not a fully interruptible
> wait would be better?

What do you mean by a killable wait BTW?

ret = swait_event_interruptible_timeout() is being used right now.

The problem is we have:

        if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)                     
                return -ENOENT;                                                 
        if (!ret)                                                               
                return -ETIMEDOUT;                                              
                                                                                
        return ret < 0 ? ret : 0;  

The (!ret) return -ETIMEDOUT ensures that if there was no time left
then we know we ran out of time.

The ret < 0 ? ret makes sure we send any errors
swait_event_interruptible_timeout() sent.

But the caller of this code has:

        if (fw_state_is_aborted(&buf->fw_st))                                   
                retval = -EAGAIN;                                               
        else if (buf->is_paged_buf && !buf->data)                               
                retval = -ENOMEM; 

And this retval is used. so we mask all errors with -EAGAIN.

So Martin is asking us to let us send -ERESTARTSYS back down to drivers.
These potentially could send back down to probe, and so finit_module()
could get this.

Another use case is a custom syfs knob which triggers a request_firmware(),
in such case this is a simple write(), but Anroid is configured to retry
if -ERESTARTSYS so I gather it will *retry* writing again to this file
if -ERESTARTSYS was sent and therefore triggering another firmware request.

> It sounds like the code really is not prepared for an truly
> interruptible wait here.

Can you clarify what you mean?

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                 ` <20170526194001.GR8951-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
@ 2017-05-26 20:23                   ` Fuzzey, Martin
       [not found]                     ` <CANh8QzyqQ5hubWJvWYxgoQ3baL6sgoQPSzEHMY0tu8WNGS2gZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Fuzzey, Martin @ 2017-05-26 20:23 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Nicolas, John Ewalt, Dmitry Torokhov, Kees Cook,
	Michael Kerrisk (man-pages),
	Andy Lutomirski, Linux API, Peter Zijlstra, Greg KH,
	Daniel Wagner, David Woodhouse, rafal-g1n6cQUeyibVItvQsEIGlw,
	Arend Van Spriel, Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Luca Coelho, Emmanu

On 26 May 2017 at 21:40, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>> On 25 May 2017 at 06:13, Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> >>>
>> >>> Can you give a simple example of what's going on and why it matters?
>> >>>
>>
>>
>> Here is the use case in which I ran into this problem.
>>
>> I have a driver which does request_firmware() when a write() is done
>> to a sysfs file.
>>
>> The write() was being done by an android init script (with the init
>> interpreter "write" command).
>> init, of course, forks lots of processes and some of the children die.
>>
>> So the scenario was the following:
>>
>
> No no, this is not how the fallback system works.
>

I was (implicitly) describing the fallback mechanism.
Indeed as you say the kernel now first tries to load the firmware
itself without involving userspace.
I'm describing what happens *after* this fails and the kernel falls
back to userspace

The sysfs file I was talking about is *not* the sysfs file involved in
the firmware loading mechanism
but a *custom* sysfs file used by a driver to *trigger* the call to
request_firmware() [synchronous] in the first place.

That is, my driver does not do request_firmware() in its probe()
function like most but when requested by
userspace. That's a valid usage as far as I can tell. Nothing says
request_firmware() is only allowed from probe.

Not sure it is relevant here but here's the reason for doing it this
way: (skip this bit if not interested)
I have a small microcontroller used to manage the power to the main CPU.
It is connected to the CPU by a GPIO line.
On power up the microcrontroller powers up the main CPU and starts a timer.
If the application doesn't start in time the microcontroller power
cycles the CPU.

In addition to the the above the microcontroller is connected to the
CPU by a I2C bus for various other functions
including firmware update (of the microcontroller software).
In order to keep the critical power up code on the microcontroller
very simple the I2C connection is not available
until *after* the power up confirmation by the GPIO line.

So, on the linux side, there is a custom driver which exports a sysfs
entry that userspace writes to in order to confirm
startup.
The driver, when that sysfs file is written to, first sets the GPIO
line to signal the microcontroller that the application has
started so it can stop its timer and activate the I2C interface. Then
it does a request_firmware() to obtain the firmware
the microcontroler is supposed to have. It then verifies it using
commands over I2C to compare checksums etc and
updates it if needed.


>> 5) A child dies and raises SIGCHLD
>
> What child? The process doing the write() ?
>
A child of the process doing the write on the drivers custom sysfs
entry that triggered the request_firmware()

> Martin seems to be arguing -ERESTARTSYS should be sent back instead given
> it is what the wait returned after all.
>

Yes


> If you're talking about a custom driver of sorts that triggered
> a request_firmware() call (note sync) then yes your earlier description
> is accurate and in this case as well the driver specific sysfs interface
> can end up returning the same error.
>

Exactly


> If this file is custom then its up to you to decide what you want
> for error on that file, but for the firmware_class.c I do agree on
> returning an agreed upon error so drivers can Do The Right Thing (TM).
>

Yes


>> 9) The system call machinery invokes the signal handler
>> 10) The signal handler does its stuff
>> 11) Because SA_RESTART was set the system call is restarted (calling
>> the sysfs .store) and we try it all again from step 2
>
> OK so this seems to reveal an internal working of some android
> loader of some sort. Thanks.
>

Nothing Android specific here.

This is the standard *linux* behaviour.
The retry is done *in the kernel* not on the android userspace side


>> Note that, on the the userspace side  write() is only called once
>
> The write is to the custom driver trigger which calls request_firmware() ?
>

Yes



> This could mean we get a loop on finit_module() if a signal is detected
> on firmware_request() on every call. Is that fine? Is it expected ?

Ok so here we are talking about the standard case of
request_firmware() being called from probe()
If the driver is a loadable module then yes the it will be retried.
If it gets a signal on every try then something else is wrong I'd say

If the driver is compiled in then there is no retry since the call to probe
doesn't go through the syscall machinary. But there shouldn't be a
signal either in that
case since it's not being called from a userspace process in that case.

> So you seem to be suggesting the driver's should decide to mask or
> not -ERSTARTSYS.
>

Only if the driver knows that it is not restartable *itself*.
Shoudn't happen very often


Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                     ` <CANh8QzyqQ5hubWJvWYxgoQ3baL6sgoQPSzEHMY0tu8WNGS2gZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-26 20:52                       ` Luis R. Rodriguez
  0 siblings, 0 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-05-26 20:52 UTC (permalink / raw)
  To: Fuzzey, Martin
  Cc: Luis R. Rodriguez, Nicolas, John Ewalt, Dmitry Torokhov,
	Kees Cook, Michael Kerrisk (man-pages),
	Andy Lutomirski, Linux API, Peter Zijlstra, Greg KH,
	Daniel Wagner, David Woodhouse, rafal-g1n6cQUeyibVItvQsEIGlw,
	Arend Van Spriel, Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg

On Fri, May 26, 2017 at 10:23:03PM +0200, Fuzzey, Martin wrote:
> On 26 May 2017 at 21:40, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >
> > No no, this is not how the fallback system works.
> 
> The sysfs file I was talking about is *not* the sysfs file involved in
> the firmware loading mechanism
> but a *custom* sysfs file used by a driver to *trigger* the call to
> request_firmware() [synchronous] in the first place.

OK!

> That is, my driver does not do request_firmware() in its probe()
> function like most but when requested by
> userspace. That's a valid usage as far as I can tell. Nothing says
> request_firmware() is only allowed from probe.

In fact we prefer sync calls not to be done on probe from just a practical
perspective as this can delay boot.

> Not sure it is relevant here but here's the reason for doing it this
> way: (skip this bit if not interested)
> I have a small microcontroller used to manage the power to the main CPU.
> It is connected to the CPU by a GPIO line.
> On power up the microcrontroller powers up the main CPU and starts a timer.
> If the application doesn't start in time the microcontroller power
> cycles the CPU.

OK

> In addition to the the above the microcontroller is connected to the
> CPU by a I2C bus for various other functions
> including firmware update (of the microcontroller software).

OK.

> In order to keep the critical power up code on the microcontroller
> very simple the I2C connection is not available
> until *after* the power up confirmation by the GPIO line.

I was not able to parse this.

You seem to be saying the I2C interface for firmware updates of the
microcontroller software is not available until the CPU it controls
is up, and this is confirmed through the GPIO line ? The I2c firmware
update interface is not for the target CPU the microcontroller controls ?

>From reading the below it seems this is right.

> So, on the linux side, there is a custom driver which exports a sysfs
> entry that userspace writes to in order to confirm
> startup.

OK so Linux runs on this host the microcontroller powers up, and it has
a custom driver, and it just toggles a knob to inform the microcontroller,
"Hey dude, I'm up, give me firmware!" ?

> The driver, when that sysfs file is written to, first sets the GPIO
> line to signal the microcontroller that the application has
> started so it can stop its timer and activate the I2C interface.

Got it! How does it know the I2C interface is ready ? Does it wait ?

> Then it does a request_firmware() to obtain the firmware the microcontroler
> is supposed to have. It then verifies it using commands over I2C to compare
> checksums etc and updates it if needed.

OK. Got it.

> >> 9) The system call machinery invokes the signal handler
> >> 10) The signal handler does its stuff
> >> 11) Because SA_RESTART was set the system call is restarted (calling
> >> the sysfs .store) and we try it all again from step 2
> >
> > OK so this seems to reveal an internal working of some android
> > loader of some sort. Thanks.
> >
> 
> Nothing Android specific here.
> 
> This is the standard *linux* behaviour.
> The retry is done *in the kernel* not on the android userspace side

Alright !

> > This could mean we get a loop on finit_module() if a signal is detected
> > on firmware_request() on every call. Is that fine? Is it expected ?
> 
> Ok so here we are talking about the standard case 

I would not say its standard. We support it, but we prefer async requests
are used there.

But its possible...  And since this as you clarify Linux deals with this
specially it is of interest what we do return.

> of request_firmware() being
> called from probe() If the driver is a loadable module then yes the it will
> be retried.  If it gets a signal on every try then something else is wrong
> I'd say
> 
> If the driver is compiled in then there is no retry since the call to probe
> doesn't go through the syscall machinary. But there shouldn't be a
> signal either in that
> case since it's not being called from a userspace process in that case.

Still -- its all possible on finit_module() so we still must consider the case.
If sending -ERESTARTSYS is not a good idea we could for instance send another
special error to let driver *then* make an informed decision, but that it
could still interpret as it actually having been -ERESTARTSYS.

> > So you seem to be suggesting the driver's should decide to mask or
> > not -ERSTARTSYS.
> >
> 
> Only if the driver knows that it is not restartable *itself*.
> Shoudn't happen very often

I'd prefer to hear more from linux-api folks.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-05-26 19:46                 ` Luis R. Rodriguez
@ 2017-05-26 21:26                   ` Dmitry Torokhov
       [not found]                     ` <CAKdAkRTrcTVOAP5GK-R=Au_tL5WqSn5UkQEzNe5NcCWXS8mbtA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Dmitry Torokhov @ 2017-05-26 21:26 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Eric W. Biederman, Fuzzey, Martin, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho

On Fri, May 26, 2017 at 12:46 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Fri, May 26, 2017 at 06:09:29AM -0500, Eric W. Biederman wrote:
>> "Fuzzey, Martin" <mfuzzey@parkeon.com> writes:
>> >>>> Maybe SIGCHLD shouldn't interrupt firmware loading?
>> >
>> > I don't think there's a way of doing that without disabling all
>> > signals (ie using the non interruptible wait variants).
>> > It used to be that way (which is why I only ran into this after
>> > updating from an ancient 3.16 kernel to a slightly less ancient 4.4)
>> > But there are valid reasons for wanting to be able to interrupt
>> > firmware loading (like being able to kill the userspace helper)
>>
>> Perhaps simply using a killable wait and not a fully interruptible
>> wait would be better?
>
> What do you mean by a killable wait BTW?

https://lwn.net/Articles/288056/

I think only interrupting firmware loading with fatal signals would
make a lot of sense.

>
> ret = swait_event_interruptible_timeout() is being used right now.

It looks like we are missing swait_event_killable*(), but I do not
think it would be hard to add.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                     ` <CAKdAkRTrcTVOAP5GK-R=Au_tL5WqSn5UkQEzNe5NcCWXS8mbtA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-26 21:32                       ` Luis R. Rodriguez
  2017-05-26 21:55                         ` Dmitry Torokhov
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-05-26 21:32 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Eric W. Biederman, Fuzzey, Martin, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho

On Fri, May 26, 2017 at 2:26 PM, Dmitry Torokhov
<dmitry.torokhov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Fri, May 26, 2017 at 12:46 PM, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> On Fri, May 26, 2017 at 06:09:29AM -0500, Eric W. Biederman wrote:
>>> "Fuzzey, Martin" <mfuzzey-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org> writes:
>>> >>>> Maybe SIGCHLD shouldn't interrupt firmware loading?
>>> >
>>> > I don't think there's a way of doing that without disabling all
>>> > signals (ie using the non interruptible wait variants).
>>> > It used to be that way (which is why I only ran into this after
>>> > updating from an ancient 3.16 kernel to a slightly less ancient 4.4)
>>> > But there are valid reasons for wanting to be able to interrupt
>>> > firmware loading (like being able to kill the userspace helper)
>>>
>>> Perhaps simply using a killable wait and not a fully interruptible
>>> wait would be better?
>>
>> What do you mean by a killable wait BTW?
>
> https://lwn.net/Articles/288056/
>
> I think only interrupting firmware loading with fatal signals would
> make a lot of sense.
>
>>
>> ret = swait_event_interruptible_timeout() is being used right now.
>
> It looks like we are missing swait_event_killable*(), but I do not
> think it would be hard to add.

What should we do for stable ? Is this a *stable* issue ?

 Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-05-26 21:32                       ` Luis R. Rodriguez
@ 2017-05-26 21:55                         ` Dmitry Torokhov
  2017-06-05 20:24                           ` Luis R. Rodriguez
  0 siblings, 1 reply; 34+ messages in thread
From: Dmitry Torokhov @ 2017-05-26 21:55 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Eric W. Biederman, Fuzzey, Martin, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho

On Fri, May 26, 2017 at 02:32:31PM -0700, Luis R. Rodriguez wrote:
> On Fri, May 26, 2017 at 2:26 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
> > On Fri, May 26, 2017 at 12:46 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >> On Fri, May 26, 2017 at 06:09:29AM -0500, Eric W. Biederman wrote:
> >>> "Fuzzey, Martin" <mfuzzey@parkeon.com> writes:
> >>> >>>> Maybe SIGCHLD shouldn't interrupt firmware loading?
> >>> >
> >>> > I don't think there's a way of doing that without disabling all
> >>> > signals (ie using the non interruptible wait variants).
> >>> > It used to be that way (which is why I only ran into this after
> >>> > updating from an ancient 3.16 kernel to a slightly less ancient 4.4)
> >>> > But there are valid reasons for wanting to be able to interrupt
> >>> > firmware loading (like being able to kill the userspace helper)
> >>>
> >>> Perhaps simply using a killable wait and not a fully interruptible
> >>> wait would be better?
> >>
> >> What do you mean by a killable wait BTW?
> >
> > https://lwn.net/Articles/288056/
> >
> > I think only interrupting firmware loading with fatal signals would
> > make a lot of sense.
> >
> >>
> >> ret = swait_event_interruptible_timeout() is being used right now.
> >
> > It looks like we are missing swait_event_killable*(), but I do not
> > think it would be hard to add.
> 
> What should we do for stable ? Is this a *stable* issue ?

I think it is, as you have users complaining about behavior. I do not
think we need to make their lives harder than needed by requiring
handling signals.

I do not see why we could not introduce wait_event_killable_timeout()
and swait_event_killable_timeout() into -stables.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-05-26 21:55                         ` Dmitry Torokhov
@ 2017-06-05 20:24                           ` Luis R. Rodriguez
       [not found]                             ` <20170605202410.GQ8951-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-05 20:24 UTC (permalink / raw)
  To: Dmitry Torokhov, Peter Zijlstra, Alan Cox, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH
  Cc: Luis R. Rodriguez, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro

On Fri, May 26, 2017 at 02:55:18PM -0700, Dmitry Torokhov wrote:
> On Fri, May 26, 2017 at 02:32:31PM -0700, Luis R. Rodriguez wrote:
> > On Fri, May 26, 2017 at 2:26 PM, Dmitry Torokhov
> > <dmitry.torokhov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > On Fri, May 26, 2017 at 12:46 PM, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > >> On Fri, May 26, 2017 at 06:09:29AM -0500, Eric W. Biederman wrote:
> > >>> "Fuzzey, Martin" <mfuzzey-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org> writes:
> > >>> >>>> Maybe SIGCHLD shouldn't interrupt firmware loading?
> > >>> >
> > >>> > I don't think there's a way of doing that without disabling all
> > >>> > signals (ie using the non interruptible wait variants).
> > >>> > It used to be that way (which is why I only ran into this after
> > >>> > updating from an ancient 3.16 kernel to a slightly less ancient 4.4)
> > >>> > But there are valid reasons for wanting to be able to interrupt
> > >>> > firmware loading (like being able to kill the userspace helper)
> > >>>
> > >>> Perhaps simply using a killable wait and not a fully interruptible
> > >>> wait would be better?
> > >>
> > >> What do you mean by a killable wait BTW?
> > >
> > > https://lwn.net/Articles/288056/

Read it thanks ! As per this it states, "Kernel code which uses interruptible
sleeps must always check to see whether it woke up as a result of a signal,
and, if so, clean up whatever it was doing and return -EINTR back to user
space." -- but also on the same article it quotes Alan Cox as having noted
"Unix tradition (and thus almost all applications) believe file store writes to
be non signal interruptible. It would not be safe or practical to change that
guarantee."

For these two reasons then it would seem best we do two things actually:

1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
got interrupted by a signal (it returns -ERESTARTSYS)
2) Do as you note below and add wait_event_killable_timeout()

> > > I think only interrupting firmware loading with fatal signals would
> > > make a lot of sense.
> > >
> > >>
> > >> ret = swait_event_interruptible_timeout() is being used right now.
> > >
> > > It looks like we are missing swait_event_killable*(), but I do not
> > > think it would be hard to add.
> > 
> > What should we do for stable ? Is this a *stable* issue ?
> 
> I think it is, as you have users complaining about behavior. I do not
> think we need to make their lives harder than needed by requiring
> handling signals.

Makes sense, specially given the long tradition, it would breaking a long
tradition. Even though in this case we are dealing with sysfs files it
should be no different.

> I do not see why we could not introduce wait_event_killable_timeout()
> and swait_event_killable_timeout() into -stables.

After seeing how simple it is to do so I tend to agree. Greg, Peter,
what are your thoughts ?

Martin Fuzzey can you test this patch as an alternative to your issue ?

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index b9f907eedbf7..70fc42e5e0da 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
 {
 	long ret;
 
-	ret = swait_event_interruptible_timeout(fw_st->wq,
+	ret = swait_event_killable_timeout(fw_st->wq,
 				__fw_state_is_done(READ_ONCE(fw_st->status)),
 				timeout);
 	if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
diff --git a/include/linux/swait.h b/include/linux/swait.h
index c1f9c62a8a50..9c5ca2898b2f 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -169,4 +169,29 @@ do {									\
 	__ret;								\
 })
 
+#define __swait_event_killable(wq, condition)				\
+	(void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
+
+#define swait_event_killable(wq, condition)				\
+({									\
+	int __ret = 0;							\
+	if (!(condition))						\
+		__ret = __swait_event_killable(wq, condition);		\
+	__ret;								\
+})
+
+#define __swait_event_killable_timeout(wq, condition, timeout)		\
+	___swait_event(wq, ___wait_cond_timeout(condition),		\
+		      TASK_INTERRUPTIBLE, timeout,			\
+		      __ret = schedule_timeout(__ret))
+
+#define swait_event_killable_timeout(wq, condition, timeout)		\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __swait_event_killable_timeout(wq,		\
+						condition, timeout);	\
+	__ret;								\
+})
+
 #endif /* _LINUX_SWAIT_H */

  Luis

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                             ` <20170605202410.GQ8951-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
@ 2017-06-06  9:04                               ` Martin Fuzzey
       [not found]                                 ` <59367025.3020901-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
  2017-06-06 14:53                               ` Alan Cox
  1 sibling, 1 reply; 34+ messages in thread
From: Martin Fuzzey @ 2017-06-06  9:04 UTC (permalink / raw)
  To: Luis R. Rodriguez, Dmitry Torokhov, Peter Zijlstra, Alan Cox,
	Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH
  Cc: Linux API, Daniel Wagner, David Woodhouse,
	jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro,
	David Howells, Peter Jones, Hans

On 05/06/17 22:24, Luis R. Rodriguez wrote:
>
>
> For these two reasons then it would seem best we do two things actually:
>
> 1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
> got interrupted by a signal (it returns -ERESTARTSYS)


I disagree. That would force userspace to handle the signal rather than 
having the kernel retry.

 From Documentation/DocBook/kernel-hacking.tmpl:

    After you slept you should check if a signal occurred: the
    Unix/Linux way of handling signals is to temporarily exit the
    system call with the <constant>-ERESTARTSYS</constant> error.  The
    system call entry code will switch back to user context, process
    the signal handler and then your system call will be restarted
    (unless the user disabled that).  So you should be prepared to
    process the restart, e.g. if you're in the middle of manipulating
    some data structure.



> 2) Do as you note below and add wait_event_killable_timeout()

Hum,
I do think that would be better but, (please correct me if I'm wrong) 
the _killable_ variants only allow
SIGKILL  (and not SIGINT).

0cb64249ca "firmware_loader: abort request if wait_for_completion is 
interrupted"

specifically mentrions ctrl-c (SIGINT) in the commit message so that 
would no longer work.

Myself I think having to use kill -9 to interrupt firmware loading by a 
usespace helper is OK but others may disagree.

> I do not see why we could not introduce wait_event_killable_timeout()
> and swait_event_killable_timeout() into -stables.
> After seeing how simple it is to do so I tend to agree. Greg, Peter,
> what are your thoughts ?
>
> Martin Fuzzey can you test this patch as an alternative to your issue ?
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index b9f907eedbf7..70fc42e5e0da 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
>   {
>   	long ret;
>   
> -	ret = swait_event_interruptible_timeout(fw_st->wq,
> +	ret = swait_event_killable_timeout(fw_st->wq,
>   				__fw_state_is_done(READ_ONCE(fw_st->status)),
>   				timeout);
>   	if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> diff --git a/include/linux/swait.h b/include/linux/swait.h
> index c1f9c62a8a50..9c5ca2898b2f 100644
> --- a/include/linux/swait.h
> +++ b/include/linux/swait.h
> @@ -169,4 +169,29 @@ do {									\
>   	__ret;								\
>   })
>   
> +#define __swait_event_killable(wq, condition)				\
> +	(void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
> +
> +#define swait_event_killable(wq, condition)				\
> +({									\
> +	int __ret = 0;							\
> +	if (!(condition))						\
> +		__ret = __swait_event_killable(wq, condition);		\
> +	__ret;								\
> +})
> +
> +#define __swait_event_killable_timeout(wq, condition, timeout)		\
> +	___swait_event(wq, ___wait_cond_timeout(condition),		\
> +		      TASK_INTERRUPTIBLE, timeout,			\
> +		      __ret = schedule_timeout(__ret))
> +

Should be TASK_KILLABLE above

> +#define swait_event_killable_timeout(wq, condition, timeout)		\
> +({									\
> +	long __ret = timeout;						\
> +	if (!___wait_cond_timeout(condition))				\
> +		__ret = __swait_event_killable_timeout(wq,		\
> +						condition, timeout);	\
> +	__ret;								\
> +})
> +
>   #endif /* _LINUX_SWAIT_H */
>
>    Luis

After replacing TASK_INTERRUPTIBLE with TASK_KILLABLE above it works for me.


Regards,

Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                             ` <20170605202410.GQ8951-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
  2017-06-06  9:04                               ` Martin Fuzzey
@ 2017-06-06 14:53                               ` Alan Cox
       [not found]                                 ` <1496760796.5682.48.camel-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: Alan Cox @ 2017-06-06 14:53 UTC (permalink / raw)
  To: Luis R. Rodriguez, Dmitry Torokhov, Peter Zijlstra,
	Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH
  Cc: Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse,
	jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro,
	David Howells

> "Unix tradition (and thus almost all applications) believe file store
> writes to
> be non signal interruptible. It would not be safe or practical to
> change that
> guarantee."

Yep everyone codes

	write(disk_file, "foo", 3);

not while(..) blah around it.

> For these two reasons then it would seem best we do two things
> actually:
> 
> 1) return -EINTR instead of -EAGAIN when we detect
> swait_event_interruptible_timeout()
> got interrupted by a signal (it returns -ERESTARTSYS)
> 2) Do as you note below and add wait_event_killable_timeout()

Pedantic detail that I don't think affects you

If you have completed a part of the I/O then you should return the byte
processed count not EINTR, but -1,EINTR if no progress was made.

Alan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                                 ` <59367025.3020901-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
@ 2017-06-06 16:34                                   ` Luis R. Rodriguez
       [not found]                                     ` <20170606163401.GA27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-06 16:34 UTC (permalink / raw)
  To: Martin Fuzzey, Matthew Wilcox, Alan Cox, Jonathan Corbet,
	Michael Kerrisk (man-pages),
	Linux API, fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Luis R. Rodriguez, David Howells, Dmitry Torokhov,
	Peter Zijlstra, Eric W. Biederman, Andy Lutomirski, Greg KH,
	Daniel Wagner, David Woodhouse,
	jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho

Adding fsdevel for review on the correct semantics of handling signals on
write(), in this case a sysfs write which triggered a sync request firmware
call and what the firmware API should return in such case of a signal (I gather
this should be -EINTR and not -ERESTARTSYS). Also whether or not SIGINT should
be followed or if only allowing SIGKILL is fine (fine by me, but it would
change old behaviour).

Hoping between fsdevel and linux-api folks we can hash this out.

On Tue, Jun 06, 2017 at 11:04:37AM +0200, Martin Fuzzey wrote:
> On 05/06/17 22:24, Luis R. Rodriguez wrote:
> > 
> > 
> > For these two reasons then it would seem best we do two things actually:
> > 
> > 1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
> > got interrupted by a signal (it returns -ERESTARTSYS)
> 
> 
> I disagree. That would force userspace to handle the signal rather than
> having the kernel retry.
> 
> From Documentation/DocBook/kernel-hacking.tmpl:
> 
>    After you slept you should check if a signal occurred: the
>    Unix/Linux way of handling signals is to temporarily exit the
>    system call with the <constant>-ERESTARTSYS</constant> error.  The
>    system call entry code will switch back to user context, process
>    the signal handler and then your system call will be restarted
>    (unless the user disabled that).  So you should be prepared to
>    process the restart, e.g. if you're in the middle of manipulating
>    some data structure.

This applies but you are missing my point that the LWN article [0] I referred
to also stated "Kernel code which uses interruptible sleeps must always check
to see whether it woke up as a result of a signal, and, if so, clean up
whatever it was doing and return -EINTR back to user space." -- I realize there
may be contradiction with above documentation -- this perhaps can be clarified
with fsdevel folks *but* regardless of that the same article notes Alan Cox
explains that "Unix tradition (and thus almost all applications) believe file
store writes to be non signal interruptible. It would not be safe or practical
to change that guarantee." So for this reason alone there does seem to be an
exemption to the above documentation worth noting for file store writes, and
the patch which you tested below *moves* the sysfs write op for firmware in
that direction by adding a new killable swait.

[0] https://lwn.net/Articles/288056/                                                                                                                                          
                                                                                                                                                                                
> > 2) Do as you note below and add wait_event_killable_timeout()
> 
> Hum,  I do think that would be better but, (please correct me if I'm wrong)
> the _killable_ variants only allow SIGKILL  (and not SIGINT).

That seems correct given a TASK_KILLABLE is also TASK_UNINTERRUPTIBLE.

> 0cb64249ca "firmware_loader: abort request if wait_for_completion is
> interrupted"
> 
> specifically mentrions ctrl-c (SIGINT) in the commit message so that would
> no longer work.

Great point, but it *also* allowed SIGKILL, so I do feel the goal was also to
allow it to be killable. I'm afraid that patch probably did not get proper
review from sufficient folks and its worth now asking ourselves what we'd like
to do.  I'm fine with letting go of SIGINT for firmware sysfs calls for the
sake of keeping with the long standing unix tradition on write, given we *still
have SIGKILL*.

> Myself I think having to use kill -9 to interrupt firmware loading by a
> usespace helper is OK but others may disagree.

Its why I added fsdevel as well. This is really a semantics and uapi question.
Between fsdevel and linux-api folks I would hope we can come to a sensible
resolution.

> > I do not see why we could not introduce wait_event_killable_timeout()
> > and swait_event_killable_timeout() into -stables.
> > After seeing how simple it is to do so I tend to agree. Greg, Peter,
> > what are your thoughts ?
> > 
> > Martin Fuzzey can you test this patch as an alternative to your issue ?
> > 
> > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> > index b9f907eedbf7..70fc42e5e0da 100644
> > --- a/drivers/base/firmware_class.c
> > +++ b/drivers/base/firmware_class.c
> > @@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
> >   {
> >   	long ret;
> > -	ret = swait_event_interruptible_timeout(fw_st->wq,
> > +	ret = swait_event_killable_timeout(fw_st->wq,
> >   				__fw_state_is_done(READ_ONCE(fw_st->status)),
> >   				timeout);
> >   	if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> > diff --git a/include/linux/swait.h b/include/linux/swait.h
> > index c1f9c62a8a50..9c5ca2898b2f 100644
> > --- a/include/linux/swait.h
> > +++ b/include/linux/swait.h
> > @@ -169,4 +169,29 @@ do {									\
> >   	__ret;								\
> >   })
> > +#define __swait_event_killable(wq, condition)				\
> > +	(void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
> > +
> > +#define swait_event_killable(wq, condition)				\
> > +({									\
> > +	int __ret = 0;							\
> > +	if (!(condition))						\
> > +		__ret = __swait_event_killable(wq, condition);		\
> > +	__ret;								\
> > +})
> > +
> > +#define __swait_event_killable_timeout(wq, condition, timeout)		\
> > +	___swait_event(wq, ___wait_cond_timeout(condition),		\
> > +		      TASK_INTERRUPTIBLE, timeout,			\
> > +		      __ret = schedule_timeout(__ret))
> > +
> 
> Should be TASK_KILLABLE above

Oops yes sorry.

> > +#define swait_event_killable_timeout(wq, condition, timeout)		\
> > +({									\
> > +	long __ret = timeout;						\
> > +	if (!___wait_cond_timeout(condition))				\
> > +		__ret = __swait_event_killable_timeout(wq,		\
> > +						condition, timeout);	\
> > +	__ret;								\
> > +})
> > +
> >   #endif /* _LINUX_SWAIT_H */
> > 
> >    Luis
> 
> After replacing TASK_INTERRUPTIBLE with TASK_KILLABLE above it works for me.

Great, thanks for testing.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                                 ` <1496760796.5682.48.camel-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
@ 2017-06-06 16:47                                   ` Luis R. Rodriguez
       [not found]                                     ` <20170606164734.GB27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
  2017-06-06 22:11                                     ` Theodore Ts'o
  0 siblings, 2 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-06 16:47 UTC (permalink / raw)
  To: Alan Cox, fsdevel-u79uwXL29TY76Z2rM5mHXA, Stephen Boyd, Li, Yi
  Cc: Luis R. Rodriguez, Dmitry Torokhov, Peter Zijlstra,
	Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API,
	Daniel Wagner, David Woodhouse,
	jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx,
	Moritz Fischer, Petr Mladek, Johannes Berg

Adding fsdevel folks.

On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > "Unix tradition (and thus almost all applications) believe file store
> > writes to
> > be non signal interruptible. It would not be safe or practical to
> > change that
> > guarantee."
> 
> Yep everyone codes
> 
> 	write(disk_file, "foo", 3);
> 
> not while(..) blah around it.

Thanks for the confirmation! That's a simple enough explanation.

> > For these two reasons then it would seem best we do two things
> > actually:
> > 
> > 1) return -EINTR instead of -EAGAIN when we detect
> > swait_event_interruptible_timeout()
> > got interrupted by a signal (it returns -ERESTARTSYS)
> > 2) Do as you note below and add wait_event_killable_timeout()
> 
> Pedantic detail that I don't think affects you
> 
> If you have completed a part of the I/O then you should return the byte
> processed count not EINTR, but -1,EINTR if no progress was made.

You are right with some new exceptions and with regards to the future:

The syfs loading interface for firmware currently goes through the
data file exposed on syfs, the respective write op firmware_data_write()
only checks for signals at the beginning. After that its a full one
swoop try to write if you are following the old tradition and are using
a buffer allocated by the firmware API.

If you are using the relatively new request_firmware_into_buf() added
by Stephen Boyd which lets the driver provide the allocated buffer then
we have a loop in firmware_rw() which should be fixed to:

1) Check for signals
2) Do what you noted above.

Furthermore Yi Li over at Intel is adding some new API calls which would
re-use some of this for FPGA firmwares which are also very large, that
work should consider the above and fix appropriately as well.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                                     ` <20170606163401.GA27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
@ 2017-06-06 17:52                                       ` Luis R. Rodriguez
  0 siblings, 0 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-06 17:52 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Martin Fuzzey, mcgrof-DgEjT+Ai2ygdnm+yROfE0A, Matthew Wilcox,
	Alan Cox, Jonathan Corbet, Michael Kerrisk (man-pages),
	Linux API, David Howells, Dmitry Torokhov, Peter Zijlstra,
	Eric W. Biederman, Andy Lutomirski, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull-yzvPICuk2ADJ97qv1iLQHA

Used wrong alias for fsdevel now, its linux-fsdevel ...

  Luis

On Tue, Jun 06, 2017 at 06:34:01PM +0200, Luis R. Rodriguez wrote:
> Adding fsdevel for review on the correct semantics of handling signals on
> write(), in this case a sysfs write which triggered a sync request firmware
> call and what the firmware API should return in such case of a signal (I gather
> this should be -EINTR and not -ERESTARTSYS). Also whether or not SIGINT should
> be followed or if only allowing SIGKILL is fine (fine by me, but it would
> change old behaviour).
> 
> Hoping between fsdevel and linux-api folks we can hash this out.
> 
> On Tue, Jun 06, 2017 at 11:04:37AM +0200, Martin Fuzzey wrote:
> > On 05/06/17 22:24, Luis R. Rodriguez wrote:
> > > 
> > > 
> > > For these two reasons then it would seem best we do two things actually:
> > > 
> > > 1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
> > > got interrupted by a signal (it returns -ERESTARTSYS)
> > 
> > 
> > I disagree. That would force userspace to handle the signal rather than
> > having the kernel retry.
> > 
> > From Documentation/DocBook/kernel-hacking.tmpl:
> > 
> >    After you slept you should check if a signal occurred: the
> >    Unix/Linux way of handling signals is to temporarily exit the
> >    system call with the <constant>-ERESTARTSYS</constant> error.  The
> >    system call entry code will switch back to user context, process
> >    the signal handler and then your system call will be restarted
> >    (unless the user disabled that).  So you should be prepared to
> >    process the restart, e.g. if you're in the middle of manipulating
> >    some data structure.
> 
> This applies but you are missing my point that the LWN article [0] I referred
> to also stated "Kernel code which uses interruptible sleeps must always check
> to see whether it woke up as a result of a signal, and, if so, clean up
> whatever it was doing and return -EINTR back to user space." -- I realize there
> may be contradiction with above documentation -- this perhaps can be clarified
> with fsdevel folks *but* regardless of that the same article notes Alan Cox
> explains that "Unix tradition (and thus almost all applications) believe file
> store writes to be non signal interruptible. It would not be safe or practical
> to change that guarantee." So for this reason alone there does seem to be an
> exemption to the above documentation worth noting for file store writes, and
> the patch which you tested below *moves* the sysfs write op for firmware in
> that direction by adding a new killable swait.
> 
> [0] https://lwn.net/Articles/288056/                                                                                                                                          
>                                                                                                                                                                                 
> > > 2) Do as you note below and add wait_event_killable_timeout()
> > 
> > Hum,  I do think that would be better but, (please correct me if I'm wrong)
> > the _killable_ variants only allow SIGKILL  (and not SIGINT).
> 
> That seems correct given a TASK_KILLABLE is also TASK_UNINTERRUPTIBLE.
> 
> > 0cb64249ca "firmware_loader: abort request if wait_for_completion is
> > interrupted"
> > 
> > specifically mentrions ctrl-c (SIGINT) in the commit message so that would
> > no longer work.
> 
> Great point, but it *also* allowed SIGKILL, so I do feel the goal was also to
> allow it to be killable. I'm afraid that patch probably did not get proper
> review from sufficient folks and its worth now asking ourselves what we'd like
> to do.  I'm fine with letting go of SIGINT for firmware sysfs calls for the
> sake of keeping with the long standing unix tradition on write, given we *still
> have SIGKILL*.
> 
> > Myself I think having to use kill -9 to interrupt firmware loading by a
> > usespace helper is OK but others may disagree.
> 
> Its why I added fsdevel as well. This is really a semantics and uapi question.
> Between fsdevel and linux-api folks I would hope we can come to a sensible
> resolution.
> 
> > > I do not see why we could not introduce wait_event_killable_timeout()
> > > and swait_event_killable_timeout() into -stables.
> > > After seeing how simple it is to do so I tend to agree. Greg, Peter,
> > > what are your thoughts ?
> > > 
> > > Martin Fuzzey can you test this patch as an alternative to your issue ?
> > > 
> > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> > > index b9f907eedbf7..70fc42e5e0da 100644
> > > --- a/drivers/base/firmware_class.c
> > > +++ b/drivers/base/firmware_class.c
> > > @@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
> > >   {
> > >   	long ret;
> > > -	ret = swait_event_interruptible_timeout(fw_st->wq,
> > > +	ret = swait_event_killable_timeout(fw_st->wq,
> > >   				__fw_state_is_done(READ_ONCE(fw_st->status)),
> > >   				timeout);
> > >   	if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> > > diff --git a/include/linux/swait.h b/include/linux/swait.h
> > > index c1f9c62a8a50..9c5ca2898b2f 100644
> > > --- a/include/linux/swait.h
> > > +++ b/include/linux/swait.h
> > > @@ -169,4 +169,29 @@ do {									\
> > >   	__ret;								\
> > >   })
> > > +#define __swait_event_killable(wq, condition)				\
> > > +	(void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
> > > +
> > > +#define swait_event_killable(wq, condition)				\
> > > +({									\
> > > +	int __ret = 0;							\
> > > +	if (!(condition))						\
> > > +		__ret = __swait_event_killable(wq, condition);		\
> > > +	__ret;								\
> > > +})
> > > +
> > > +#define __swait_event_killable_timeout(wq, condition, timeout)		\
> > > +	___swait_event(wq, ___wait_cond_timeout(condition),		\
> > > +		      TASK_INTERRUPTIBLE, timeout,			\
> > > +		      __ret = schedule_timeout(__ret))
> > > +
> > 
> > Should be TASK_KILLABLE above
> 
> Oops yes sorry.
> 
> > > +#define swait_event_killable_timeout(wq, condition, timeout)		\
> > > +({									\
> > > +	long __ret = timeout;						\
> > > +	if (!___wait_cond_timeout(condition))				\
> > > +		__ret = __swait_event_killable_timeout(wq,		\
> > > +						condition, timeout);	\
> > > +	__ret;								\
> > > +})
> > > +
> > >   #endif /* _LINUX_SWAIT_H */
> > > 
> > >    Luis
> > 
> > After replacing TASK_INTERRUPTIBLE with TASK_KILLABLE above it works for me.
> 
> Great, thanks for testing.
> 
>   Luis
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                                     ` <20170606164734.GB27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
@ 2017-06-06 17:54                                       ` Luis R. Rodriguez
  0 siblings, 0 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-06 17:54 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Alan Cox, fsdevel-u79uwXL29TY76Z2rM5mHXA, Luis R. Rodriguez,
	Stephen Boyd, Li, Yi, Dmitry Torokhov, Peter Zijlstra,
	Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API,
	Daniel Wagner, David Woodhouse,
	jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki

Using the right linux-fsdevel this time also, this was the second reply.

  Luis

On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> Adding fsdevel folks.
> 
> On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > > "Unix tradition (and thus almost all applications) believe file store
> > > writes to
> > > be non signal interruptible. It would not be safe or practical to
> > > change that
> > > guarantee."
> > 
> > Yep everyone codes
> > 
> > 	write(disk_file, "foo", 3);
> > 
> > not while(..) blah around it.
> 
> Thanks for the confirmation! That's a simple enough explanation.
> 
> > > For these two reasons then it would seem best we do two things
> > > actually:
> > > 
> > > 1) return -EINTR instead of -EAGAIN when we detect
> > > swait_event_interruptible_timeout()
> > > got interrupted by a signal (it returns -ERESTARTSYS)
> > > 2) Do as you note below and add wait_event_killable_timeout()
> > 
> > Pedantic detail that I don't think affects you
> > 
> > If you have completed a part of the I/O then you should return the byte
> > processed count not EINTR, but -1,EINTR if no progress was made.
> 
> You are right with some new exceptions and with regards to the future:
> 
> The syfs loading interface for firmware currently goes through the
> data file exposed on syfs, the respective write op firmware_data_write()
> only checks for signals at the beginning. After that its a full one
> swoop try to write if you are following the old tradition and are using
> a buffer allocated by the firmware API.
> 
> If you are using the relatively new request_firmware_into_buf() added
> by Stephen Boyd which lets the driver provide the allocated buffer then
> we have a loop in firmware_rw() which should be fixed to:
> 
> 1) Check for signals
> 2) Do what you noted above.
> 
> Furthermore Yi Li over at Intel is adding some new API calls which would
> re-use some of this for FPGA firmwares which are also very large, that
> work should consider the above and fix appropriately as well.
> 
>   Luis
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-06 16:47                                   ` Luis R. Rodriguez
       [not found]                                     ` <20170606164734.GB27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
@ 2017-06-06 22:11                                     ` Theodore Ts'o
       [not found]                                       ` <20170606221151.ygoxqkwhhjsqw632-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: Theodore Ts'o @ 2017-06-06 22:11 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alan Cox, linux-fsdevel, Stephen Boyd, Li, Yi, Dmitry Torokhov,
	Peter Zijlstra, Jonathan Corbet, Eric W. Biederman,
	Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API,
	Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, atull, Moritz

On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > Yep everyone codes
> > 
> > 	write(disk_file, "foo", 3);
> > 
> > not while(..) blah around it.

In general I/O to tty devices and other character mode devices was
where you definitely needed to check for EINTR/EAGAIN because that was
the place where historically Unix systems would interrupt system calls
--- e.g., a user typing control-Z, for example.

And in general writes to file systems and block devices in *general*
were never interrupted by signals, although that was always a
non-portable assumption.

So I've always subscribed to the "be liberal in what you accept,
conservative in what you send" rule of thumb.  Which is to say, any
programs *I* write I'll in general always check for EINTR/EAGAIN and
check for partial writes, but in general, as a kernel program I try to
adhere to the long-standing Unix trandition for disk based files.

This does beg the question about whether firmware devices are more
like tty devices or block devices or files, though.  If before signals
never caused them to return EINTR/EAGAIN, then it's probably best to
not break backwards compatbility.

That being said, not that you also have the option of using
-ERESTARTNOINTR (always restart the system call, regardless of how the
sighandle flags were set), and -ERESTARTNOHAND (restart the system
call always if there was no signal handler and the process was not
killed), in addition to -ERESTARTSYS.  So that might be another option
that's fairly easy to implement or experiment with.

       	      	      		   - Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                                       ` <20170606221151.ygoxqkwhhjsqw632-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
@ 2017-06-07  0:22                                         ` Luis R. Rodriguez
       [not found]                                           ` <20170607002237.GJ27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-07  0:22 UTC (permalink / raw)
  To: Theodore Ts'o, Luis R. Rodriguez, Alan Cox,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Stephen Boyd, Li, Yi,
	Dmitry Torokhov, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API,
	Daniel Wagner, David Woodhouse,
	jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel, Rafa

On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > > Yep everyone codes
> > > 
> > > 	write(disk_file, "foo", 3);
> > > 
> > > not while(..) blah around it.
> 
> In general I/O to tty devices and other character mode devices was
> where you definitely needed to check for EINTR/EAGAIN because that was
> the place where historically Unix systems would interrupt system calls
> --- e.g., a user typing control-Z, for example.
> 
> And in general writes to file systems and block devices in *general*
> were never interrupted by signals, although that was always a
> non-portable assumption.
> 
> So I've always subscribed to the "be liberal in what you accept,
> conservative in what you send" rule of thumb.  Which is to say, any
> programs *I* write I'll in general always check for EINTR/EAGAIN and
> check for partial writes, but in general, as a kernel program I try to
> adhere to the long-standing Unix trandition for disk based files.

OK so userspace should consider checking for EINTR/EAGAIN.  On the kernel front
though we we do *strive* to go by the old unix tradition -- there are
exceptions though, of course.

> This does beg the question about whether firmware devices are more
> like tty devices or block devices or files, though.

And here you raise the analogy to see if its *worthy* to break the old unix
tradition, the tty example is a case that we seem to accept as reasonable
for an exception, clearly.

This help, thanks!

So, "firmware devices" is a misnomer, the firmware API does direct FS lookup
and only if that fails and the distribution enabled the fallback mechanism will
it be used if the file was not found on /lib/firmware/ paths. So the syfs
interface we are evaluating here is this fallback mechanism. The "firmware
device" then is really more a user of the firmware API, and these do vary
widely. In fact it recent developments with the "driver data API" generalize
the interface given things outside our typical device drivers for what we know
as old "firmware" are looking to use the firmware API for looking for files
from userspace. This in fact already has happened, it was just subtle: we look
for default EEPROMs, configururation files, microcode updates, and soon we'll
want to replace the userspace wireless CRDA tool for the regulatory database
with a simple file lookup once we get firmware signing upstream. So -- more and
more the API is being used for general file lookups.

Which type of drivers request these files ? It all varies across the board,
and we have really early things like CPU microcode files (so we enable built-in
files), regular old firmware files for things like networking drivers,
subsystem configuration stuff (for the wireless regulatory database), and it
seems now *huge* (100s of MiBs I'm told) for remote-proc and FPGAs. The FPGA
use case is one use case where having the uploader use a while loop makes more
sense. Drivers may also want to have more fine control to the buffer where the
"driver data" gets stashed into, so the API request_firmware_into_buf() was
added for this purpose. The FPGA devices seem to want to use this breakdown
mechanism as well. Although the remote-proc folks seem to be relying on the
sysfs interface as a default mechanism.

So it would seem to make sense to support the loop thing since some
files these days *can* be really big. I think we can easily address
this by relying on the killable signal (SIGKILL), rather than capturing
SIGINT (CTRL-C).

One problem is we had supported SIGINT before, it was reported however we
never gave back to userspace the fact that a signal was captured, we always
returned EAGAIN, so usersace did not know WTF was going on. The reporter wanted
to detect this and wanted to get us to return the same value the wait call
passed to the firmware API, -ERESTARTSYS. Hence this patch, however since
-ERESTARTSYS is special and folks may just restart the syscall always when this
happens, one question is if we should return EINTR instead of today's EAGAIN.

> If before signals
> never caused them to return EINTR/EAGAIN, then it's probably best to
> not break backwards compatbility.

There was a time where did not have signal, and always just used to return
-ENOMEM on failure. After signals support was added we still were
returning -ENOMEM for a while. It was only later that we started capturing
the actual signal, however this was masked as EAGAIN, for no precise good
reason, I suppose just lack of review.

The patch in question wants us to return -ERESTARTSYS so that userspace
can restart the upload. But the issue was that the sysfs firmware loader
was killed by a signal as well, so this is why Dmitry noted that we can
just fix this issue by just making the wait killable with SIGKILL only.
The swait killable patch I supplied upon Dmitry's suggestion as a replacement
would do away with capturing SIGINT but allow SIGKILL then.

Since EAGAIN was always returned even if a signal was captured I'm
inclined to take the position we really never gave userspace a proper
clue about the signal. Additionally I cannot see why userspace would
rely on SIGINT working *only* but not *SIGKILL*.

The risk here I think is if userspace ever *did* rely on SIGINT for the sysfs
interface as userspace functional API. If this is not worth breaking
then I suppose we are stuck with the current wait. If we do that I'd
say we return EINTR, based on what I have read so far.

If doing away with SIGINT but keeping SIGKILL is rather sane then I'd go
with the approach of the new swait killable + returning -EINTR.

> That being said, note that you also have the option of using
> -ERESTARTNOINTR (always restart the system call, regardless of how the
> sighandle flags were set), and -ERESTARTNOHAND (restart the system
> call always if there was no signal handler and the process was not
> killed), in addition to -ERESTARTSYS.

We rely on swait, and swait right now only uses -ERESTARTSYS. Are
you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
or -ERESTARTNOHAND if we see fit for some future functionality / need ?

> So that might be another option
> that's fairly easy to implement or experiment with.

Thanks!

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                                           ` <20170607002237.GJ27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
@ 2017-06-07  4:56                                             ` Andy Lutomirski
  2017-06-07  6:25                                               ` Dmitry Torokhov
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2017-06-07  4:56 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Theodore Ts'o, Alan Cox, Linux FS Devel, Stephen Boyd, Li,
	Yi, Dmitry Torokhov, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API,
	Daniel Wagner, David Woodhouse,
	jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki

On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>
> We rely on swait, and swait right now only uses -ERESTARTSYS. Are
> you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
> or -ERESTARTNOHAND if we see fit for some future functionality / need ?

I think that has essentially nothing to do with swait.  User code does
some syscall.  That syscall triggers a firmware load.  The caller gets
a signal.  If you're going to let firmware load get interrupted, you
need to consider what the syscall is.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07  4:56                                             ` Andy Lutomirski
@ 2017-06-07  6:25                                               ` Dmitry Torokhov
  2017-06-07 12:25                                                 ` Alan Cox
  2017-06-09  1:14                                                 ` Andy Lutomirski
  0 siblings, 2 replies; 34+ messages in thread
From: Dmitry Torokhov @ 2017-06-07  6:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Theodore Ts'o, Alan Cox, Linux FS Devel,
	Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages),
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, atull

On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> >
> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
> 
> I think that has essentially nothing to do with swait.  User code does
> some syscall.  That syscall triggers a firmware load.  The caller gets
> a signal.  If you're going to let firmware load get interrupted, you
> need to consider what the syscall is.

I think it is way too complicated and I do not think driver writers will
stand a chance of implementing this correctly, given that often firmware
load might be triggered indirectly and by multitude of syscalls.

What's wrong with saying that the only way to interrupt firmware loading
is to kill the process? So ctrl-c will no longer interrupt it, but I do
not think that ease of aborting firmware update is primary goal here. I
consider simple is good here.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07  6:25                                               ` Dmitry Torokhov
@ 2017-06-07 12:25                                                 ` Alan Cox
  2017-06-07 17:15                                                   ` Luis R. Rodriguez
  2017-06-09  1:14                                                 ` Andy Lutomirski
  1 sibling, 1 reply; 34+ messages in thread
From: Alan Cox @ 2017-06-07 12:25 UTC (permalink / raw)
  To: Dmitry Torokhov, Andy Lutomirski
  Cc: Luis R. Rodriguez, Theodore Ts'o, Linux FS Devel,
	Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages),
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, atull, Moritz Fischer

> What's wrong with saying that the only way to interrupt firmware
> loading is to kill the process? So ctrl-c will no longer interrupt
> it, but I do not think that ease of aborting firmware update is
> primary goal here. I consider simple is good here.

Agreed 100%. The user process did not ask for firmware load, it asked
for an I/O operation. Semantically it should appear as if someone else
did the firmware load and it just had to wait for it to happen.

Alan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]             ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-05-26 19:40               ` Luis R. Rodriguez
@ 2017-06-07 17:08               ` Luis R. Rodriguez
  2017-06-07 17:54                 ` Martin Fuzzey
  1 sibling, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-07 17:08 UTC (permalink / raw)
  To: Fuzzey, Martin, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Alan Cox,
	Ted Ts'o, Andy Lutomirski, Dmitry Torokhov
  Cc: Luis R. Rodriguez, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds

On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()

Martin, just for completeness on documenting on the commit log of the next
swait proposed fix for this -- what signal did the process get from which you
note the child dies below ? Exactly what in Android sent this signal ?

> 5) A child dies and raises SIGCHLD
> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07 12:25                                                 ` Alan Cox
@ 2017-06-07 17:15                                                   ` Luis R. Rodriguez
  0 siblings, 0 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-07 17:15 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dmitry Torokhov, Andy Lutomirski, Luis R. Rodriguez,
	Theodore Ts'o, Linux FS Devel, Stephen Boyd, Li, Yi,
	Peter Zijlstra, Jonathan Corbet, Eric W. Biederman,
	Michael Kerrisk (man-pages),
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel

On Wed, Jun 07, 2017 at 01:25:51PM +0100, Alan Cox wrote:
> > What's wrong with saying that the only way to interrupt firmware
> > loading is to kill the process? So ctrl-c will no longer interrupt
> > it, but I do not think that ease of aborting firmware update is
> > primary goal here. I consider simple is good here.
> 
> Agreed 100%. The user process did not ask for firmware load, it asked
> for an I/O operation. Semantically it should appear as if someone else
> did the firmware load and it just had to wait for it to happen.

Fine by me ! Will wrap up the patch for the new killable swait then.
I suppose noting it as a stable fix is worth it given the known issues
with for example Android killing loaders unexpectedly.

Unless I hear otherwise I'll also provide a follow up to return -EINTR instead
of -EAGAIN if swait returned -ERESTARTSYS, this way at least userspace could
tell a signal was definitely received. I *don't* think that follow up is
required for stable though.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07 17:08               ` Luis R. Rodriguez
@ 2017-06-07 17:54                 ` Martin Fuzzey
       [not found]                   ` <59383DDA.3040702-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Fuzzey @ 2017-06-07 17:54 UTC (permalink / raw)
  To: Luis R. Rodriguez, linux-fsdevel, Alan Cox, Ted Ts'o,
	Andy Lutomirski, Dmitry Torokhov
  Cc: Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds, Kees Cook

On 07/06/17 19:08, Luis R. Rodriguez wrote:
> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>> 1) Android init calls write() on the sysfs file
>> 2) The sysfs .store() callback registered by a driver is called
>> 3) The driver calls request_firmware()
>> 4) request_firmware() sends the firmware load request to userspace and
>> calls wait_for_completion_interruptible()
> Martin, just for completeness on documenting on the commit log of the next
> swait proposed fix for this -- what signal did the process get from which you
> note the child dies below ? Exactly what in Android sent this signal ?

Android didn't send the signal, the kernel did (SIGCHLD).

Like this:

1) Android init (pid=1) fork()s (say pid=42) [this child process is 
totally unrelated to firmware loading]
2) Android init (pid=1) does a write() on a (driver custom) sysfs file 
which ends up calling request_firmware() kernel side
3) The firmware loading fallback mechanism is used, the request is sent 
to userspace and pid 1 waits in the kernel on wait_*
4) before firmware loading completes pid 42 dies (for any reason - in my 
case normal termination)
5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which 
causes -ERESTARTSYS to be returned from wait_*


Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                   ` <59383DDA.3040702-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
@ 2017-06-09  1:10                     ` Luis R. Rodriguez
  2017-06-09  1:57                       ` Luis R. Rodriguez
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09  1:10 UTC (permalink / raw)
  To: Martin Fuzzey
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Alan Cox, Ted Ts'o,
	Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach

On Wed, Jun 7, 2017 at 10:54 AM, Martin Fuzzey <mfuzzey-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org> wrote:
> On 07/06/17 19:08, Luis R. Rodriguez wrote:
>>
>> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>>>
>>> 1) Android init calls write() on the sysfs file
>>> 2) The sysfs .store() callback registered by a driver is called
>>> 3) The driver calls request_firmware()
>>> 4) request_firmware() sends the firmware load request to userspace and
>>> calls wait_for_completion_interruptible()
>>
>> Martin, just for completeness on documenting on the commit log of the next
>> swait proposed fix for this -- what signal did the process get from which
>> you
>> note the child dies below ? Exactly what in Android sent this signal ?
>
>
> Android didn't send the signal, the kernel did (SIGCHLD).
>
> Like this:
>
> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> unrelated to firmware loading]
> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> ends up calling request_firmware() kernel side
> 3) The firmware loading fallback mechanism is used, the request is sent to
> userspace and pid 1 waits in the kernel on wait_*
> 4) before firmware loading completes pid 42 dies (for any reason - in my
> case normal termination)

Interesting, could one interpretation here be that the process
successfully finishing + the signal being sent beats out the timing of
the firmware_class syfs code detecting that the write completed ?

> 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
> causes -ERESTARTSYS to be returned from wait_*

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07  6:25                                               ` Dmitry Torokhov
  2017-06-07 12:25                                                 ` Alan Cox
@ 2017-06-09  1:14                                                 ` Andy Lutomirski
       [not found]                                                   ` <CALCETrXbHpkN9Pujj=U1VpAR9MTOyCAqCtL0=7-vb1EdpEwCMg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2017-06-09  1:14 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Andy Lutomirski, Luis R. Rodriguez, Theodore Ts'o, Alan Cox,
	Linux FS Devel, Stephen Boyd, Li, Yi, Peter Zijlstra,
	Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages),
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki

On Tue, Jun 6, 2017 at 11:25 PM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
>> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>> >
>> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
>> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
>> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
>>
>> I think that has essentially nothing to do with swait.  User code does
>> some syscall.  That syscall triggers a firmware load.  The caller gets
>> a signal.  If you're going to let firmware load get interrupted, you
>> need to consider what the syscall is.
>
> I think it is way too complicated and I do not think driver writers will
> stand a chance of implementing this correctly, given that often firmware
> load might be triggered indirectly and by multitude of syscalls.
>

That's what I meant, but I said it unclearly.  I meant that, if we're
going to start allowing interruption, we would need to audit all the
callers.  Ugh.

I suppose we could have request_firmware_interruptable(), but that
seems like it's barely worth it.

--Andy

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                                                   ` <CALCETrXbHpkN9Pujj=U1VpAR9MTOyCAqCtL0=7-vb1EdpEwCMg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-09  1:33                                                     ` Luis R. Rodriguez
       [not found]                                                       ` <CAB=NE6USSj0sBzJSFOyyRQu=0rQXdbHc2+GNk1fse+Y8H6TrgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09  1:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Torokhov, Theodore Ts'o, Alan Cox, Linux FS Devel,
	Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages),
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx

On Thu, Jun 8, 2017 at 6:14 PM, Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Tue, Jun 6, 2017 at 11:25 PM, Dmitry Torokhov
> <dmitry.torokhov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
>>> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>>> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>>> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>>> >
>>> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
>>> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
>>> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
>>>
>>> I think that has essentially nothing to do with swait.  User code does
>>> some syscall.  That syscall triggers a firmware load.  The caller gets
>>> a signal.  If you're going to let firmware load get interrupted, you
>>> need to consider what the syscall is.
>>
>> I think it is way too complicated and I do not think driver writers will
>> stand a chance of implementing this correctly, given that often firmware
>> load might be triggered indirectly and by multitude of syscalls.
>>
>
> That's what I meant, but I said it unclearly.  I meant that, if we're
> going to start allowing interruption, we would need to audit all the
> callers.  Ugh.

There are actually two audits worth evaluating if what we've concluded
is fair game:

  a) firmware sync calls on interruptible paths
  b) use of swait / old interruptible waits on sysfs paths

As for a) we drove tons of code away from using sync, request
firmware, and on async firmware the return value is lost, we only can
currently know if a failure of some sort occurred. If that push to
async failed we also scared away tons of drivers to use
request_firmware() calls on init and later incorrectly on probe due to
serialized init + probe delay on boot. From what I recall my last
Coccinelle evil-monster-hunt, I did indeed find quite a bit of drivers
still still relying on sync firmware which ultimately revealed use on
a probe path. The signal however was only effective as of commit
0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
is interrupted") merged as of v4.0. Creating awareness of the issue
seems fair, but I don't think its worth a huge fly swatter.

I have no information to contribute for b) other than I was reluctant
to even consider it.

> I suppose we could have request_firmware_interruptable(), but that
> seems like it's barely worth it.

I don't think that's worth it, given the signal was effective only as
of v4.0, we already had a big push away from sync requests, and also
had the "don't use request firmware" on init scare which also
propagated to probe later.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-09  1:10                     ` Luis R. Rodriguez
@ 2017-06-09  1:57                       ` Luis R. Rodriguez
       [not found]                         ` <CAB=NE6UQZMmLvxTu7RcFHh3neAh+RFpTTFCSwJ8_EsmmtEq94Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09  1:57 UTC (permalink / raw)
  To: Martin Fuzzey
  Cc: Linux FS Devel, Alan Cox, Ted Ts'o, Andy Lutomirski,
	Dmitry Torokhov, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel

On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Wed, Jun 7, 2017 at 10:54 AM, Martin Fuzzey <mfuzzey@parkeon.com> wrote:
>> On 07/06/17 19:08, Luis R. Rodriguez wrote:
>>>
>>> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>>>>
>>>> 1) Android init calls write() on the sysfs file
>>>> 2) The sysfs .store() callback registered by a driver is called
>>>> 3) The driver calls request_firmware()
>>>> 4) request_firmware() sends the firmware load request to userspace and
>>>> calls wait_for_completion_interruptible()
>>>
>>> Martin, just for completeness on documenting on the commit log of the next
>>> swait proposed fix for this -- what signal did the process get from which
>>> you
>>> note the child dies below ? Exactly what in Android sent this signal ?
>>
>>
>> Android didn't send the signal, the kernel did (SIGCHLD).
>>
>> Like this:
>>
>> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
>> unrelated to firmware loading]
>> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
>> ends up calling request_firmware() kernel side
>> 3) The firmware loading fallback mechanism is used, the request is sent to
>> userspace and pid 1 waits in the kernel on wait_*
>> 4) before firmware loading completes pid 42 dies (for any reason - in my
>> case normal termination)

Martin just to be clear, by "normal case termination" do you mean
completing successfully ?? Ie the firmware actually did make it onto
the device ?

> Interesting, could one interpretation here be that the process
> successfully finishing + the signal being sent beats out the timing of
> the firmware_class syfs code detecting that the write completed ?
>
>> 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
>> causes -ERESTARTSYS to be returned from wait_*
>
>   Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                         ` <CAB=NE6UQZMmLvxTu7RcFHh3neAh+RFpTTFCSwJ8_EsmmtEq94Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-09  7:40                           ` Martin Fuzzey
       [not found]                             ` <593A50FF.40604-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
  2017-06-09 22:55                             ` Luis R. Rodriguez
  0 siblings, 2 replies; 34+ messages in thread
From: Martin Fuzzey @ 2017-06-09  7:40 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Linux FS Devel, Alan Cox, Ted Ts'o, Andy Lutomirski,
	Dmitry Torokhov, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel

On 09/06/17 03:57, Luis R. Rodriguez wrote:
> On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>> Android didn't send the signal, the kernel did (SIGCHLD).
>>>
>>> Like this:
>>>
>>> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
>>> unrelated to firmware loading]
>>> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
>>> ends up calling request_firmware() kernel side
>>> 3) The firmware loading fallback mechanism is used, the request is sent to
>>> userspace and pid 1 waits in the kernel on wait_*
>>> 4) before firmware loading completes pid 42 dies (for any reason - in my
>>> case normal termination)
> Martin just to be clear, by "normal case termination" do you mean
> completing successfully ?? Ie the firmware actually did make it onto
> the device ?

The firmware did *not* make it onto the device since the 
request_firmware() call returned an error
(the code that would have transfered it to the device is only executed 
following a successful request_firmware)

The process that terminates normally is unrelated to firmware loading as 
I said above.

The only things that matter are:
- It is a child process of the process that calls request_firmware()
- It terminates *while* the the wait_ is still in progress


Here is a way of reproducing the problem using the test_firmware module 
(which I only just saw) on normal linux with no Android or custom driver


#!/bin/sh
set -e

# Make sure the system firmware loader doesn't get in the way
/etc/init.d/udev stop

modprobe test_firmware

DIR=/sys/devices/virtual/misc/test_firmware

echo 10 >/sys/class/firmware/timeout;
sleep 2 &
echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;



If run with the "sleep 2 &" it terminates after 2 seconds
If the sleep is commented it runs for the expected 10 seconds (the 
firmware loading timeout)

Since the sleep process is a child of the script process requesting a 
firmware load its death causes a SIGCHLD causing request_firmware() to 
abort prematurely.


Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                             ` <593A50FF.40604-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
@ 2017-06-09 21:12                               ` Luis R. Rodriguez
  0 siblings, 0 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09 21:12 UTC (permalink / raw)
  To: Martin Fuzzey
  Cc: Luis R. Rodriguez, Linux FS Devel, Alan Cox, Ted Ts'o,
	Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi,
	atull-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Moritz Fischer,
	Petr Mladek

On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote:
> On 09/06/17 03:57, Luis R. Rodriguez wrote:
> > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > > > Android didn't send the signal, the kernel did (SIGCHLD).
> > > > 
> > > > Like this:
> > > > 
> > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> > > > unrelated to firmware loading]
> > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> > > > ends up calling request_firmware() kernel side
> > > > 3) The firmware loading fallback mechanism is used, the request is sent to
> > > > userspace and pid 1 waits in the kernel on wait_*
> > > > 4) before firmware loading completes pid 42 dies (for any reason - in my
> > > > case normal termination)
> > Martin just to be clear, by "normal case termination" do you mean
> > completing successfully ?? Ie the firmware actually did make it onto
> > the device ?
> 
> The firmware did *not* make it onto the device since the request_firmware()
> call returned an error
> (the code that would have transfered it to the device is only executed
> following a successful request_firmware)
> 
> The process that terminates normally is unrelated to firmware loading as I
> said above.
> 
> The only things that matter are:
> - It is a child process of the process that calls request_firmware()
> - It terminates *while* the the wait_ is still in progress
> 
> 
> Here is a way of reproducing the problem using the test_firmware module
> (which I only just saw) on normal linux with no Android or custom driver
> 
> 
> #!/bin/sh
> set -e
> 
> # Make sure the system firmware loader doesn't get in the way
> /etc/init.d/udev stop
> 
> modprobe test_firmware
> 
> DIR=/sys/devices/virtual/misc/test_firmware
> 
> echo 10 >/sys/class/firmware/timeout;
> sleep 2 &
> echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;
> 
> 
> 
> If run with the "sleep 2 &" it terminates after 2 seconds
> If the sleep is commented it runs for the expected 10 seconds (the firmware
> loading timeout)
> 
> Since the sleep process is a child of the script process requesting a
> firmware load its death causes a SIGCHLD causing request_firmware() to abort
> prematurely.

Thanks this could mean we also *should* trigger a failure if init is issuing
modprobe on a series of drivers and one completes before another while
request_firmware() is called on init or probe on a subsequent driver. If true
I'm surprised this never was reported back when the fallback mechanism was
popular, I suppose it was not an issue given most firmware *was* present on
/lib/firmware/ and the direct filesystem lookup first step always found the
firmware first, so this would only be an issue for folks relying on the
fallback mechanism exclusively.

Will include a test case based on your above script. Thanks!

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                                                       ` <CAB=NE6USSj0sBzJSFOyyRQu=0rQXdbHc2+GNk1fse+Y8H6TrgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-09 21:29                                                         ` Luis R. Rodriguez
  0 siblings, 0 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09 21:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Torokhov, Theodore Ts'o, Alan Cox, Linux FS Devel,
	Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages),
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt-d4N2ExZK1jaSe5ORCPIMD9BPR1lH4CV8,
	rafal-g1n6cQUeyibVItvQsEIGlw, Arend Van Spriel,
	Rafael J. Wysocki, Moritz Fischer

On Thu, Jun 8, 2017 at 6:33 PM, Luis R. Rodriguez <mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Jun 8, 2017 at 6:14 PM, Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> That's what I meant, but I said it unclearly.  I meant that, if we're
>> going to start allowing interruption, we would need to audit all the
>> callers.  Ugh.
>
> There are actually two audits worth evaluating if what we've concluded
> is fair game:
>
>   a) firmware sync calls on interruptible paths
>   b) use of swait / old interruptible waits on sysfs paths

And as I noted in the other thread -- another possible issue could be
any swait / interruptable wait on init or probe. Provided any child
completes and the kernel code for wait handler does abort, that
request would be terminated. This could for instance happen at bootup
as modules load and any child from the loader terminates.

We already have Coccinelle grammar to hunt for "though shall not
request firmware on init or probe", such SmPL grammar could be in turn
be repruposed to hunt for these types of conditions.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-09  7:40                           ` Martin Fuzzey
       [not found]                             ` <593A50FF.40604-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
@ 2017-06-09 22:55                             ` Luis R. Rodriguez
  1 sibling, 0 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09 22:55 UTC (permalink / raw)
  To: Martin Fuzzey
  Cc: Luis R. Rodriguez, Linux FS Devel, Alan Cox, Ted Ts'o,
	Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek

On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote:
> On 09/06/17 03:57, Luis R. Rodriguez wrote:
> > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > Android didn't send the signal, the kernel did (SIGCHLD).
> > > > 
> > > > Like this:
> > > > 
> > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> > > > unrelated to firmware loading]
> > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> > > > ends up calling request_firmware() kernel side
> > > > 3) The firmware loading fallback mechanism is used, the request is sent to
> > > > userspace and pid 1 waits in the kernel on wait_*
> > > > 4) before firmware loading completes pid 42 dies (for any reason - in my
> > > > case normal termination)
> > Martin just to be clear, by "normal case termination" do you mean
> > completing successfully ?? Ie the firmware actually did make it onto
> > the device ?
> 
> The firmware did *not* make it onto the device since the request_firmware()
> call returned an error
> (the code that would have transfered it to the device is only executed
> following a successful request_firmware)
> 
> The process that terminates normally is unrelated to firmware loading as I
> said above.
> 
> The only things that matter are:
> - It is a child process of the process that calls request_firmware()
> - It terminates *while* the the wait_ is still in progress
> 
> 
> Here is a way of reproducing the problem using the test_firmware module
> (which I only just saw) on normal linux with no Android or custom driver
> 
> 
> #!/bin/sh
> set -e
> 
> # Make sure the system firmware loader doesn't get in the way
> /etc/init.d/udev stop
> 
> modprobe test_firmware
> 
> DIR=/sys/devices/virtual/misc/test_firmware
> 
> echo 10 >/sys/class/firmware/timeout;
> sleep 2 &
> echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;
> 
> 
> 
> If run with the "sleep 2 &" it terminates after 2 seconds
> If the sleep is commented it runs for the expected 10 seconds (the firmware
> loading timeout)
> 
> Since the sleep process is a child of the script process requesting a
> firmware load its death causes a SIGCHLD causing request_firmware() to abort
> prematurely.

BTW I've run some more tests and if you use request_firmware_nowait() the above
issue also is avoidable. Also, if you use the custom fallback (uevent parameter
is false) the timeout is max value always. The only way to kill the custom
requests is through the echo -1 > loading or the max timeout triggers (don't
rely on that!). The custom interface was a bad idea though so best to just
avoid it.

Anyway I can reproduce in my tests now, will add this testcase as well.
The stable fixes will be sent as well.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2017-06-09 22:55 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20170524205658.GK8951@wotan.suse.de>
     [not found] ` <20170524214027.7775-1-mcgrof@kernel.org>
     [not found]   ` <CALCETrXUrirO-vg3M+MGhn=0gZTwx0phsRDS4TCwWWgNYC6RsA@mail.gmail.com>
2017-05-24 22:38     ` [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback Luis R. Rodriguez
2017-05-25  4:13       ` Andy Lutomirski
     [not found]         ` <CALCETrU4__YUGk36PN=FbuEf0SBaTrxQQqm4sWs2NrZ+6WN7jA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-25  8:28           ` Fuzzey, Martin
2017-05-26 11:09             ` Eric W. Biederman
     [not found]               ` <87fufr3mdy.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-05-26 19:46                 ` Luis R. Rodriguez
2017-05-26 21:26                   ` Dmitry Torokhov
     [not found]                     ` <CAKdAkRTrcTVOAP5GK-R=Au_tL5WqSn5UkQEzNe5NcCWXS8mbtA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-26 21:32                       ` Luis R. Rodriguez
2017-05-26 21:55                         ` Dmitry Torokhov
2017-06-05 20:24                           ` Luis R. Rodriguez
     [not found]                             ` <20170605202410.GQ8951-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2017-06-06  9:04                               ` Martin Fuzzey
     [not found]                                 ` <59367025.3020901-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
2017-06-06 16:34                                   ` Luis R. Rodriguez
     [not found]                                     ` <20170606163401.GA27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2017-06-06 17:52                                       ` Luis R. Rodriguez
2017-06-06 14:53                               ` Alan Cox
     [not found]                                 ` <1496760796.5682.48.camel-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-06-06 16:47                                   ` Luis R. Rodriguez
     [not found]                                     ` <20170606164734.GB27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2017-06-06 17:54                                       ` Luis R. Rodriguez
2017-06-06 22:11                                     ` Theodore Ts'o
     [not found]                                       ` <20170606221151.ygoxqkwhhjsqw632-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2017-06-07  0:22                                         ` Luis R. Rodriguez
     [not found]                                           ` <20170607002237.GJ27288-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2017-06-07  4:56                                             ` Andy Lutomirski
2017-06-07  6:25                                               ` Dmitry Torokhov
2017-06-07 12:25                                                 ` Alan Cox
2017-06-07 17:15                                                   ` Luis R. Rodriguez
2017-06-09  1:14                                                 ` Andy Lutomirski
     [not found]                                                   ` <CALCETrXbHpkN9Pujj=U1VpAR9MTOyCAqCtL0=7-vb1EdpEwCMg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-09  1:33                                                     ` Luis R. Rodriguez
     [not found]                                                       ` <CAB=NE6USSj0sBzJSFOyyRQu=0rQXdbHc2+GNk1fse+Y8H6TrgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-09 21:29                                                         ` Luis R. Rodriguez
     [not found]             ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-26 19:40               ` Luis R. Rodriguez
     [not found]                 ` <20170526194001.GR8951-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2017-05-26 20:23                   ` Fuzzey, Martin
     [not found]                     ` <CANh8QzyqQ5hubWJvWYxgoQ3baL6sgoQPSzEHMY0tu8WNGS2gZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-26 20:52                       ` Luis R. Rodriguez
2017-06-07 17:08               ` Luis R. Rodriguez
2017-06-07 17:54                 ` Martin Fuzzey
     [not found]                   ` <59383DDA.3040702-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
2017-06-09  1:10                     ` Luis R. Rodriguez
2017-06-09  1:57                       ` Luis R. Rodriguez
     [not found]                         ` <CAB=NE6UQZMmLvxTu7RcFHh3neAh+RFpTTFCSwJ8_EsmmtEq94Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-09  7:40                           ` Martin Fuzzey
     [not found]                             ` <593A50FF.40604-mB3Nsq4MPf1BDgjK7y7TUQ@public.gmane.org>
2017-06-09 21:12                               ` Luis R. Rodriguez
2017-06-09 22:55                             ` Luis R. Rodriguez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).