From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1032878AbdEXU5C (ORCPT ); Wed, 24 May 2017 16:57:02 -0400 Received: from mx2.suse.de ([195.135.220.15]:48764 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753857AbdEXU5A (ORCPT ); Wed, 24 May 2017 16:57:00 -0400 Date: Wed, 24 May 2017 22:56:58 +0200 From: "Luis R. Rodriguez" To: "Luis R. Rodriguez" , jewalt@lgsinnovations.com Cc: Martin Fuzzey , Greg Kroah-Hartman , Shuah Khan , wagi@monom.org, yi1.li@linux.intel.com, takahiro.akashi@linaro.org, bjorn.andersson@linaro.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] firmware: request_firmware() should propagate -ERESTARTSYS Message-ID: <20170524205658.GK8951@wotan.suse.de> References: <20170523131607.28138.12306.stgit@localhost> <20170523133112.GA5059@kroah.com> <59244811.7080401@parkeon.com> <20170523195533.GR8951@wotan.suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170523195533.GR8951@wotan.suse.de> User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 23, 2017 at 09:55:33PM +0200, Luis R. Rodriguez wrote: > On Tue, May 23, 2017 at 04:32:49PM +0200, Martin Fuzzey wrote: > > On 23/05/17 15:31, Greg Kroah-Hartman wrote: > > > On Tue, May 23, 2017 at 03:16:07PM +0200, Martin Fuzzey wrote: > > > > When -ERESTARTSYS is returned by wait_* due to a signal this should > > > > be returned from request_firmware() so that the syscall may be > > > > restarted if necessary. > > > > > > > > > > > > Nice find, should this go to the stable kernels as well? > > > > Yes I think it should. > > Thanks for the patch ! > > Just a bit of more nose diving validating this. > > We actually used to send -ENOMEM for a long time always, and now you are > special-casing to allow -ERESTARTSYS -- we we must ask ourselves -- why > not other errors ? > > Let us consider what is upstream only please and focus on stable later. > > We use: > > static int __fw_state_wait_common(struct fw_state *fw_st, long timeout) > { > long ret; > > ret = swait_event_interruptible_timeout(fw_st->wq, > __fw_state_is_done(READ_ONCE(fw_st->status)), > timeout); > if (ret != 0 && fw_st->status == FW_STATUS_ABORTED) > return -ENOENT; > if (!ret) > return -ETIMEDOUT; > > return ret < 0 ? ret : 0; > } > > What can swait_event_interruptible_timeout() return ? It can return the value > of a timeout or whatever __swait_event_interruptible_timeout() returns. > __swait_event_interruptible_timeout() in turn uses ___swait_event() as > follows: > > #define __swait_event_interruptible_timeout(wq, condition, timeout) \ > ___swait_event(wq, ___wait_cond_timeout(condition), \ > TASK_INTERRUPTIBLE, timeout, \ > __ret = schedule_timeout(__ret)) > > So this ultimately use ___swait_event(): > > /* as per ___wait_event() but for swait, therefore "exclusive == 0" */ > #define ___swait_event(wq, condition, state, ret, cmd) \ > ({ \ > struct swait_queue __wait; \ > long __ret = ret; \ > \ > INIT_LIST_HEAD(&__wait.task_list); \ > for (;;) { \ > long __int = prepare_to_swait_event(&wq, &__wait, state);\ > \ > if (condition) \ > break; \ > \ > if (___wait_is_interruptible(state) && __int) { \ > __ret = __int; \ > break; \ > } \ > \ > cmd; \ > } \ > finish_swait(&wq, &__wait); \ > __ret; \ > }) > > And prepare_to_swait_event() can return -ERESTARTSYS on signal_pending_state() > otherwise it returns 0 ! So indeed -ERESTARTSYS is possible. > > But what about other errors ? Considering the above it would seem then we > actually can only get -ERESTARTSYS or whatever schedule_timeout() returns. > schedule_timeout() is documented indicating it returns 0 when the timer has > expired otherwise the remaining time in jiffies will be returned. It further > clarifies that the return value is guaranteed to be be non-negative. And > ___wait_is_interruptible() is: > > #define ___wait_is_interruptible(state) \ > (!__builtin_constant_p(state) || \ > state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \ > > So this piece of code: > \ > if (___wait_is_interruptible(state) && __int) { \ > __ret = __int; \ > break; \ > } \ > > Since ___wait_is_interruptible() is always true for us since we use > __swait_event_interruptible_timeout() the -ERESTARTSYS will always be sent if a > signal was sent. > > In all this light then the patch is correct for upstream however let's consider > stable now. At first glance this seems like a fix for an old patch, > respectively commit 0542ad88fbdd81bb ("firmware loader: Fix > _request_firmware_load() return val for fw load abort" by Shuah Khan which was > merged since v3.17, *but* back then just used wait_for_completion() and ignored > any signals here, they were just not part of our semantics. But its important > to note then we always returned -ENOMEM before that patch and then at least > returned -EAGAIN in other cases. The next thing to note is commit > 5d47ec02c37ea632398c ("firmware: Correct handling of fw_state_wait() return > value") by Bjorn Andersson. This took place *after* the swait changes. Bjorn > fixed an issue but also forgot to address the special case of -ERESTARTSYS, > given right below fw_state_wait_timeout() on _request_firmware_load() the > return value would be lost. It would seem Bjorn assumed the return value would > be propagated but did not notice the error special casing below which would > loose it. > > So before and after Bjorn's changes we were still *trying* to propagate the > -ERESTARTSYS error but it was still lost. > > The -ERESTARTSYS from signals was still something we were capturing even prior > to the swait changes, see the kernel commit prior to the swait changes (0430cafcc4fb > "firmware: drop bit ops in favor of simple state machine"), if you git blame there, > will find a series of commits with the -ERESTARTSYS handled... I can trace back > to commit 68ff2a00dbf ("firmware_loader: handle timeout via > wait_for_completion_interruptible_timeout()") as having the -ERESTARTSYS check but > it had lost that error on _request_firmware_load() due to the : > > if (retval == -ERESTARTSYS || !retval) { > mutex_lock(&fw_lock); > fw_load_abort(fw_priv); > mutex_unlock(&fw_lock); > } > > if (is_fw_load_aborted(buf)) > retval = -EAGAIN; > else if (!buf->data) > retval = -ENOMEM; > > As noted earlier the above piece of code lost the error because of > 0542ad88fbdd81bb ("firmware loader: Fix _request_firmware_load() return val for > fw load abort" which was merged since v3.17, but back then it *did not* > propagate the error. So it would be incorrect to say that your patch fixes > commit 0542ad88fbdd81bb by Shuah Khan... We'd have to ask ourselves when > such an error actually became relevant. > > It would seem the -ERESTARTSYS signal took effect first via commit 0cb64249ca500 > ("firmware_loader: abort request if wait_for_completion is interrupted") added > upstream via v4.0 and since it was introduced the error code was lost given the > -EAGAIN overwrite added *earlier* by Shuah Khan when such error codes were not > even relevant. So prior to v4.0 were we not even aborting due to signals. > > As such this as far as I can tell this is a fix for a fix for this commit. > > So we should use: > > Fixes: 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion is interrupted") > Cc: stable@vger.kernel.org # 4.0 > > Also a more reflective subject and commit log would be appreciated, you can add > my Acked-by given I have also now tested it with all the test drivers and > test scripts: > > ========================================================================== > firmware: fix sending -ERESTARTSYS due to signal on fallback > > Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion > is interrupted") added via 4.0 added support to abort the fallback mechanism > when a signal was detected and wait_for_completion_interruptible() returned > -ERESTARTSYS. Although the abort was effective we were unfortunately never > really propagating this error though and as such userspace could not know > why the abort happened. > > The error code was always being lost to an even older change, commit > 0542ad88fbdd81bb ("firmware loader: Fix _request_firmware_load() return val > for fw load abort") by Shuah Khan which was merged since v3.17. Back then > though we never were capturing these signals or bailing on a signal. After > this change though only only -EAGAIN was being relayed back to userspace > on non-memory errors including signals trying to interrupt our fallback > process. > > It only makes sense to fix capturing -ERESTARTSYS since we were capturing > the error but when it was actually effective, since commit 0cb64249ca500 > ("firmware_loader: abort request if wait_for_completion is interrupted"). > > Only distributions relying on the fallback mechanism are impacted. An > example issue is on Android, when request_firmware() is called through > the firmware fallback mechanism -- syfs write call, sysfs .store(), and > Android sends a SIGCHLD to fail the write call -- in such cases the fact > that we failed due to a signal is lost. > > Fix this and ensure we propagate -ERESTARTSYS so that handlers can whether > or not to restart write calls. > > Signed-off-by: Martin Fuzzey > Cc: stable@vger.kernel.org # 4.0 > Acked-by: Luis R. Rodriguez > ========================================================================== > > > I have already applied a similar patch to my 4.4 tree. > > You might also be intersted in commit 2e700f8d85975 ("firmware: fix usermode > helper fallback loading". > > > The exact same patch won't apply since the code has changed a bit since. > > > > So I was planning on sending for -stable-4.4 once it's in mainline. > > Appreciated, after this is merged of course. I'll just send this with the commit log change myself. Luis