From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1032878AbdEXU5C (ORCPT <rfc822;w@1wt.eu>);
        Wed, 24 May 2017 16:57:02 -0400
Received: from mx2.suse.de ([195.135.220.15]:48764 "EHLO mx1.suse.de"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1753857AbdEXU5A (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 24 May 2017 16:57:00 -0400
Date: Wed, 24 May 2017 22:56:58 +0200
From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: "Luis R. Rodriguez" <mcgrof@kernel.org>, jewalt@lgsinnovations.com
Cc: Martin Fuzzey <mfuzzey@parkeon.com>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Shuah Khan <shuah.kh@samsung.com>, wagi@monom.org,
        yi1.li@linux.intel.com, takahiro.akashi@linaro.org,
        bjorn.andersson@linaro.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] firmware: request_firmware() should propagate
 -ERESTARTSYS
Message-ID: <20170524205658.GK8951@wotan.suse.de>
References: <20170523131607.28138.12306.stgit@localhost>
 <20170523133112.GA5059@kroah.com>
 <59244811.7080401@parkeon.com>
 <20170523195533.GR8951@wotan.suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170523195533.GR8951@wotan.suse.de>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 23, 2017 at 09:55:33PM +0200, Luis R. Rodriguez wrote:
> On Tue, May 23, 2017 at 04:32:49PM +0200, Martin Fuzzey wrote:
> > On 23/05/17 15:31, Greg Kroah-Hartman wrote:
> > > On Tue, May 23, 2017 at 03:16:07PM +0200, Martin Fuzzey wrote:
> > > > When -ERESTARTSYS is returned by wait_* due to a signal this should
> > > > be returned from request_firmware() so that the syscall may be
> > > > restarted if necessary.
> > > > 
> > > > 
> > > > Nice find, should this go to the stable kernels as well?
> > 
> > Yes I think it should.
> 
> Thanks for the patch !
> 
> Just a bit of more nose diving validating this. 
> 
> We actually used to send -ENOMEM for a long time always, and now you are
> special-casing to allow -ERESTARTSYS -- we we must ask ourselves -- why
> not other errors ?
> 
> Let us consider what is upstream only please and focus on stable later.
> 
> We use:
> 
> static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)         
> {                                                                               
>         long ret;                                                               
>                                                                                 
>         ret = swait_event_interruptible_timeout(fw_st->wq,                      
>                                 __fw_state_is_done(READ_ONCE(fw_st->status)),   
>                                 timeout);                                       
>         if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)                     
>                 return -ENOENT;                                                 
>         if (!ret)                                                               
>                 return -ETIMEDOUT;                                              
>                                                                                 
>         return ret < 0 ? ret : 0;                                               
> }
> 
> What can swait_event_interruptible_timeout() return ? It can return the value
> of a timeout or whatever __swait_event_interruptible_timeout() returns.
> __swait_event_interruptible_timeout() in turn uses ___swait_event() as
> follows:
> 
> #define __swait_event_interruptible_timeout(wq, condition, timeout)     \
>         ___swait_event(wq, ___wait_cond_timeout(condition),             \
>                       TASK_INTERRUPTIBLE, timeout,                      \
>                       __ret = schedule_timeout(__ret))
> 
> So this ultimately use ___swait_event():
> 
> /* as per ___wait_event() but for swait, therefore "exclusive == 0" */          
> #define ___swait_event(wq, condition, state, ret, cmd)                  \       
> ({                                                                      \       
>         struct swait_queue __wait;                                      \       
>         long __ret = ret;                                               \       
>                                                                         \       
>         INIT_LIST_HEAD(&__wait.task_list);                              \       
>         for (;;) {                                                      \       
>                 long __int = prepare_to_swait_event(&wq, &__wait, state);\      
>                                                                         \       
>                 if (condition)                                          \       
>                         break;                                          \       
>                                                                         \       
>                 if (___wait_is_interruptible(state) && __int) {         \       
>                         __ret = __int;                                  \       
>                         break;                                          \       
>                 }                                                       \       
>                                                                         \       
>                 cmd;                                                    \       
>         }                                                               \       
>         finish_swait(&wq, &__wait);                                     \       
>         __ret;                                                          \       
> }) 
> 
> And prepare_to_swait_event() can return -ERESTARTSYS on signal_pending_state()
> otherwise it returns 0 ! So indeed -ERESTARTSYS is possible.
> 
> But what about other errors ? Considering the above it would seem then we
> actually can only get -ERESTARTSYS or whatever schedule_timeout() returns.
> schedule_timeout() is documented indicating it returns 0 when the timer has
> expired otherwise the remaining time in jiffies will be returned.  It further
> clarifies that the return value is guaranteed to be be non-negative.  And
> ___wait_is_interruptible() is:
> 
> #define ___wait_is_interruptible(state)                                 \       
>         (!__builtin_constant_p(state) ||                                \       
>                 state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE)  \
> 
> So this piece of code:
>                                                                         \       
>                 if (___wait_is_interruptible(state) && __int) {         \       
>                         __ret = __int;                                  \       
>                         break;                                          \       
>                 }                                                       \       
> 
> Since ___wait_is_interruptible() is always true for us since we use
> __swait_event_interruptible_timeout() the -ERESTARTSYS will always be sent if a
> signal was sent.
> 
> In all this light then the patch is correct for upstream however let's consider
> stable now. At first glance this seems like a fix for an old patch,
> respectively commit 0542ad88fbdd81bb ("firmware loader: Fix
> _request_firmware_load() return val for fw load abort" by Shuah Khan which was
> merged since v3.17, *but* back then just used wait_for_completion() and ignored
> any signals here, they were just not part of our semantics. But its important
> to note then we always returned -ENOMEM before that patch and then at least
> returned -EAGAIN in other cases. The next thing to note is commit
> 5d47ec02c37ea632398c ("firmware: Correct handling of fw_state_wait() return
> value") by Bjorn Andersson. This took place *after* the swait changes.  Bjorn
> fixed an issue but also forgot to address the special case of -ERESTARTSYS,
> given right below fw_state_wait_timeout() on _request_firmware_load() the
> return value would be lost. It would seem Bjorn assumed the return value would
> be propagated but did not notice the error special casing below which would
> loose it.
> 
> So before and after Bjorn's changes we were still *trying* to propagate the
> -ERESTARTSYS error but it was still lost.
> 
> The -ERESTARTSYS from signals was still something we were capturing even prior
> to the swait changes, see the kernel commit prior to the swait changes (0430cafcc4fb
> "firmware: drop bit ops in favor of simple state machine"), if you git blame there,
> will find a series of commits with the -ERESTARTSYS handled... I can trace back
> to commit 68ff2a00dbf ("firmware_loader: handle timeout via
> wait_for_completion_interruptible_timeout()") as having the -ERESTARTSYS check but
> it had lost that error on _request_firmware_load() due to the :
> 
>         if (retval == -ERESTARTSYS || !retval) {
>                 mutex_lock(&fw_lock);
>                 fw_load_abort(fw_priv);
>                 mutex_unlock(&fw_lock);
>         }
>                                                                                 
>         if (is_fw_load_aborted(buf))
>                 retval = -EAGAIN;
>         else if (!buf->data)
>                 retval = -ENOMEM;
> 
> As noted earlier the above piece of code lost the error because of
> 0542ad88fbdd81bb ("firmware loader: Fix _request_firmware_load() return val for
> fw load abort" which was merged since v3.17, but back then it *did not*
> propagate the error. So it would be incorrect to say that your patch fixes
> commit 0542ad88fbdd81bb by Shuah Khan... We'd have to ask ourselves when
> such an error actually became relevant.
> 
> It would seem the -ERESTARTSYS signal took effect first via commit 0cb64249ca500
> ("firmware_loader: abort request if wait_for_completion is interrupted") added
> upstream via v4.0 and since it was introduced the error code was lost given the
> -EAGAIN overwrite added *earlier* by Shuah Khan when such error codes were not
> even relevant. So prior to v4.0 were we not even aborting due to signals.
> 
> As such this as far as I can tell this is a fix for a fix for this commit.
> 
> So we should use:
> 
> Fixes: 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion is interrupted")
> Cc: stable@vger.kernel.org # 4.0
> 
> Also a more reflective subject and commit log would be appreciated, you can add
> my Acked-by given I have also now tested it with all the test drivers and
> test scripts:
> 
> ==========================================================================
> firmware: fix sending -ERESTARTSYS due to signal on fallback
> 
> Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
> is interrupted") added via 4.0 added support to abort the fallback mechanism
> when a signal was detected and wait_for_completion_interruptible() returned
> -ERESTARTSYS. Although the abort was effective we were unfortunately never
> really propagating this error though and as such userspace could not know
> why the abort happened.
> 
> The error code was always being lost to an even older change, commit
> 0542ad88fbdd81bb ("firmware loader: Fix _request_firmware_load() return val
> for fw load abort") by Shuah Khan which was merged since v3.17. Back then
> though we never were capturing these signals or bailing on a signal. After
> this change though only only -EAGAIN was being relayed back to userspace
> on non-memory errors including signals trying to interrupt our fallback
> process.
> 
> It only makes sense to fix capturing -ERESTARTSYS since we were capturing
> the error but when it was actually effective, since commit 0cb64249ca500
> ("firmware_loader: abort request if wait_for_completion  is interrupted").
> 
> Only distributions relying on the fallback mechanism are impacted. An
> example issue is on Android, when request_firmware() is called through
> the firmware fallback mechanism -- syfs write call, sysfs .store(), and
> Android sends a SIGCHLD to fail the write call -- in such cases the fact
> that we failed due to a signal is lost.
> 
> Fix this and ensure we propagate -ERESTARTSYS so that handlers can whether
> or not to restart write calls.
> 
> Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com>
> Cc: stable@vger.kernel.org # 4.0
> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ==========================================================================
> 
> > I have already applied a similar patch to my 4.4 tree.
> 
> You might also be intersted in commit 2e700f8d85975 ("firmware: fix usermode
> helper fallback loading".
> 
> > The exact same patch won't apply since the code has changed a bit since.
> > 
> > So I was planning on sending for -stable-4.4 once it's in mainline.
> 
> Appreciated, after this is merged of course.

I'll just send this with the commit log change myself.

  Luis