From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751115AbdAQVjp (ORCPT ); Tue, 17 Jan 2017 16:39:45 -0500
Received: from mx2.suse.de ([195.135.220.15]:51941 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750863AbdAQVjn (ORCPT ); Tue, 17 Jan 2017 16:39:43 -0500
Date: Tue, 17 Jan 2017 21:53:27 +0100
From: "Luis R. Rodriguez"
To: Jakub Kicinski
Cc: "Luis R. Rodriguez", Chris Wilson, linux-kernel-dev@beckhoff.com,
	Greg Kroah-Hartman, Bjorn Andersson, Daniel Wagner, Ming Lei,
	"linux-kernel@vger.kernel.org", oss-drivers@netronome.com
Subject: Re: [PATCHv2] firmware: Correct handling of fw_state_wait_timeout() return value
Message-ID: <20170117205327.GF13946@wotan.suse.de>
References: <20170117153505.20308-1-jakub.kicinski@netronome.com>
	<20170117161512.GC13946@wotan.suse.de>
	<20170117173041.GE13946@wotan.suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 17, 2017 at 10:04:20AM -0800, Jakub Kicinski wrote:
> On Tue, Jan 17, 2017 at 9:30 AM, Luis R. Rodriguez wrote:
> > On Tue, Jan 17, 2017 at 08:30:37AM -0800, Jakub Kicinski wrote:
> >> On Tue, Jan 17, 2017 at 8:21 AM, Luis R. Rodriguez wrote:
> >> >>>
> >> >>>  	retval = fw_state_wait_timeout(&buf->fw_st, timeout);
> >> >>> -	if (retval < 0) {
> >> >>> +	if (retval == -ETIMEDOUT || retval == -ERESTARTSYS) {
> >> >>>  		mutex_lock(&fw_lock);
> >> >>>  		fw_load_abort(fw_priv);
> >> >>>  		mutex_unlock(&fw_lock);
> >> >>
> >> >> This is a bit messy, two other similar issues were reported before
> >> >> and upon review I suggested Patrick Bruenn's fix with a better commit
> >> >> log seems best fit.
> >> >> Patrick sent a patch Jan 4, 2017 but never followed up
> >> >> despite my feedback on a small change on the commit log message [0]. Can you
> >> >> try that and if that fixes it can you adjust the commit log accordingly? Please
> >> >> note the preferred solution would be:
> >> >>
> >> >> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> >> >> index b9ac348e8d33..c530f8b4af01 100644
> >> >> --- a/drivers/base/firmware_class.c
> >> >> +++ b/drivers/base/firmware_class.c
> >> >> @@ -542,6 +542,8 @@ static struct firmware_priv *to_firmware_priv(struct device *dev)
> >> >>
> >> >>  static void __fw_load_abort(struct firmware_buf *buf)
> >> >>  {
> >> >> +	if (!buf)
> >> >> +		return;
> >>
> >> Allow me to try to persuade you one last time :)  My patch makes the
> >> code more logical and easier to follow.  The code says:
> >> in case no wake up happened - finish the wait (otherwise the waking
> >> thread finishes it).
> >
> > Your patch is still wrong, as Patrick's great commit log notes a NULL
> > dereference can also happen on a race with a case of -1 being sent and a
> > -ENOENT error, so we'd have to adjust for when __fw_state_wait_common()
> > also returns -ENOENT.
>
> Sorry, I don't follow.  _Not_ calling abort on -ENOENT error is
> exactly what my patch does.

Yeah, I see now what you mean. Your approach avoids the buf issue as well.
It's still not addressing the real issue though, which is the sloppy use
of a status on the buf, which at one point gets set to NULL. This latter
practice makes it rather hard to use a stateful check correctly.

> >> Adding a NULL-check would just paper over the
> >> issue and can cause trouble down the line.
> >
> > We typically bail on errors and use similar code to bail out, and we
> > typically do these things. Here it's no different. The *real* issue
> > is the fact that we have a waiting timeout which can fail and race against
> > a user imposed error out on the sysfs interface.
> > There is one catch:
> >
> > We already lock with the big fw_lock and use this to be able to check
> > for the status of the fw, so once aborted we technically should not have
> > to abort again. A proper way to address this then would have been to check
> > for the status of the fw prior to aborting again given we also lock on the
> > big fw_lock. A problem with this though is the status is part of the buf
> > which is set to NULL after we are done aborting.
>
> Yes, I've seen that too :\  This race seems to have been there prior
> to 4.9, though.  I guess we could fix both issues with the NULL-check
> although I would prefer if we had both patches.
>
> FWIW I think the NULL-check could be put in the existing conditional:
>
>  	 * There is a small window in which user can write to 'loading'
>  	 * between loading done and disappearance of 'loading'
>  	 */
> -	if (fw_state_is_done(&buf->fw_st))
> +	if (!buf || fw_state_is_done(&buf->fw_st))
>  		return;
>
>  	list_del_init(&buf->pending_list);
>
> Note that the comment above seems to be mentioning the race we're
> trying to solve.

Right, I think another approach is to *enable* the state of the buf to be
used to avoid further use on the sysfs interface instead. Fortunately other
sysfs interfaces already use fw_state_is_done() to bail out, so all that
would be needed I think would be:

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index b9ac348e8d33..30ccf7aea3ca 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -558,9 +558,6 @@ static void fw_load_abort(struct firmware_priv *fw_priv)
 	struct firmware_buf *buf = fw_priv->buf;
 
 	__fw_load_abort(buf);
-
-	/* avoid user action after loading abort */
-	fw_priv->buf = NULL;
 }
 
 static LIST_HEAD(pending_fw_head);
@@ -713,7 +710,7 @@ static ssize_t firmware_loading_store(struct device *dev,
 
 	mutex_lock(&fw_lock);
 	fw_buf = fw_priv->buf;
-	if (!fw_buf)
+	if (!fw_buf || fw_state_is_aborted(&fw_buf->fw_st))
 		goto out;
 
 	switch (loading) {
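
To make the idea concrete, here is a rough userspace sketch, not kernel code, of what "use the state instead of NULLing the buf" could look like. Everything named model_* is hypothetical and made up for illustration, and a pthread mutex stands in for fw_lock. The only point it tries to show is that when the status stays reachable under the lock, whichever abort path arrives second can see the first one and bail out:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model: these names only echo the thread and are
 * not the kernel's definitions. */
enum model_status { MODEL_LOADING, MODEL_DONE, MODEL_ABORTED };

struct model_buf {
	enum model_status status;
	int abort_runs;		/* times the abort work actually ran */
};

struct model_priv {
	struct model_buf *buf;	/* stays valid after an abort in this model */
};

/* Stands in for the big fw_lock. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;

static bool model_is_finished(const struct model_buf *buf)
{
	return buf->status == MODEL_DONE || buf->status == MODEL_ABORTED;
}

/* Caller must hold model_lock. Returns true if this call did the abort. */
static bool model_abort_locked(struct model_priv *priv)
{
	struct model_buf *buf;

	assert(priv != NULL);
	buf = priv->buf;

	if (!buf || model_is_finished(buf))
		return false;	/* already finished or aborted: nothing to do */

	buf->status = MODEL_ABORTED;
	buf->abort_runs++;
	return true;		/* priv->buf is deliberately not cleared */
}

/* Stands in for the wait path giving up after a timeout. */
static bool model_timeout_path(struct model_priv *priv)
{
	bool did_abort;

	pthread_mutex_lock(&model_lock);
	did_abort = model_abort_locked(priv);
	pthread_mutex_unlock(&model_lock);
	return did_abort;
}

/* Stands in for a user writing -1 to the sysfs 'loading' file. */
static bool model_user_abort(struct model_priv *priv)
{
	bool did_abort;

	pthread_mutex_lock(&model_lock);
	did_abort = model_abort_locked(priv);
	pthread_mutex_unlock(&model_lock);
	return did_abort;
}

/* Runs both abort paths back to back; returns how often the abort ran. */
static int model_double_abort(void)
{
	struct model_buf buf = { .status = MODEL_LOADING, .abort_runs = 0 };
	struct model_priv priv = { .buf = &buf };

	model_user_abort(&priv);	/* user aborts first */
	model_timeout_path(&priv);	/* timeout path arrives late */
	return buf.abort_runs;
}
```

Calling model_double_abort() from a small main() should report a single abort run, since the late timeout path sees MODEL_ABORTED and returns early. The sketch says nothing about buf lifetime or refcounting, which the real code would still have to get right.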