From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751171AbdAQVSB (ORCPT <rfc822;w@1wt.eu>);
        Tue, 17 Jan 2017 16:18:01 -0500
Received: from mail-qt0-f171.google.com ([209.85.216.171]:35727 "EHLO
        mail-qt0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751042AbdAQVR7 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 17 Jan 2017 16:17:59 -0500
MIME-Version: 1.0
In-Reply-To: <20170117205327.GF13946@wotan.suse.de>
References: <20170117153505.20308-1-jakub.kicinski@netronome.com>
 <20170117161512.GC13946@wotan.suse.de> <CAB=NE6Xj0TpwMVTDWtEaYAqSn8HdVapXqUf7j4a+i2f+zdkSZA@mail.gmail.com>
 <CAJpBn1wkUzNxQGy+d1Lq_7UCsgjvM65E+=cNZcP7NBSMyS157g@mail.gmail.com>
 <20170117173041.GE13946@wotan.suse.de> <CAJpBn1zg7AX9v93dtMpQyvip9zwUk+aAKU8U6bAaYP7gu-+bdA@mail.gmail.com>
 <20170117205327.GF13946@wotan.suse.de>
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 17 Jan 2017 13:17:58 -0800
Message-ID: <CAJpBn1yngh7hgRN0FKPY=Qgk3s85dUL1Xpjb9ud8_YB8pbL2PA@mail.gmail.com>
Subject: Re: [PATCHv2] firmware: Correct handling of fw_state_wait_timeout()
 return value
To: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>, linux-kernel-dev@beckhoff.com,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Bjorn Andersson <bjorn.andersson@linaro.org>,
        Daniel Wagner <daniel.wagner@bmw-carit.de>,
        Ming Lei <ming.lei@canonical.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        oss-drivers@netronome.com
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 17, 2017 at 12:53 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, Jan 17, 2017 at 10:04:20AM -0800, Jakub Kicinski wrote:
>> On Tue, Jan 17, 2017 at 9:30 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > On Tue, Jan 17, 2017 at 08:30:37AM -0800, Jakub Kicinski wrote:
>> >> Adding a NULL-check would just paper over the
>> >> issue and can cause trouble down the line.
>> >
>> > We typically bail on errors and use similar code to bail out, and we
>> > typically do these things. Here it's no different. The *real* issue
>> > is the fact that we have a waiting timeout which can race against
>> > a user-imposed error out on the sysfs interface. There is one catch:
>> >
>> > We already lock with the big fw_lock and use this to be able to check
>> > for the status of the fw, so once aborted we technically should not have
>> > to abort again. A proper way to address this, then, would have been to
>> > check for the status of the fw prior to aborting again, given we also
>> > lock on the big fw_lock. A problem with this, though, is that the status
>> > is part of the buf, which is set to NULL after we are done aborting.
>>
>> Yes, I've seen that too :\  This race seems to have been there prior
>> to 4.9, though.  I guess we could fix both issues with the NULL-check
>> although I would prefer if we had both patches.
>>
>> FWIW I think the NULL-check could be put in the existing conditional:
>>
>>          * There is a small window in which user can write to 'loading'
>>          * between loading done and disappearance of 'loading'
>>          */
>> -       if (fw_state_is_done(&buf->fw_st))
>> +       if (!buf || fw_state_is_done(&buf->fw_st))
>>                 return;
>>
>>         list_del_init(&buf->pending_list);
>>
>> Note that the comment above seems to be mentioning the race we're
>> trying to solve.
>
> Right, I think another approach is to *enable* the state of the buf
> to be used to avoid further use on the sysfs interface instead. Fortunately
> other sysfs interfaces already use fw_state_is_done() to bail out,
> so all that would be needed I think would be:
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index b9ac348e8d33..30ccf7aea3ca 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -558,9 +558,6 @@ static void fw_load_abort(struct firmware_priv *fw_priv)
>         struct firmware_buf *buf = fw_priv->buf;
>
>         __fw_load_abort(buf);
> -
> -       /* avoid user action after loading abort */
> -       fw_priv->buf = NULL;
>  }
>
>  static LIST_HEAD(pending_fw_head);
> @@ -713,7 +710,7 @@ static ssize_t firmware_loading_store(struct device *dev,
>
>         mutex_lock(&fw_lock);
>         fw_buf = fw_priv->buf;
> -       if (!fw_buf)
> +       if (!fw_buf || fw_state_is_aborted(&fw_buf->fw_st))
>                 goto out;
>
>         switch (loading) {

IMHO this one is nice!  I think you can even drop the !fw_buf check in
this case, because AFAICS the only place fw_priv->buf is ever set to
NULL is the abort function, and your patch removes that assignment.
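
A rough userspace model of the proposed check, to illustrate why the
NULL test becomes redundant.  All names here (model_fw_st, model_buf,
model_priv, model_abort, model_loading_store) are hypothetical stand-ins,
not the kernel's structures, and the switch on 'loading' is heavily
simplified.  The kernel code runs under fw_lock; the model is
single-threaded and leaves locking out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's fw_state values. */
enum model_fw_st {
	MODEL_FW_UNKNOWN,
	MODEL_FW_LOADING,
	MODEL_FW_DONE,
	MODEL_FW_ABORTED,
};

struct model_buf {
	enum model_fw_st st;
};

struct model_priv {
	struct model_buf *buf;	/* set once when the priv is created */
};

/* Models fw_load_abort() with the proposed patch applied: only the
 * state changes, priv->buf keeps pointing at the buffer. */
static void model_abort(struct model_priv *priv)
{
	priv->buf->st = MODEL_FW_ABORTED;
}

/* Models the early bail-out in firmware_loading_store() plus a heavily
 * simplified switch.  Returns false if the write is ignored.  There is
 * no NULL check: nothing in this model ever sets priv->buf to NULL. */
static bool model_loading_store(struct model_priv *priv, int loading)
{
	struct model_buf *buf = priv->buf;

	assert(buf != NULL);	/* the invariant that makes !fw_buf redundant */
	if (buf->st == MODEL_FW_ABORTED)
		return false;

	if (loading == 1 && buf->st != MODEL_FW_DONE)
		buf->st = MODEL_FW_LOADING;
	else if (loading == 0 && buf->st == MODEL_FW_LOADING)
		buf->st = MODEL_FW_DONE;
	return true;
}
```

In this model a write that arrives after model_abort() is simply
ignored and the buffer stays in the aborted state, instead of the
store path having to cope with a pointer that was cleared underneath it.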

I was initially thinking that this could be a slight change of
behavior: if mapping the pages fails, the aborted state is entered via
fw_state_aborted(), which does not unlink the buffer, so in theory one
could still restart the FW load by writing 0 to sysfs and retrying?
But that would have to happen before the waiting thread is woken, so
it's really a race condition rather than something user space can
depend on.  Or at least that's the case if I'm reading the code
correctly.
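
The difference on the mapping-failure path can be seen by comparing the
two guards in a small model.  The names (guard_st, guard_buf,
guard_priv, guard_map_failure, guard_old_accepts, guard_new_accepts)
are hypothetical; the model only answers whether a write to 'loading'
gets past the guard, not what the write then does, and it ignores the
timing against the waiting thread, which is what makes this a race
rather than something user space could rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-state model; the real code has more states. */
enum guard_st { GUARD_LOADING, GUARD_ABORTED };

struct guard_buf {
	enum guard_st st;
};

struct guard_priv {
	struct guard_buf *buf;
};

/* Mapping-failure path: models fw_state_aborted(), which changes the
 * state but neither unlinks the buffer nor clears priv->buf. */
static void guard_map_failure(struct guard_priv *priv)
{
	assert(priv->buf != NULL);
	priv->buf->st = GUARD_ABORTED;
}

/* Current guard in firmware_loading_store(): only NULL is rejected. */
static bool guard_old_accepts(const struct guard_priv *priv)
{
	return priv->buf != NULL;
}

/* Proposed guard: a NULL or aborted buffer is rejected. */
static bool guard_new_accepts(const struct guard_priv *priv)
{
	return priv->buf != NULL && priv->buf->st != GUARD_ABORTED;
}
```

After guard_map_failure() the old guard still lets a write through while
the proposed one rejects it; that is the whole of the behavior change,
and it only matters inside the window before the waiter wakes up.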

Yet another way to change the condition in firmware_loading_store()
would be to check if fw_priv->buf->pending_list is still hooked onto
the pending_fw_head list.
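
A minimal userspace sketch of that membership test.  The kernel uses
struct list_head with list_add(), list_del_init() and list_empty(); the
helpers below (list_init, list_add_front, list_del_reinit,
list_node_unlinked, model_fw_buf, model_still_pending) are a
hypothetical re-implementation of the same idiom, not the kernel header.
The property that matters is that list_del_init() leaves the entry
pointing at itself, so list_empty() on the entry reports whether it is
still linked into some list.  The abort path quoted above already calls
list_del_init(&buf->pending_list), so an aborted buffer would fail such
a check.  In the kernel the test would need fw_lock, which
firmware_loading_store() already takes before looking at fw_priv->buf.

```c
#include <assert.h>
#include <stdbool.h>

/* Intrusive doubly-linked circular list node, like struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* True if n is self-linked, i.e. not on any list (like list_empty(entry)). */
static bool list_node_unlinked(const struct list_node *n)
{
	return n->next == n;
}

/* Insert n right after head, like list_add(). */
static void list_add_front(struct list_node *n, struct list_node *head)
{
	assert(list_node_unlinked(n));	/* must not already be on a list */
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and make it point to itself again, like list_del_init(). */
static void list_del_reinit(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct model_fw_buf {
	struct list_node pending_list;
};

/* Hypothetical guard: accept a 'loading' write only while the buffer
 * is still on the pending list. */
static bool model_still_pending(const struct model_fw_buf *buf)
{
	return !list_node_unlinked(&buf->pending_list);
}
```

In this model any path that calls the list_del_init() equivalent makes
later writes get ignored, without needing a dedicated aborted state or
a NULL pointer.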