All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: gregkh@linuxfoundation.org
Cc: wagi@monom.org, yi1.li@linux.intel.com,
	takahiro.akashi@linaro.org, bjorn.andersson@linaro.org,
	luto@kernel.org, ebiederm@xmission.com,
	dmitry.torokhov@gmail.com, arend.vanspriel@broadcom.com,
	dwmw2@infradead.org, rjw@rjwysocki.net, atull@kernel.org,
	moritz.fischer@ettus.com, pmladek@suse.com,
	johannes.berg@intel.com, emmanuel.grumbach@intel.com,
	luciano.coelho@intel.com, kvalo@codeaurora.org,
	torvalds@linux-foundation.org, keescook@chromium.org,
	dhowells@redhat.com, pjones@redhat.com, hdegoede@redhat.com,
	alan@linux.intel.com, tytso@mit.edu, dave@stgolabs.net,
	mawilcox@microsoft.com, tglx@linutronix.de, peterz@infradead.org,
	mfuzzey@parkeon.com, jakub.kicinski@netronome.com,
	nbroeking@me.com, jewalt@lgsinnovations.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	"stable # v3 . 14" <stable@vger.kernel.org>
Subject: [PATCH v4 2/3] firmware: fix batched requests - send wake up on failure on direct lookups
Date: Thu, 20 Jul 2017 13:13:10 -0700	[thread overview]
Message-ID: <20170720201311.7268-3-mcgrof@kernel.org> (raw)
In-Reply-To: <20170720201311.7268-1-mcgrof@kernel.org>

Fix batched requests from waiting forever on failure.

The firmware API batched requests feature has been broken since the API call
request_firmware_direct() was introduced on commit bba3a87e982ad ("firmware:
Introduce request_firmware_direct()"), added on v3.14 *iff* the firmware
being requested was not present in *certain kernel builds* [0].

When no firmware is found the worker which goes on to finish never informs
waiters queued up of this, so any batched request will stall in what seems
to be forever (MAX_SCHEDULE_TIMEOUT). Sadly, a reboot will also stall, as
the reboot notifier was only designed to kill custom fallback workers. The
issue seems to the user as a type of soft lockup, what *actually* happens
underneath the hood is a wait call which never completes as we failed to
issue a completion on error.

For device drivers with optional firmware schemes (ie, Intel iwlwifi, or
Netronome -- even though it uses request_firmware() and not
request_firmware_direct()), this could mean that when you boot a system with
multiple cards the firmware will seem to never load on the system, or that
the card is just not responsive even the driver initialization. Due to
differences in scheduling possible this should not always trigger --
one would need to to ensure that multiple requests are in place at the
right time for this to work, also release_firmware() must not be called
prior to any other incoming request. The complexity may not be worth
supporting batched requests in the future given the wait mechanism is
only used also for the fallback mechanism. We'll keep it for now and
just fix it.

Its reported that at least with the Intel WiFi cards on one system this
issue was creeping up 50% of the boots [0].

Before this commit batched requests testing revealed:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y

Most common Linux distribution setup.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     FAIL                OK
request_firmware_direct()              FAIL                OK
request_firmware_nowait(uevent=true)   FAIL                OK
request_firmware_nowait(uevent=false)  FAIL                OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n

Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     FAIL                OK
request_firmware_direct()              FAIL                OK
request_firmware_nowait(uevent=true)   FAIL                OK
request_firmware_nowait(uevent=false)  FAIL                OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y

Google Android setup.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     OK                  OK
request_firmware_direct()              FAIL                OK
request_firmware_nowait(uevent=true)   OK                  OK
request_firmware_nowait(uevent=false)  OK                  OK
============================================================================

Ater this commit batched testing results:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y

Most common Linux distribution setup.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     OK                  OK
request_firmware_direct()              OK                  OK
request_firmware_nowait(uevent=true)   OK                  OK
request_firmware_nowait(uevent=false)  OK                  OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n

Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     OK                  OK
request_firmware_direct()              OK                  OK
request_firmware_nowait(uevent=true)   OK                  OK
request_firmware_nowait(uevent=false)  OK                  OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y

Google Android setup.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     OK                  OK
request_firmware_direct()              OK                  OK
request_firmware_nowait(uevent=true)   OK                  OK
request_firmware_nowait(uevent=false)  OK                  OK
============================================================================

[0] https://bugzilla.kernel.org/show_bug.cgi?id=195477

Cc: stable <stable@vger.kernel.org> # v3.14
Fixes: bba3a87e982ad ("firmware: Introduce request_firmware_direct()"
Reported-by: Nicolas <nbroeking@me.com>
Reported-by: John Ewalt  <jewalt@lgsinnovations.com>
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 drivers/base/firmware_class.c | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index f50ec6e367bd..76f1b702bdd6 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -153,28 +153,27 @@ static void __fw_state_set(struct fw_state *fw_st,
 	__fw_state_set(fw_st, FW_STATUS_LOADING)
 #define fw_state_done(fw_st)					\
 	__fw_state_set(fw_st, FW_STATUS_DONE)
+#define fw_state_aborted(fw_st)					\
+	__fw_state_set(fw_st, FW_STATUS_ABORTED)
 #define fw_state_wait(fw_st)					\
 	__fw_state_wait_common(fw_st, MAX_SCHEDULE_TIMEOUT)
 
-#ifndef CONFIG_FW_LOADER_USER_HELPER
-
-#define fw_state_is_aborted(fw_st)	false
-
-#else /* CONFIG_FW_LOADER_USER_HELPER */
-
 static int __fw_state_check(struct fw_state *fw_st, enum fw_status status)
 {
 	return fw_st->status == status;
 }
 
+#define fw_state_is_aborted(fw_st)				\
+	__fw_state_check(fw_st, FW_STATUS_ABORTED)
+
+#ifdef CONFIG_FW_LOADER_USER_HELPER
+
 #define fw_state_aborted(fw_st)					\
 	__fw_state_set(fw_st, FW_STATUS_ABORTED)
 #define fw_state_is_done(fw_st)					\
 	__fw_state_check(fw_st, FW_STATUS_DONE)
 #define fw_state_is_loading(fw_st)				\
 	__fw_state_check(fw_st, FW_STATUS_LOADING)
-#define fw_state_is_aborted(fw_st)				\
-	__fw_state_check(fw_st, FW_STATUS_ABORTED)
 #define fw_state_wait_timeout(fw_st, timeout)			\
 	__fw_state_wait_common(fw_st, timeout)
 
@@ -1198,6 +1197,28 @@ _request_firmware_prepare(struct firmware **firmware_p, const char *name,
 	return 1; /* need to load */
 }
 
+/*
+ * Batched requests need only one wake, we need to do this step last due to the
+ * fallback mechanism. The buf is protected with kref_get(), and it won't be
+ * released until the last user calls release_firmware().
+ *
+ * Failed batched requests are possible as well, in such cases we just share
+ * the struct firmware_buf and won't release it until all requests are woken
+ * and have gone through this same path.
+ */
+static void fw_abort_batch_reqs(struct firmware *fw)
+{
+	struct firmware_buf *buf;
+
+	/* Loaded directly? */
+	if (!fw || !fw->priv)
+		return;
+
+	buf = fw->priv;
+	if (!fw_state_is_aborted(&buf->fw_st))
+		fw_state_aborted(&buf->fw_st);
+}
+
 /* called from request_firmware() and request_firmware_work_func() */
 static int
 _request_firmware(const struct firmware **firmware_p, const char *name,
@@ -1241,6 +1262,7 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
 
  out:
 	if (ret < 0) {
+		fw_abort_batch_reqs(fw);
 		release_firmware(fw);
 		fw = NULL;
 	}
-- 
2.11.0

  parent reply	other threads:[~2017-07-20 20:13 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-20 20:13 [PATCH v4 0/3] firmware: pending fixes for v4.13-final Luis R. Rodriguez
2017-07-20 20:13 ` [PATCH v4 1/3] firmware: fix batched requests - wake all waiters Luis R. Rodriguez
2017-07-20 20:13 ` Luis R. Rodriguez [this message]
2017-07-20 20:13 ` [PATCH v4 3/3] firmware: avoid invalid fallback aborts by using killable wait Luis R. Rodriguez
2017-08-02 20:55 ` [PATCH v4 0/3] firmware: pending fixes for v4.13-final Luis R. Rodriguez
2017-08-10  0:01   ` Luis R. Rodriguez
2017-08-10  0:06     ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170720201311.7268-3-mcgrof@kernel.org \
    --to=mcgrof@kernel.org \
    --cc=alan@linux.intel.com \
    --cc=arend.vanspriel@broadcom.com \
    --cc=atull@kernel.org \
    --cc=bjorn.andersson@linaro.org \
    --cc=dave@stgolabs.net \
    --cc=dhowells@redhat.com \
    --cc=dmitry.torokhov@gmail.com \
    --cc=dwmw2@infradead.org \
    --cc=ebiederm@xmission.com \
    --cc=emmanuel.grumbach@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hdegoede@redhat.com \
    --cc=jakub.kicinski@netronome.com \
    --cc=jewalt@lgsinnovations.com \
    --cc=johannes.berg@intel.com \
    --cc=keescook@chromium.org \
    --cc=kvalo@codeaurora.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luciano.coelho@intel.com \
    --cc=luto@kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=mfuzzey@parkeon.com \
    --cc=moritz.fischer@ettus.com \
    --cc=nbroeking@me.com \
    --cc=peterz@infradead.org \
    --cc=pjones@redhat.com \
    --cc=pmladek@suse.com \
    --cc=rjw@rjwysocki.net \
    --cc=stable@vger.kernel.org \
    --cc=takahiro.akashi@linaro.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=wagi@monom.org \
    --cc=yi1.li@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.