netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jacob Keller <jacob.e.keller@intel.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, Jiri Pirko <jiri@nvidia.com>,
	Michael Chan <michael.chan@broadcom.com>,
	Shannon Nelson <snelson@pensando.io>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Boris Pismenny <borisp@nvidia.com>, Bin Luo <luobin9@huawei.com>
Subject: devlink userspace process appears stuck (was: Re: [net-next] devlink: move request_firmware out of driver)
Date: Fri, 13 Nov 2020 14:32:36 -0800	[thread overview]
Message-ID: <6352e9d3-02af-721e-3a54-ef99a666be29@intel.com> (raw)
In-Reply-To: <01c79a25-3826-d0f3-8ea3-aa31e338dabe@intel.com>



On 11/13/2020 1:34 PM, Jacob Keller wrote:
> Well, at least with ice, the experience is pretty bad. I tried out with
> a garbage file name on one of my test systems. This was on a slightly
> older kernel without this patch applied, and the device had a pending
> update that had not yet been finalized with a reset:
> 
> $ devlink dev flash pci/0000:af:00.0 file garbage_file_does_not_exist
> Canceling previous pending update
> 
> 
> The update looks like it got stuck, but actually it failed. Somehow the
> extack error over the socket didn't get handled by the application very
> well. Something buggy in the forked process probably.
> 
> I do get this in the dmesg though:
> 
> Nov 13 13:12:57 jekeller-stp-glorfindel kernel: ice 0000:af:00.0: Direct
> firmware load for garbage_file_does_not_exist failed with error -2
> 

I think I figured out what is going on here, but I'm not sure what the
best solution is.

in userspace devlink.c:3410, the condition for exiting the while loop
that monitors the flash update process is:

(!ctx.flash_done || (ctx.not_first && !ctx.received_end))

This condition means keep looping until flash is done *OR* we've
received a message but have not yet received the end.

In the ice driver implementation, we perform a check for a pending flash
update first, which could trigger a cancellation that causes us to send
back a "cancelling previous pending flash update" status message, which
was sent *before* the devlink_flash_update_begin_notify(). Then, after
this we request the firmware object, which fails, and results in an
error code being reported back..

However, we will never send either the begin or end notification at this
point. Thus, the devlink userspace application will never quit, and
won't display the extack message.

This occurs because we sent a status notify message before we actually
sent a "begin notify". I think the ordering was done because of trying
to avoid having a complicated cleanup logic.

In some sense, this is a bug in the ice driver.. but in another sense
this is the devlink application being too strict about the requirements
on ordering of these messages..

I guess one method if we really want to remain strict is moving the
"begin" and "end" notifications outside of the driver into core so that
it always sends a begin before calling the .flash_update handler, and
always sends an end before exiting.

I guess we could simply relax the restriction on "not first" so that
we'll always exit in the case where the forked process has quit on us,
even if we haven't received a proper flash end notification...

Thoughts?

  reply	other threads:[~2020-11-13 22:32 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-13  0:01 [net-next] devlink: move request_firmware out of driver Jacob Keller
2020-11-13 21:12 ` Jakub Kicinski
2020-11-13 21:34   ` Jacob Keller
2020-11-13 22:32     ` Jacob Keller [this message]
2020-11-13 22:51       ` devlink userspace process appears stuck (was: Re: [net-next] devlink: move request_firmware out of driver) Jacob Keller
2020-11-15  4:10         ` Jakub Kicinski
2020-11-17 17:45           ` Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6352e9d3-02af-721e-3a54-ef99a666be29@intel.com \
    --to=jacob.e.keller@intel.com \
    --cc=borisp@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=luobin9@huawei.com \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=snelson@pensando.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).