From: David Miller <davem@davemloft.net>
To: felix.manlunas@cavium.com
Cc: netdev@vger.kernel.org, raghu.vatsavayi@cavium.com,
	derek.chickles@cavium.com
Subject: Re: [PATCH net-next] liquidio: fix Octeon core watchdog timeout false alarm
Date: Thu, 06 Apr 2017 12:32:08 -0700 (PDT)
Message-ID: <20170406.123208.1810854461605308607.davem@davemloft.net>
In-Reply-To: <20170405022657.GA1064@felix-thinkpad.cavium.com>

From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Tue, 4 Apr 2017 19:26:57 -0700

> Detection of watchdog timeout of Octeon cores is flawed and susceptible to
> false alarms.  Refactor by removing the detection code, and in its place,
> leverage existing code that monitors for an indication from the NIC
> firmware that an Octeon core crashed; expand the meaning of the indication
> to "an Octeon core crashed or its watchdog timer expired".  Detection of
> watchdog timeout is now delegated to an exception handler in the NIC
> firmware; this is free of false alarms.
> 
> Also, if there's an Octeon core crash or watchdog timeout:
> (1) Disable VF Ethernet links.
> (2) Decrement the module refcount by an amount equal to the number of
>     active VFs of the NIC whose Octeon core crashed or had a watchdog
>     timeout.  The refcount will continue to reflect the active VFs of
>     other liquidio NIC(s) (if present) whose Octeon cores are faultless.
> 
> Item (2) is needed to avoid the case of not being able to unload the driver
> because the module refcount is stuck at some non-zero number.  There is
> code that, in normal cases, decrements the refcount upon receiving a
> message from the firmware that a VF driver was unloaded.  But in
> exceptional cases like an Octeon core crash or watchdog timeout, arrival of
> that particular message from the firmware might be unreliable.  That
> normal-case code is changed to not touch the refcount in the exceptional
> case, to avoid contention (over the refcount) with the liquidio_watchdog
> kernel thread, which will carry out item (2).
> 
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
> Signed-off-by: Derek Chickles <derek.chickles@cavium.com>

Applied, thanks.
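
The refcount handling described in the quoted commit message can be
illustrated with a small userspace model. This is only a sketch under
stated assumptions, not the liquidio driver code: struct nic,
module_refcount, vf_loaded(), on_vf_unloaded_msg() and on_core_fault()
are hypothetical names invented here, and a plain int stands in for the
kernel module refcount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names come from the liquidio driver. */
struct nic {
	int  active_vfs;     /* VFs of this NIC that hold a module reference */
	bool cores_faulted;  /* firmware reported a core crash or watchdog expiry */
	bool vf_links_up;    /* state of the VF Ethernet links */
};

/* Stand-in for the module refcount shared by all NICs of the driver. */
int module_refcount;

/* A VF driver attaches to this NIC and takes one module reference. */
void vf_loaded(struct nic *n)
{
	n->active_vfs++;
	module_refcount++;
}

/* Normal path: the firmware reports that a VF driver was unloaded. */
void on_vf_unloaded_msg(struct nic *n)
{
	/* After a fault, the watchdog path owns this NIC's references. */
	if (n->cores_faulted)
		return;
	if (n->active_vfs > 0) {
		n->active_vfs--;
		module_refcount--;
	}
}

/* Exceptional path: the firmware indicates a crash or watchdog timeout. */
void on_core_fault(struct nic *n)
{
	if (n->cores_faulted)
		return;                        /* handle each fault only once */
	n->cores_faulted = true;
	n->vf_links_up = false;                /* item (1): disable VF links */
	module_refcount -= n->active_vfs;      /* item (2): drop this NIC's refs */
	n->active_vfs = 0;
	assert(module_refcount >= 0);
}
```

In this model, a fault on one NIC releases only that NIC's references, so
the count still reflects the active VFs of any healthy NIC. A late
"VF unloaded" message for the faulted NIC is ignored rather than
decrementing the count a second time.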
