linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* module refcount issues in the liquidio driver
@ 2021-03-11  7:37 Christoph Hellwig
  2021-03-11 17:43 ` [EXT] " Felix Manlunas
  0 siblings, 1 reply; 2+ messages in thread
From: Christoph Hellwig @ 2021-03-11  7:37 UTC (permalink / raw)
  To: Felix Manlunas, Derek Chickles, Satanand Burla; +Cc: netdev, linux-kernel

Hi all,

I just stumbled over the odd handling of module refcounts in the liquidio
driver.  The big red flag is the call to module_refcount in
liquidio_watchdog, which will do the wrong thing for any external module
refcount, like a userspace open.

But more importantly the whole concept of acquiring module refcounts from
inside the driver is pretty bogus.  What problem does this try to solve?



^ permalink raw reply	[flat|nested] 2+ messages in thread

* RE: [EXT] module refcount issues in the liquidio driver
  2021-03-11  7:37 module refcount issues in the liquidio driver Christoph Hellwig
@ 2021-03-11 17:43 ` Felix Manlunas
  0 siblings, 0 replies; 2+ messages in thread
From: Felix Manlunas @ 2021-03-11 17:43 UTC (permalink / raw)
  To: Christoph Hellwig, Derek Chickles, Satananda Burla; +Cc: netdev, linux-kernel

From: Christoph Hellwig <hch@infradead.org>
Date: Wed 3/10/2021 11:38 PM -0800

> Hi all,
> 
> I just stumbled over the odd handling of module refcounts in the liquidio
> driver.  The big red flag is the call to module_refcount in
> liquidio_watchdog, which will do the wrong thing for any external module
> refcount, like a userspace open.
>
> But more importantly the whole concept of acquiring module refcounts from
> inside the driver is pretty bogus.  What problem does this try to solve?

The problem is described in the commit log below, in "(2) Decrement the
module refcount ...".

commit bb54be589c7a8451a0504924703abdffeb06b79f
Author: Felix Manlunas <felix.manlunas@cavium.com>
Date:   Tue Apr 4 19:26:57 2017 -0700

    liquidio: fix Octeon core watchdog timeout false alarm
    
    Detection of watchdog timeout of Octeon cores is flawed and susceptible to
    false alarms.  Refactor by removing the detection code, and in its place,
    leverage existing code that monitors for an indication from the NIC
    firmware that an Octeon core crashed; expand the meaning of the indication
    to "an Octeon core crashed or its watchdog timer expired".  Detection of
    watchdog timeout is now delegated to an exception handler in the NIC
    firmware; this is free of false alarms.
    
    Also if there's an Octeon core crash or watchdog timeout:
    (1) Disable VF Ethernet links.
    (2) Decrement the module refcount by an amount equal to the number of
        active VFs of the NIC whose Octeon core crashed or had a watchdog
        timeout.  The refcount will continue to reflect the active VFs of
        other liquidio NIC(s) (if present) whose Octeon cores are faultless.
    
    Item (2) is needed to avoid the case of not being able to unload the driver
    because the module refcount is stuck at some non-zero number.  There is
    code that, in normal cases, decrements the refcount upon receiving a
    message from the firmware that a VF driver was unloaded.  But in
    exceptional cases like an Octeon core crash or watchdog timeout, arrival of
    that particular message from the firmware might be unreliable.  That normal
    case code is changed to not touch the refcount in the exceptional case to
    avoid contention (over the refcount) with the liquidio_watchdog kernel
    thread who will carry out item (2).
    
    Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
    Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-03-11 17:45 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11  7:37 module refcount issues in the liquidio driver Christoph Hellwig
2021-03-11 17:43 ` [EXT] " Felix Manlunas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).