Re: [PATCH 1/1] be2net: Detach interface for avoiding a system crash

From: Saeed Mahameed <saeedm@mellanox.com>
To: "firogm@gmail.com" <firogm@gmail.com>,
	"sathya.perla@broadcom.com" <sathya.perla@broadcom.com>,
	"sriharsha.basavapatna@broadcom.com" 
	<sriharsha.basavapatna@broadcom.com>,
	"fyang@suse.com" <fyang@suse.com>,
	"somnath.kotur@broadcom.com" <somnath.kotur@broadcom.com>,
	"ajit.khaparde@broadcom.com" <ajit.khaparde@broadcom.com>,
	"davem@davemloft.net" <davem@davemloft.net>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [PATCH 1/1] be2net: Detach interface for avoiding a system crash
Date: Thu, 25 Apr 2019 20:35:41 +0000	[thread overview]
Message-ID: <5155e07735ed6ebf4bce904cc666827c0573f13a.camel@mellanox.com> (raw)
In-Reply-To: <a6b9bbe1-3f3c-b798-56ab-8193d481e364@suse.com>

On Wed, 2019-04-24 at 13:53 +0800, Firo wrote:
> 
> On 4/20/19 6:31 AM, Saeed Mahameed wrote:
> > On Fri, 2019-04-19 at 21:07 +0800, Firo wrote:
> > > On 4/19/19 2:17 AM, Saeed Mahameed wrote:
> > > > On Thu, 2019-04-18 at 15:05 +0800, Firo wrote:
> > > > > On 4/2/19 12:25 AM, Saeed Mahameed wrote:
> > > > > > On Mon, 2019-04-01 at 20:24 +0800, Firo Yang wrote:
> > > > > > > This crash is triggered by a user-after-free since lake
> > > > > > > of
> > > > > > > the synchronization of a race condition between   
> > > > > > > be_update_queues() modifying multi-purpose channels of
> > > > > > > network device and be_tx_timeout().
> > > > > > > 
> > > > > > > BUG: unable to handle kernel NULL pointer dereference at
> > > > > > > (null)
> > > > > > > Call Trace:
> > > > > > > be_tx_timeout+0xa5/0x360 [be2net]
> > > > > > > dev_watchdog+0x1d8/0x210
> > > > > > > call_timer_fn+0x32/0x140
> > > > > > > 
> > > > > > > To fix it, detach the interface before modifying
> > > > > > > multi-purpose channels of network device.
> > > > > > > 
> > > > > > > Signed-off-by: Firo Yang <fyang@suse.com>
> > > > > > > ---
> > > > > > >  drivers/net/ethernet/emulex/benet/be_main.c | 12
> > > > > > > ++++++++---
> > > > > > > -
> > > > > > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/net/ethernet/emulex/benet/be_main.c
> > > > > > > b/drivers/net/ethernet/emulex/benet/be_main.c
> > > > > > > index d5026909dec5..25d0128bf684 100644
> > > > > > > --- a/drivers/net/ethernet/emulex/benet/be_main.c
> > > > > > > +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> > > > > > > @@ -4705,6 +4705,8 @@ int be_update_queues(struct
> > > > > > > be_adapter
> > > > > > > *adapter)
> > > > > > >  	struct net_device *netdev = adapter->netdev;
> > > > > > >  	int status;
> > > > > > >    
> > > > > > > +	netif_device_detach(netdev);
> > > > > > > +
> > > > > > 
> > > > > > This will reduce the probability, but will not do the
> > > > > > trick.
> > > > > > since this will not guarantee that the dev_watchdog is
> > > > > > disabled.
> > > > > Hi Saeed,
> > > > > 
> > > > > What about using dev_watchdog_down/up() to temporarily
> > > > > disable
> > > > > the
> > > > > dev_watchdog?
> > > > > 
> > > > > +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> > > > > @@ -4697,6 +4697,8 @@ int be_update_queues(struct be_adapter
> > > > > *adapter)
> > > > >         struct net_device *netdev = adapter->netdev;
> > > > >         int status;
> > > > > 
> > > > > +       dev_watchdog_down();
> > > > > +
> > > >   
> > > > there is no such API, currently this is a static function, and
> > > > i
> > > > don't
> > > > think it is a good idea to mess around with the watchdog.
> > > >   
> > > > If you want to avoid deferred work and explicit locking, you
> > > > need
> > > > something similar to what you did with the device_detach to
> > > > flag to
> > > > the
> > > > watchdog to not look at your tx queues.
> > > >   
> > > > by looking at be_close() I see that it calls
> > > > netif_tx_disable(netdev);
> > > > which provides some kind of state synchronization with the
> > > > watchdog.
> > > >   
> > > > so maybe netif_device_detach() will work, but it is a very
> > > > heavy
> > > > gun!
> > > 
> > > Sorry, I cannot fully understand why you think
> > > netif_device_detach()
> > > is
> > > very heavy. Except clearing __LINK_STATE_PRESENT,
> > > netif_device_detach()
> > > almost does the same thing as  netif_tx_disable(); could you
> > > please
> > > detail your thought?
> > > 
> > 
> > calling  on its own is not enough to make the
> > watchdog
> > back-off, you need the extra flag from carrier_off or
> > detach_device.
> I think your first comment is right. Locking mechanism is necessary
> here. Consider the following situation. In (), due to
> some
> reasons[1], netif_xmit_stopped(txq) && time_after(jiffies,
> (trans_start
> + dev->watchdog_timeo)) is true; and dev_watchdog() tests the
> outermost
> if-statement[2] just before be_update_queues() executes carrier_off
> or
> detach_device; then dev_watchdog() will be possible to hit a kernel
> oops.
> 
> [1]: For example, a real problem causes that NIC stopped and timed
> out.
> [2]: if (!qdisc_tx_is_noop(dev)) {
>                 if (netif_device_present(dev) &&
>                     netif_running(dev) &&
>                     netif_carrier_ok(dev))

Yes internal synchronization in the be2net driver is the best way to go
for now, 

as you already suggested we add new API or just disable watchdog on
netif_tx_disable(netdev); which is almost called by every driver.
but we don't have netif_tx_enable() equivalent, and each driver is
doing something else to restart tx queues, so it is not going to be an
easy task

> > 1) the documentation of netif_device_detach says 
> > "Mark device as removed from system and therefore no longer
> > available."
> > which is not true in your case.
> Indeed.
> > 2) netif_device_present(dev) check is used very widely in netdev
> > control paths, so if you detach the device while doing one config
> > you
> > will loose all other configs and they would wrongfully return
> > -ENODEV;
> I went through a few *ioctl and netlink code; it seems that they all
> under the protection of RTNL lock and won't be able to return
> -ENODEV.
> 
> > calling carrier_off will make the watch-dog back off but still
> > allow
> > other configurations.
> > 
> > the carrier is used to mark that the device is not available for
> > tx,
> > which is exactly your case since you are resetting the rings, but
> > still
> > available for anything else.
> > 
> > looking at other drivers, i couldn't find anyone using the
> > detach_device approach to disable the watchdog while resetting the
> > rings, but almost most of the drivers are doing the carrier_off.
> > 
> > > > netif_carrier_off(netdev) should do the work, and at the end
> > > > you
> > > > will
> > > > need to restore the original carrier state.
> > > 
> > > netif_carrier_off() might trigger a linkwatch event. From this
> > > point
> > > of
> > > view, maybe netif_device_detach() is better.
> > > 
> > 
> > Code-wise yes netif_device_detach is better, but the implications
> > can
> > be worse, as explained above.
> > 
> > My 2 cent is to copy the approach used by most drivers, and just
> > use
> > netif_carrier_off(), and maybe in the future we will introduce a
> > more
> > relaxed version of carrier off to shut-up the watchdog, and fix all
> > drivers at once :).>
> > I will leave the decision to you and to this driver maintainers.
> > 
> > Thanks for following up on my comments.
> Thank you for such detailed explanation!
> // Firo
> 
> > > Thanks,
> > > Firo
> > > 
> > > >   
> > > > carrier_ok = netif_carrier_ok(netdev);
> > > > /* must be called before netif_tx_disable() */
> > > > netif_carrier_off(netdev);   
> > > >   
> > > > // do stuff
> > > >   
> > > > if (carrier_ok)
> > > > 	netif_carrier_on(netdev);
> > > >   
> > > > >         if (netif_running(netdev))
> > > > >                 be_close(netdev);
> > > > > 
> > > > > @@ -4711,21 +4713,21 @@ int be_update_queues(struct
> > > > > be_adapter
> > > > > *adapter)
> > > > >         be_clear_queues(adapter);
> > > > >         status = be_cmd_if_destroy(adapter, adapter-
> > > > > > if_handle,  0);
> > > > >         if (status)
> > > > > -               return status;
> > > > > +               goto out;
> > > > > 
> > > > >         if (!msix_enabled(adapter)) {
> > > > >                 status = be_msix_enable(adapter);
> > > > >                 if (status)
> > > > > -                       return status;
> > > > > +                       goto out;
> > > > >         }
> > > > > 
> > > > >         status = be_if_create(adapter);
> > > > >         if (status)
> > > > > -               return status;
> > > > > +               goto out;
> > > > > 
> > > > >         status = be_setup_queues(adapter);
> > > > >         if (status)
> > > > > -               return status;
> > > > > +               goto out;
> > > > > 
> > > > >         be_schedule_worker(adapter);
> > > > > 
> > > > > @@ -4741,6 +4743,9 @@ int be_update_queues(struct be_adapter
> > > > > *adapter)
> > > > >         if (netif_running(netdev))
> > > > >                 status = be_open(netdev);
> > > > > 
> > > > > +out:
> > > > > +       dev_watchdog_up();
> > > > > 
> > > > > Thanks,
> > > > > Firo
> > > > > 
> > > > > > what you need is proper locking mechanism and/or scheduling
> > > > > > the
> > > > > > tx_timeout handling out of atomic context if a mutex will
> > > > > > be
> > > > > > required.
> > > > > > 
> > > > > > netif_device_detach is a too heavy hammer for such
> > > > > > synchronizations
> > > > > > tasks.
> > > > > > 
> > > > > > >  	if (netif_running(netdev))
> > > > > > >  		be_close(netdev);
> > > > > > >    
> > > > > > > @@ -4719,21 +4721,21 @@ int be_update_queues(struct
> > > > > > > be_adapter
> > > > > > > *adapter)
> > > > > > >  	be_clear_queues(adapter);
> > > > > > >  	status = be_cmd_if_destroy(adapter, adapter-
> > > > > > > > if_handle,  0);
> > > > > > >  	if (status)
> > > > > > > -		return status;
> > > > > > > +		goto out;
> > > > > > >    
> > > > > > >  	if (!msix_enabled(adapter)) {
> > > > > > >  		status = be_msix_enable(adapter);
> > > > > > >  		if (status)
> > > > > > > -			return status;
> > > > > > > +			goto out;
> > > > > > >  	}
> > > > > > >    
> > > > > > >  	status = be_if_create(adapter);
> > > > > > >  	if (status)
> > > > > > > -		return status;
> > > > > > > +		goto out;
> > > > > > >    
> > > > > > >  	status = be_setup_queues(adapter);
> > > > > > >  	if (status)
> > > > > > > -		return status;
> > > > > > > +		goto out;
> > > > > > >    
> > > > > > >  	be_schedule_worker(adapter);
> > > > > > >    
> > > > > > > @@ -4748,6 +4750,8 @@ int be_update_queues(struct
> > > > > > > be_adapter
> > > > > > > *adapter)
> > > > > > >  	if (netif_running(netdev))
> > > > > > >  		status = be_open(netdev);
> > > > > > >    
> > > > > > > +out:
> > > > > > > +	netif_device_attach(netdev);
> > > > > > >  	return status;
> > > > > > >  }
> > > > > > >