netdev.vger.kernel.org archive mirror
* [PATCH] igb: reinit_locked() should be called with rtnl_lock
@ 2020-06-29 21:18 Francesco Ruggeri
  2020-06-30  0:16 ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: Francesco Ruggeri @ 2020-06-29 21:18 UTC
  To: linux-kernel, netdev, intel-wired-lan, kuba, davem,
	jeffrey.t.kirsher, fruggeri

We observed a panic in igb_reset_task caused by this race condition
when doing a reboot -f:

	kworker			reboot -f

	igb_reset_task
	igb_reinit_locked
	igb_down
	napi_synchronize
				__igb_shutdown
				igb_clear_interrupt_scheme
				igb_free_q_vectors
				igb_free_q_vector
				adapter->q_vector[v_idx] = NULL;
	napi_disable
	Panics trying to access
	adapter->q_vector[v_idx].napi_state

This commit applies to igb the same changes that were applied to ixgbe
in commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with
rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in
reset subtask").

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8bb3db2cbd41..b79a78e102f3 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6224,9 +6224,11 @@ static void igb_reset_task(struct work_struct *work)
 	struct igb_adapter *adapter;
 	adapter = container_of(work, struct igb_adapter, reset_task);
 
+	rtnl_lock();
 	igb_dump(adapter);
 	netdev_err(adapter->netdev, "Reset adapter\n");
 	igb_reinit_locked(adapter);
+	rtnl_unlock();
 }
 
 /**


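The effect of the patch above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: the model_* names
are hypothetical and are not igb code, a pthread mutex stands in for
rtnl_lock, and the shutdown path is assumed to tear down the q_vectors
under that same lock and to zero the q_vector count when it does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_Q_VECTORS 4

/* Hypothetical stand-ins for the driver structures; not the real igb types. */
struct model_q_vector {
	unsigned long napi_state;
};

struct model_adapter {
	int num_q_vectors;
	struct model_q_vector *q_vector[MODEL_MAX_Q_VECTORS];
};

/* Stand-in for rtnl_lock: one mutex shared by both paths. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Allocate up to n q_vectors; returns 0 on success, -1 on allocation failure. */
static int model_adapter_init(struct model_adapter *a, int n)
{
	assert(a != NULL);
	a->num_q_vectors = 0;
	for (int i = 0; i < MODEL_MAX_Q_VECTORS; i++)
		a->q_vector[i] = NULL;
	for (int i = 0; i < n && i < MODEL_MAX_Q_VECTORS; i++) {
		a->q_vector[i] = calloc(1, sizeof(*a->q_vector[i]));
		if (!a->q_vector[i])
			return -1;
		a->num_q_vectors++;
	}
	return 0;
}

/* Models the shutdown path: frees every q_vector while holding the lock. */
static void model_shutdown(struct model_adapter *a)
{
	pthread_mutex_lock(&model_rtnl);
	for (int i = 0; i < a->num_q_vectors; i++) {
		free(a->q_vector[i]);
		a->q_vector[i] = NULL;
	}
	a->num_q_vectors = 0;
	pthread_mutex_unlock(&model_rtnl);
}

/*
 * Models the reset worker: walks the q_vectors while holding the same lock,
 * so it sees either all of them or none, never a half-freed array.
 * Returns the number of q_vectors it touched.
 */
static int model_reset_task(struct model_adapter *a)
{
	int touched = 0;

	pthread_mutex_lock(&model_rtnl);
	for (int i = 0; i < a->num_q_vectors; i++) {
		a->q_vector[i]->napi_state |= 1UL;
		touched++;
	}
	pthread_mutex_unlock(&model_rtnl);
	return touched;
}
```

In this model, calling model_reset_task() after model_shutdown() touches
nothing, because the count it iterates over was cleared under the lock.
Without the shared lock, the worker could read a pointer that the
shutdown path is about to set to NULL, which is the panic in the trace.
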
* Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
  2020-06-29 21:18 [PATCH] igb: reinit_locked() should be called with rtnl_lock Francesco Ruggeri
@ 2020-06-30  0:16 ` Jakub Kicinski
  2020-06-30  4:50   ` Francesco Ruggeri
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2020-06-30  0:16 UTC
  To: Francesco Ruggeri
  Cc: linux-kernel, netdev, intel-wired-lan, davem, jeffrey.t.kirsher

On Mon, 29 Jun 2020 14:18:01 -0700 Francesco Ruggeri wrote:
> We observed a panic in igb_reset_task caused by this race condition
> when doing a reboot -f:
> 
> 	kworker			reboot -f
> 
> 	igb_reset_task
> 	igb_reinit_locked
> 	igb_down
> 	napi_synchronize
> 				__igb_shutdown
> 				igb_clear_interrupt_scheme
> 				igb_free_q_vectors
> 				igb_free_q_vector
> 				adapter->q_vector[v_idx] = NULL;
> 	napi_disable
> 	Panics trying to access
> 	adapter->q_vector[v_idx].napi_state
> 
> This commit applies to igb the same changes that were applied to ixgbe
> in commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with
> rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in
> reset subtask").
> 
> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>

Thanks for the patch.

Would you mind adding a Fixes tag here? Probably:

Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")

And as a matter of fact it looks like e1000e and e1000 have the same
bug :/ Would you mind checking all Intel driver producing matches for
all the affected ones?

* Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
  2020-06-30  0:16 ` Jakub Kicinski
@ 2020-06-30  4:50   ` Francesco Ruggeri
  2020-07-01  1:37     ` Kirsher, Jeffrey T
  0 siblings, 1 reply; 8+ messages in thread
From: Francesco Ruggeri @ 2020-06-30  4:50 UTC
  To: Jakub Kicinski
  Cc: open list, netdev, intel-wired-lan, David Miller, Jeff Kirsher

> Would you mind adding a fixes tag here? Probably:
>
> Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")

That seems to be the commit that introduced the driver in 2.6.25.
I am not familiar enough with the history of the driver to tell whether this
was a day-one problem or became an issue later.

>
> And as a matter of fact it looks like e1000e and e1000 have the same
> bug :/ Would you mind checking all Intel driver producing matches for
> all the affected ones?

Do you mean identify all Intel drivers that may have the same issue?

Francesco

* RE: [PATCH] igb: reinit_locked() should be called with rtnl_lock
  2020-06-30  4:50   ` Francesco Ruggeri
@ 2020-07-01  1:37     ` Kirsher, Jeffrey T
  2020-07-02 19:35       ` Francesco Ruggeri
  0 siblings, 1 reply; 8+ messages in thread
From: Kirsher, Jeffrey T @ 2020-07-01  1:37 UTC
  To: Francesco Ruggeri, Jakub Kicinski, David Miller
  Cc: open list, netdev, intel-wired-lan

> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Monday, June 29, 2020 21:51
> To: Jakub Kicinski <kuba@kernel.org>
> Cc: open list <linux-kernel@vger.kernel.org>; netdev
> <netdev@vger.kernel.org>; intel-wired-lan@lists.osuosl.org; David Miller
> <davem@davemloft.net>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
> 
> > Would you mind adding a fixes tag here? Probably:
> >
> > Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
> 
> That seems to be the commit that introduced the driver in 2.6.25.
> I am not familiar with the history of the driver to tell if this was a day 1
> problem or if it became an issue later.
> 
> >
> > And as a matter of fact it looks like e1000e and e1000 have the same
> > bug :/ Would you mind checking all Intel driver producing matches for
> > all the affected ones?
> 
> Do you mean identify all Intel drivers that may have the same issue?
> 

Do not worry about the other Intel drivers, I have our developers looking at each of our drivers for the locking issue.

@David Miller - I am picking up this patch

* Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
  2020-07-01  1:37     ` Kirsher, Jeffrey T
@ 2020-07-02 19:35       ` Francesco Ruggeri
  2020-07-02 20:05         ` Kirsher, Jeffrey T
  0 siblings, 1 reply; 8+ messages in thread
From: Francesco Ruggeri @ 2020-07-02 19:35 UTC
  To: Kirsher, Jeffrey T
  Cc: Jakub Kicinski, David Miller, open list, netdev, intel-wired-lan

> Do not worry about the other Intel drivers, I have our developers looking at each of our drivers for the locking issue.
>
> @David Miller - I am picking up this patch

There seems to be a second race, independent from the
original one, that results in a divide error:

kworker         reboot -f       tx packet

igb_reset_task
                __igb_shutdown
                rtnl_lock()
                ...
                igb_clear_interrupt_scheme
                igb_free_q_vectors
                adapter->num_tx_queues = 0
                ...
                rtnl_unlock()
rtnl_lock()
igb_reinit_locked
igb_down
igb_up
netif_tx_start_all_queues
                                dev_hard_start_xmit
                                igb_xmit_frame
                                igb_tx_queue_mapping
                                Panics on
                                r_idx % adapter->num_tx_queues

Adding logic to igb_reset_task similar to that in
ixgbe_reset_subtask (bailing out if __IGB_DOWN or __IGB_RESETTING
is set) seems to avoid the panic.
That logic was first introduced in ixgbe as part of commit
2f90b8657ec ('ixgbe: this patch adds support for DCB to the
kernel and ixgbe driver').
Both fixes seem to be needed.
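
The bail-out described above can be sketched as a userspace model. This
is a sketch under stated assumptions, not the v2 patch: the guard_* names
are hypothetical, a pthread mutex stands in for rtnl_lock, and the
shutdown path is assumed to leave the down bit set after it has freed
the queues.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define GUARD_DOWN      (1UL << 0)	/* stands in for __IGB_DOWN */
#define GUARD_RESETTING (1UL << 1)	/* stands in for __IGB_RESETTING */

/* Hypothetical stand-in for the adapter; not the real struct igb_adapter. */
struct guard_adapter {
	unsigned long state;
	unsigned int num_tx_queues;
	bool tx_started;	/* models netif_tx_start_all_queues() having run */
};

/* Stand-in for rtnl_lock. */
static pthread_mutex_t guard_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Models the shutdown path: stop tx, mark the adapter down, free the queues. */
static void guard_shutdown(struct guard_adapter *a)
{
	pthread_mutex_lock(&guard_rtnl);
	a->tx_started = false;
	a->state |= GUARD_DOWN;
	a->num_tx_queues = 0;
	pthread_mutex_unlock(&guard_rtnl);
}

/*
 * Models the reset worker with both fixes: take the lock, then bail if the
 * adapter is already down or resetting. Returns true if the reinit ran.
 */
static bool guard_reset_task(struct guard_adapter *a)
{
	bool ran = false;

	pthread_mutex_lock(&guard_rtnl);
	if (!(a->state & (GUARD_DOWN | GUARD_RESETTING))) {
		a->state |= GUARD_RESETTING;
		a->tx_started = false;	/* models igb_down() */
		a->tx_started = true;	/* models igb_up() restarting the tx queues */
		a->state &= ~GUARD_RESETTING;
		ran = true;
	}
	pthread_mutex_unlock(&guard_rtnl);
	return ran;
}
```

With the check in place, a reset that runs after shutdown never restarts
the tx queues, so in this model tx_started is never true while
num_tx_queues is 0. The lock alone does not give that property: it only
orders the two paths, and the worker would still bring the device back
up with zero queues.
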

* RE: [PATCH] igb: reinit_locked() should be called with rtnl_lock
  2020-07-02 19:35       ` Francesco Ruggeri
@ 2020-07-02 20:05         ` Kirsher, Jeffrey T
  2020-07-02 20:20           ` Francesco Ruggeri
  0 siblings, 1 reply; 8+ messages in thread
From: Kirsher, Jeffrey T @ 2020-07-02 20:05 UTC
  To: Francesco Ruggeri, Nguyen, Anthony L
  Cc: Jakub Kicinski, David Miller, open list, netdev, intel-wired-lan

> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Thursday, July 2, 2020 12:35
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Jakub Kicinski <kuba@kernel.org>; David Miller <davem@davemloft.net>;
> open list <linux-kernel@vger.kernel.org>; netdev <netdev@vger.kernel.org>;
> intel-wired-lan@lists.osuosl.org
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
> 
> > Do not worry about the other Intel drivers, I have our developers looking at
> each of our drivers for the locking issue.
> >
> > @David Miller - I am picking up this patch
> 
> There seems to be a second race, independent from the original one, that
> results in a divide error:
> 
> kworker         reboot -f       tx packet
> 
> igb_reset_task
>                 __igb_shutdown
>                 rtnl_lock()
>                 ...
>                 igb_clear_interrupt_scheme
>                 igb_free_q_vectors
>                 adapter->num_tx_queues = 0
>                 ...
>                 rtnl_unlock()
> rtnl_lock()
> igb_reinit_locked
> igb_down
> igb_up
> netif_tx_start_all_queues
>                                 dev_hard_start_xmit
>                                 igb_xmit_frame
>                                 igb_tx_queue_mapping
>                                 Panics on
>                                 r_idx % adapter->num_tx_queues
> 
> Using in igb_reset_task a logic similar to the one in ixgbe_reset_subtask (bailing
> if __IGB_DOWN or __IGB_RESETTING is set) seems to avoid the panic.
> That logic was first introduced in ixgbe as part of commit 2f90b8657ec ('ixgbe:
> this patch adds support for DCB to the kernel and ixgbe driver').
> Both fixes seem to be needed.

So will you be sending a v2 of your patch to include the second fix?

* Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
  2020-07-02 20:05         ` Kirsher, Jeffrey T
@ 2020-07-02 20:20           ` Francesco Ruggeri
  2020-07-02 21:26             ` Kirsher, Jeffrey T
  0 siblings, 1 reply; 8+ messages in thread
From: Francesco Ruggeri @ 2020-07-02 20:20 UTC
  To: Kirsher, Jeffrey T
  Cc: Nguyen, Anthony L, Jakub Kicinski, David Miller, open list,
	netdev, intel-wired-lan

>
> So will you be sending a v2 of your patch to include the second fix?

Yes, I am working on it. Just to confirm, v2 should include both fixes, right?

Thanks,
Francesco

* RE: [PATCH] igb: reinit_locked() should be called with rtnl_lock
  2020-07-02 20:20           ` Francesco Ruggeri
@ 2020-07-02 21:26             ` Kirsher, Jeffrey T
  0 siblings, 0 replies; 8+ messages in thread
From: Kirsher, Jeffrey T @ 2020-07-02 21:26 UTC
  To: Francesco Ruggeri
  Cc: Nguyen, Anthony L, Jakub Kicinski, David Miller, open list,
	netdev, intel-wired-lan

> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Thursday, July 2, 2020 13:20
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; David Miller <davem@davemloft.net>; open list <linux-
> kernel@vger.kernel.org>; netdev <netdev@vger.kernel.org>; intel-wired-
> lan@lists.osuosl.org
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
> 
> >
> > So will you be sending a v2 of your patch to include the second fix?
> 
> Yes, I am working on it. Just to confirm, v2 should include both fixes, right?

Correct.

