* [PATCH] igb: reinit_locked() should be called with rtnl_lock
From: Francesco Ruggeri @ 2020-06-29 21:18 UTC
To: linux-kernel, netdev, intel-wired-lan, kuba, davem,
jeffrey.t.kirsher, fruggeri
We observed a panic in igb_reset_task caused by this race condition
when doing a reboot -f:
kworker                     reboot -f

igb_reset_task
igb_reinit_locked
igb_down
napi_synchronize
                            __igb_shutdown
                            igb_clear_interrupt_scheme
                            igb_free_q_vectors
                            igb_free_q_vector
                            adapter->q_vector[v_idx] = NULL;
napi_disable
Panics trying to access
adapter->q_vector[v_idx].napi_state
This commit applies to igb the same changes that were applied to ixgbe
in commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with
rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in
reset subtask").
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8bb3db2cbd41..b79a78e102f3 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6224,9 +6224,11 @@ static void igb_reset_task(struct work_struct *work)
 	struct igb_adapter *adapter;
 	adapter = container_of(work, struct igb_adapter, reset_task);
 
+	rtnl_lock();
 	igb_dump(adapter);
 	netdev_err(adapter->netdev, "Reset adapter\n");
 	igb_reinit_locked(adapter);
+	rtnl_unlock();
 }
 
 /**
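
For illustration only (this is not part of the patch): a minimal user-space sketch of why holding the lock across the whole reset closes the window. All names here (model_adapter, model_rtnl_trylock, and so on) are hypothetical stand-ins, not igb or kernel APIs, and the lock is modelled as a non-blocking flag so the interleaving can be replayed deterministically in one thread. It assumes, as the second trace later in this thread shows, that the shutdown path takes rtnl_lock before freeing the vectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the real igb structures. */
struct model_q_vector {
	int napi_state;
};

struct model_adapter {
	bool rtnl_held;                  /* models ownership of rtnl_lock */
	struct model_q_vector storage;   /* backing object for q_vector */
	struct model_q_vector *q_vector; /* NULL once the shutdown path frees it */
};

static void model_init(struct model_adapter *a)
{
	a->rtnl_held = false;
	a->storage.napi_state = 0;
	a->q_vector = &a->storage;
}

/* Non-blocking model of rtnl_lock(): false means "the caller would sleep". */
static bool model_rtnl_trylock(struct model_adapter *a)
{
	if (a->rtnl_held)
		return false;
	a->rtnl_held = true;
	return true;
}

static void model_rtnl_unlock(struct model_adapter *a)
{
	assert(a->rtnl_held);
	a->rtnl_held = false;
}

/* Shutdown side: drops the vector only while holding the lock. */
static bool model_shutdown(struct model_adapter *a)
{
	if (!model_rtnl_trylock(a))
		return false;
	a->q_vector = NULL;
	model_rtnl_unlock(a);
	return true;
}

/* First half of the reset task: everything up to napi_synchronize. */
static void model_reset_begin(struct model_adapter *a, bool take_lock)
{
	if (take_lock) {
		bool locked = model_rtnl_trylock(a);

		assert(locked);
		(void)locked;
	}
}

/*
 * Second half: the napi_disable step, which dereferences q_vector.
 * Returns false where the unlocked ordering would have hit a NULL pointer.
 */
static bool model_reset_finish(struct model_adapter *a, bool take_lock)
{
	bool ok = (a->q_vector != NULL);

	if (ok)
		a->q_vector->napi_state = 1;
	if (take_lock)
		model_rtnl_unlock(a);
	return ok;
}
```

Calling model_reset_begin(), then model_shutdown(), then model_reset_finish() replays the trace from the commit message. With take_lock set, the shutdown step is refused until the reset has finished; without it, the shutdown step runs in the middle and the finish step finds q_vector already NULL.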
* Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
From: Jakub Kicinski @ 2020-06-30 0:16 UTC
To: Francesco Ruggeri
Cc: linux-kernel, netdev, intel-wired-lan, davem, jeffrey.t.kirsher
On Mon, 29 Jun 2020 14:18:01 -0700 Francesco Ruggeri wrote:
> We observed a panic in igb_reset_task caused by this race condition
> when doing a reboot -f:
>
> kworker                     reboot -f
>
> igb_reset_task
> igb_reinit_locked
> igb_down
> napi_synchronize
>                             __igb_shutdown
>                             igb_clear_interrupt_scheme
>                             igb_free_q_vectors
>                             igb_free_q_vector
>                             adapter->q_vector[v_idx] = NULL;
> napi_disable
> Panics trying to access
> adapter->q_vector[v_idx].napi_state
>
> This commit applies to igb the same changes that were applied to ixgbe
> in commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with
> rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in
> reset subtask").
>
> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Thanks for the patch.
Would you mind adding a fixes tag here? Probably:
Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
And as a matter of fact it looks like e1000e and e1000 have the same
bug :/ Would you mind checking all Intel drivers and producing patches for
all the affected ones?
* Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
From: Francesco Ruggeri @ 2020-06-30 4:50 UTC
To: Jakub Kicinski
Cc: open list, netdev, intel-wired-lan, David Miller, Jeff Kirsher
> Would you mind adding a fixes tag here? Probably:
>
> Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
That seems to be the commit that introduced the driver in 2.6.25.
I am not familiar enough with the history of the driver to tell whether this
was a day-one problem or became an issue later.
>
> And as a matter of fact it looks like e1000e and e1000 have the same
> bug :/ Would you mind checking all Intel drivers and producing patches for
> all the affected ones?
Do you mean identify all Intel drivers that may have the same issue?
Francesco
* RE: [PATCH] igb: reinit_locked() should be called with rtnl_lock
From: Kirsher, Jeffrey T @ 2020-07-01 1:37 UTC
To: Francesco Ruggeri, Jakub Kicinski, David Miller
Cc: open list, netdev, intel-wired-lan
> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Monday, June 29, 2020 21:51
> To: Jakub Kicinski <kuba@kernel.org>
> Cc: open list <linux-kernel@vger.kernel.org>; netdev
> <netdev@vger.kernel.org>; intel-wired-lan@lists.osuosl.org; David Miller
> <davem@davemloft.net>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
>
> > Would you mind adding a fixes tag here? Probably:
> >
> > Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
>
> That seems to be the commit that introduced the driver in 2.6.25.
> I am not familiar enough with the history of the driver to tell whether this
> was a day-one problem or became an issue later.
>
> >
> > And as a matter of fact it looks like e1000e and e1000 have the same
> > bug :/ Would you mind checking all Intel drivers and producing patches for
> > all the affected ones?
>
> Do you mean identify all Intel drivers that may have the same issue?
>
Do not worry about the other Intel drivers, I have our developers looking at each of our drivers for the locking issue.
@David Miller - I am picking up this patch
* Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
From: Francesco Ruggeri @ 2020-07-02 19:35 UTC
To: Kirsher, Jeffrey T
Cc: Jakub Kicinski, David Miller, open list, netdev, intel-wired-lan
> Do not worry about the other Intel drivers, I have our developers looking at each of our drivers for the locking issue.
>
> @David Miller - I am picking up this patch
There seems to be a second race, independent from the
original one, that results in a divide error:
kworker                     reboot -f                       tx packet

igb_reset_task
                            __igb_shutdown
                            rtnl_lock()
                            ...
                            igb_clear_interrupt_scheme
                            igb_free_q_vectors
                            adapter->num_tx_queues = 0
                            ...
                            rtnl_unlock()
rtnl_lock()
igb_reinit_locked
igb_down
igb_up
netif_tx_start_all_queues
                                                            dev_hard_start_xmit
                                                            igb_xmit_frame
                                                            igb_tx_queue_mapping
                                                            Panics on
                                                            r_idx % adapter->num_tx_queues
Adding logic to igb_reset_task similar to that in ixgbe_reset_subtask
(bailing out if __IGB_DOWN or __IGB_RESETTING is set) seems to avoid
the panic.
That logic was first introduced in ixgbe as part of commit
2f90b8657ec ('ixgbe: this patch adds support for DCB to the
kernel and ixgbe driver').
Both fixes seem to be needed.
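
To make the second race concrete, here is a minimal user-space sketch of the bail-out check described above. It is not the v2 patch: the sketch_* names and flag values are hypothetical, the state word is a plain bitmask rather than the driver's atomic bit operations, and it assumes the shutdown path leaves the "down" bit set after freeing the queues. The mapping helper returns an error code where the trace above shows a divide error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for __IGB_DOWN / __IGB_RESETTING. */
enum sketch_state_bit {
	SKETCH_DOWN      = 1,
	SKETCH_RESETTING = 2
};

struct sketch_adapter {
	unsigned int state;         /* bitmask of sketch_state_bit values */
	unsigned int num_tx_queues; /* 0 once the shutdown path frees the vectors */
	bool tx_started;            /* true after the queues are (re)started */
};

/* Shutdown side (assumption: it leaves the DOWN bit set). */
static void sketch_shutdown(struct sketch_adapter *a)
{
	a->state |= SKETCH_DOWN;
	a->num_tx_queues = 0;
	a->tx_started = false;
}

/*
 * Reset task. In the driver this would run under rtnl_lock; here only the
 * state check is modelled. Returns true if the reset ran, false if it bailed.
 */
static bool sketch_reset_task(struct sketch_adapter *a, bool check_state)
{
	if (check_state && (a->state & (SKETCH_DOWN | SKETCH_RESETTING)))
		return false; /* already down or resetting: nothing to do */

	a->state |= SKETCH_RESETTING;
	/* "down" then "up": the device comes back and TX queues restart */
	a->state &= ~(unsigned int)SKETCH_DOWN;
	a->tx_started = true;
	a->state &= ~(unsigned int)SKETCH_RESETTING;
	return true;
}

/*
 * Models r_idx % num_tx_queues on the transmit path.
 * Returns -1 if TX is stopped and -2 where the modulo would divide by zero.
 */
static int sketch_tx_queue_mapping(const struct sketch_adapter *a,
				   unsigned int r_idx)
{
	if (!a->tx_started)
		return -1;
	if (a->num_tx_queues == 0)
		return -2;
	return (int)(r_idx % a->num_tx_queues);
}
```

With check_state set, a reset that runs after sketch_shutdown() returns early and the transmit path stays stopped. With it cleared, the reset restarts the queues while num_tx_queues is still 0, which is the condition behind the divide error.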
* RE: [PATCH] igb: reinit_locked() should be called with rtnl_lock
From: Kirsher, Jeffrey T @ 2020-07-02 20:05 UTC
To: Francesco Ruggeri, Nguyen, Anthony L
Cc: Jakub Kicinski, David Miller, open list, netdev, intel-wired-lan
> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Thursday, July 2, 2020 12:35
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Jakub Kicinski <kuba@kernel.org>; David Miller <davem@davemloft.net>;
> open list <linux-kernel@vger.kernel.org>; netdev <netdev@vger.kernel.org>;
> intel-wired-lan@lists.osuosl.org
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
>
> > Do not worry about the other Intel drivers, I have our developers looking at
> > each of our drivers for the locking issue.
> >
> > @David Miller - I am picking up this patch
>
> There seems to be a second race, independent from the original one, that
> results in a divide error:
>
> kworker                     reboot -f                       tx packet
>
> igb_reset_task
>                             __igb_shutdown
>                             rtnl_lock()
>                             ...
>                             igb_clear_interrupt_scheme
>                             igb_free_q_vectors
>                             adapter->num_tx_queues = 0
>                             ...
>                             rtnl_unlock()
> rtnl_lock()
> igb_reinit_locked
> igb_down
> igb_up
> netif_tx_start_all_queues
>                                                             dev_hard_start_xmit
>                                                             igb_xmit_frame
>                                                             igb_tx_queue_mapping
>                                                             Panics on
>                                                             r_idx % adapter->num_tx_queues
>
> Adding logic to igb_reset_task similar to that in ixgbe_reset_subtask (bailing
> out if __IGB_DOWN or __IGB_RESETTING is set) seems to avoid the panic.
> That logic was first introduced in ixgbe as part of commit 2f90b8657ec ('ixgbe:
> this patch adds support for DCB to the kernel and ixgbe driver').
> Both fixes seem to be needed.
So will you be sending a v2 of your patch to include the second fix?
* Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
From: Francesco Ruggeri @ 2020-07-02 20:20 UTC
To: Kirsher, Jeffrey T
Cc: Nguyen, Anthony L, Jakub Kicinski, David Miller, open list,
netdev, intel-wired-lan
>
> So will you be sending a v2 of your patch to include the second fix?
Yes, I am working on it. Just to confirm, v2 should include both fixes, right?
Thanks,
Francesco
* RE: [PATCH] igb: reinit_locked() should be called with rtnl_lock
From: Kirsher, Jeffrey T @ 2020-07-02 21:26 UTC
To: Francesco Ruggeri
Cc: Nguyen, Anthony L, Jakub Kicinski, David Miller, open list,
netdev, intel-wired-lan
> -----Original Message-----
> From: Francesco Ruggeri <fruggeri@arista.com>
> Sent: Thursday, July 2, 2020 13:20
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; David Miller <davem@davemloft.net>; open list
> <linux-kernel@vger.kernel.org>; netdev <netdev@vger.kernel.org>;
> intel-wired-lan@lists.osuosl.org
> Subject: Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock
>
> >
> > So will you be sending a v2 of your patch to include the second fix?
>
> Yes, I am working on it. Just to confirm, v2 should include both fixes, right?
Correct.