netdev.vger.kernel.org archive mirror
* [PATCH 1/2] net/tg3: Fix kernel crash
@ 2013-07-24  9:25 Gavin Shan
  2013-07-24  9:25 ` [PATCH 2/2] net/tg3: Fix warning from pci_disable_device() Gavin Shan
  2013-07-24 12:27 ` [PATCH 1/2] net/tg3: Fix kernel crash Nithin Nayak Sujir
  0 siblings, 2 replies; 6+ messages in thread
From: Gavin Shan @ 2013-07-24  9:25 UTC
  To: netdev; +Cc: nsujir, mchan, davem, Gavin Shan

When an EEH error happens, we might not have a network device instance
(struct net_device) yet, so we can't safely access the instance to
check its state; doing so causes a kernel crash. The patch adds a NULL
check to fix it.

EEH: Frozen PE#2 on PHB#3 detected
EEH: This PCI device has failed 1 times in the last hour
EEH: Notify device drivers to shutdown
(NULL net_device): PCI I/O error detected
Unable to handle kernel paging request for data at address 0x00000048
Faulting instruction address: 0xd00000001c9387a8
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
:
NIP [d00000001c9387a8] .tg3_io_error_detected+0x78/0x2a0 [tg3]
LR [d00000001c9387a4] .tg3_io_error_detected+0x74/0x2a0 [tg3]
Call Trace:
[c000003f93a0f960] [d00000001c9387a4] .tg3_io_error_detected+0x74/0x2a0 [tg3]
[c000003f93a0fa30] [c00000000003844c] .eeh_report_error+0xac/0x120
[c000003f93a0fac0] [c0000000000371bc] .eeh_pe_dev_traverse+0x8c/0x150
[c000003f93a0fb60] [c000000000038858] .eeh_handle_normal_event+0x128/0x3d0
[c000003f93a0fbf0] [c000000000038db8] .eeh_handle_event+0x2b8/0x2c0
[c000003f93a0fc90] [c000000000038e80] .eeh_event_handler+0xc0/0x170
[c000003f93a0fd30] [c0000000000cc000] .kthread+0xf0/0x100
[c000003f93a0fe30] [c00000000000a0dc] .ret_from_kernel_thread+0x5c/0x80

Reported-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/tg3.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index d964f30..aee1b9a 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -17773,7 +17773,8 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
 
 	rtnl_lock();
 
-	if (!netif_running(netdev))
+	/* We probably don't have netdev yet */
+	if (!netdev || !netif_running(netdev))
 		goto done;
 
 	tg3_phy_stop(tp);
-- 
1.7.5.4
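
The guard in the hunk above relies on C's short-circuit evaluation: when
netdev is NULL, the right-hand side of || is never evaluated, so nothing
is dereferenced. The faulting address 0x00000048 in the oops is consistent
with a field read at a small offset from a NULL pointer. A minimal
user-space sketch of the same pattern follows; struct mock_netdev and the
mock_* helpers are hypothetical stand-ins for illustration, not kernel
APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; only the field we need. */
struct mock_netdev {
	bool running;
};

/*
 * Models netif_running(): it reads a field of the device, so the
 * caller must never pass NULL.
 */
static bool mock_netif_running(const struct mock_netdev *dev)
{
	return dev->running;
}

/*
 * Returns true when the error handler should jump straight to "done".
 * The NULL test comes first, so mock_netif_running() is only reached
 * with a valid pointer.
 */
static bool mock_should_skip_teardown(const struct mock_netdev *netdev)
{
	return !netdev || !mock_netif_running(netdev);
}
```

With this ordering, a NULL device and a device that is not running take
the same early-exit path; only a running device proceeds to teardown.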


* [PATCH 2/2] net/tg3: Fix warning from pci_disable_device()
  2013-07-24  9:25 [PATCH 1/2] net/tg3: Fix kernel crash Gavin Shan
@ 2013-07-24  9:25 ` Gavin Shan
  2013-07-24 12:27   ` Nithin Nayak Sujir
  2013-07-24 12:27 ` [PATCH 1/2] net/tg3: Fix kernel crash Nithin Nayak Sujir
  1 sibling, 1 reply; 6+ messages in thread
From: Gavin Shan @ 2013-07-24  9:25 UTC
  To: netdev; +Cc: nsujir, mchan, davem, Gavin Shan

The patch fixes the following warning. The PCI device might already have
been disabled elsewhere when EEH errors occur at an early stage.

Device tg3 disabling already-disabled device
WARNING: at drivers/pci/pci.c:1403
:
NIP [c00000000044fd5c] .pci_disable_device+0xcc/0xe0
LR [c00000000044fd58] .pci_disable_device+0xc8/0xe0
Call Trace:
[c000003f80bc7370] [c00000000044fd58] .pci_disable_device+0xc8/0xe0
[c000003f80bc73f0] [d00000001cfe8fc0] .tg3_init_one+0x2f0/0x19f0 [tg3]
[c000003f80bc74d0] [c0000000004534e8] .local_pci_probe+0x68/0xb0
[c000003f80bc7560] [c0000000004537c8] .pci_device_probe+0x198/0x1a0
[c000003f80bc7610] [c0000000004f9e98] .driver_probe_device+0xd8/0x450
[c000003f80bc76a0] [c0000000004fa3bc] .__driver_attach+0x10c/0x110
[c000003f80bc7730] [c0000000004f6e94] .bus_for_each_dev+0x94/0x100
[c000003f80bc77d0] [c0000000004f9634] .driver_attach+0x34/0x50
[c000003f80bc7850] [c0000000004f8f98] .bus_add_driver+0x288/0x380
[c000003f80bc78f0] [c0000000004fae2c] .driver_register+0x9c/0x200
[c000003f80bc7980] [c000000000453214] .__pci_register_driver+0x64/0x90
[c000003f80bc7a10] [d00000001cff7a60] .tg3_driver_init+0x2c/0x40 [tg3]
[c000003f80bc7a80] [c00000000000b424] .do_one_initcall+0x144/0x1f0
[c000003f80bc7b70] [c0000000001244a0] .load_module+0x1f30/0x2700
[c000003f80bc7d40] [c000000000124e80] .SyS_finit_module+0xc0/0x110
[c000003f80bc7e30] [c000000000009dd4] syscall_exit+0x0/0x98

Reported-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/tg3.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index aee1b9a..ddebc7a 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -17625,7 +17625,8 @@ err_out_free_res:
 	pci_release_regions(pdev);
 
 err_out_disable_pdev:
-	pci_disable_device(pdev);
+	if (pci_is_enabled(pdev))
+		pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
 	return err;
 }
-- 
1.7.5.4
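
The warning comes from disabling a device whose enable count is already
zero, and the hunk above avoids it by checking pci_is_enabled() first.
The following is a simplified user-space sketch of that idea, assuming an
enable counter; struct mock_pci_dev and the mock_* functions are
hypothetical and only approximate the kernel's behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev. */
struct mock_pci_dev {
	int enable_cnt;	/* outstanding enable calls */
	int warnings;	/* "already-disabled" warnings raised so far */
};

/* Models pci_is_enabled(): true while at least one enable is outstanding. */
static bool mock_pci_is_enabled(const struct mock_pci_dev *dev)
{
	return dev->enable_cnt > 0;
}

/*
 * Models pci_disable_device() in simplified form: disabling a device
 * that is not enabled records a warning instead of dropping the count.
 */
static void mock_pci_disable_device(struct mock_pci_dev *dev)
{
	if (dev->enable_cnt <= 0) {
		dev->warnings++;
		return;
	}
	dev->enable_cnt--;
}

/* Error path after the fix: only disable a device that is still enabled. */
static void mock_probe_error_path(struct mock_pci_dev *dev)
{
	if (mock_pci_is_enabled(dev))
		mock_pci_disable_device(dev);
}
```

In this model the guarded error path never raises the warning, whether
or not something else has already disabled the device.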


* Re: [PATCH 2/2] net/tg3: Fix warning from pci_disable_device()
  2013-07-24  9:25 ` [PATCH 2/2] net/tg3: Fix warning from pci_disable_device() Gavin Shan
@ 2013-07-24 12:27   ` Nithin Nayak Sujir
  2013-07-26 21:28     ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Nithin Nayak Sujir @ 2013-07-24 12:27 UTC
  To: Gavin Shan; +Cc: netdev, mchan, davem



On 7/24/2013 2:25 AM, Gavin Shan wrote:
> The patch fixes the following warning. The PCI device might already have
> been disabled elsewhere when EEH errors occur at an early stage.
>
> Device tg3 disabling already-disabled device
> WARNING: at drivers/pci/pci.c:1403
> :
> NIP [c00000000044fd5c] .pci_disable_device+0xcc/0xe0
> LR [c00000000044fd58] .pci_disable_device+0xc8/0xe0
> Call Trace:
> [c000003f80bc7370] [c00000000044fd58] .pci_disable_device+0xc8/0xe0
> [c000003f80bc73f0] [d00000001cfe8fc0] .tg3_init_one+0x2f0/0x19f0 [tg3]
> [c000003f80bc74d0] [c0000000004534e8] .local_pci_probe+0x68/0xb0
> [c000003f80bc7560] [c0000000004537c8] .pci_device_probe+0x198/0x1a0
> [c000003f80bc7610] [c0000000004f9e98] .driver_probe_device+0xd8/0x450
> [c000003f80bc76a0] [c0000000004fa3bc] .__driver_attach+0x10c/0x110
> [c000003f80bc7730] [c0000000004f6e94] .bus_for_each_dev+0x94/0x100
> [c000003f80bc77d0] [c0000000004f9634] .driver_attach+0x34/0x50
> [c000003f80bc7850] [c0000000004f8f98] .bus_add_driver+0x288/0x380
> [c000003f80bc78f0] [c0000000004fae2c] .driver_register+0x9c/0x200
> [c000003f80bc7980] [c000000000453214] .__pci_register_driver+0x64/0x90
> [c000003f80bc7a10] [d00000001cff7a60] .tg3_driver_init+0x2c/0x40 [tg3]
> [c000003f80bc7a80] [c00000000000b424] .do_one_initcall+0x144/0x1f0
> [c000003f80bc7b70] [c0000000001244a0] .load_module+0x1f30/0x2700
> [c000003f80bc7d40] [c000000000124e80] .SyS_finit_module+0xc0/0x110
> [c000003f80bc7e30] [c000000000009dd4] syscall_exit+0x0/0x98
>
> Reported-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>   drivers/net/ethernet/broadcom/tg3.c |    3 ++-
>   1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index aee1b9a..ddebc7a 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -17625,7 +17625,8 @@ err_out_free_res:
>   	pci_release_regions(pdev);
>
>   err_out_disable_pdev:
> -	pci_disable_device(pdev);
> +	if (pci_is_enabled(pdev))
> +		pci_disable_device(pdev);
>   	pci_set_drvdata(pdev, NULL);
>   	return err;
>   }
>

Acked-by: Nithin Nayak Sujir <nsujir@broadcom.com>


* Re: [PATCH 1/2] net/tg3: Fix kernel crash
  2013-07-24  9:25 [PATCH 1/2] net/tg3: Fix kernel crash Gavin Shan
  2013-07-24  9:25 ` [PATCH 2/2] net/tg3: Fix warning from pci_disable_device() Gavin Shan
@ 2013-07-24 12:27 ` Nithin Nayak Sujir
  2013-07-26 21:28   ` David Miller
  1 sibling, 1 reply; 6+ messages in thread
From: Nithin Nayak Sujir @ 2013-07-24 12:27 UTC
  To: Gavin Shan; +Cc: netdev, mchan, davem



On 7/24/2013 2:25 AM, Gavin Shan wrote:
> When an EEH error happens, we might not have a network device instance
> (struct net_device) yet, so we can't safely access the instance to
> check its state; doing so causes a kernel crash. The patch adds a NULL
> check to fix it.
>
> EEH: Frozen PE#2 on PHB#3 detected
> EEH: This PCI device has failed 1 times in the last hour
> EEH: Notify device drivers to shutdown
> (NULL net_device): PCI I/O error detected
> Unable to handle kernel paging request for data at address 0x00000048
> Faulting instruction address: 0xd00000001c9387a8
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA PowerNV
> :
> NIP [d00000001c9387a8] .tg3_io_error_detected+0x78/0x2a0 [tg3]
> LR [d00000001c9387a4] .tg3_io_error_detected+0x74/0x2a0 [tg3]
> Call Trace:
> [c000003f93a0f960] [d00000001c9387a4] .tg3_io_error_detected+0x74/0x2a0 [tg3]
> [c000003f93a0fa30] [c00000000003844c] .eeh_report_error+0xac/0x120
> [c000003f93a0fac0] [c0000000000371bc] .eeh_pe_dev_traverse+0x8c/0x150
> [c000003f93a0fb60] [c000000000038858] .eeh_handle_normal_event+0x128/0x3d0
> [c000003f93a0fbf0] [c000000000038db8] .eeh_handle_event+0x2b8/0x2c0
> [c000003f93a0fc90] [c000000000038e80] .eeh_event_handler+0xc0/0x170
> [c000003f93a0fd30] [c0000000000cc000] .kthread+0xf0/0x100
> [c000003f93a0fe30] [c00000000000a0dc] .ret_from_kernel_thread+0x5c/0x80
>
> Reported-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>   drivers/net/ethernet/broadcom/tg3.c |    3 ++-
>   1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index d964f30..aee1b9a 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -17773,7 +17773,8 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
>
>   	rtnl_lock();
>
> -	if (!netif_running(netdev))
> +	/* We probably don't have netdev yet */
> +	if (!netdev || !netif_running(netdev))
>   		goto done;
>
>   	tg3_phy_stop(tp);
>

Acked-by: Nithin Nayak Sujir <nsujir@broadcom.com>


* Re: [PATCH 1/2] net/tg3: Fix kernel crash
  2013-07-24 12:27 ` [PATCH 1/2] net/tg3: Fix kernel crash Nithin Nayak Sujir
@ 2013-07-26 21:28   ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2013-07-26 21:28 UTC
  To: nsujir; +Cc: shangw, netdev, mchan

From: "Nithin Nayak Sujir" <nsujir@broadcom.com>
Date: Wed, 24 Jul 2013 05:27:44 -0700

> On 7/24/2013 2:25 AM, Gavin Shan wrote:
>> When an EEH error happens, we might not have a network device instance
>> (struct net_device) yet, so we can't safely access the instance to
>> check its state; doing so causes a kernel crash. The patch adds a NULL
>> check to fix it.
 ...
>> Reported-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
...
> Acked-by: Nithin Nayak Sujir <nsujir@broadcom.com>

Applied.


* Re: [PATCH 2/2] net/tg3: Fix warning from pci_disable_device()
  2013-07-24 12:27   ` Nithin Nayak Sujir
@ 2013-07-26 21:28     ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2013-07-26 21:28 UTC
  To: nsujir; +Cc: shangw, netdev, mchan

From: "Nithin Nayak Sujir" <nsujir@broadcom.com>
Date: Wed, 24 Jul 2013 05:27:28 -0700

> On 7/24/2013 2:25 AM, Gavin Shan wrote:
>> The patch fixes the following warning. The PCI device might already have
>> been disabled elsewhere when EEH errors occur at an early stage.
 ...
>> Reported-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
 ...
> Acked-by: Nithin Nayak Sujir <nsujir@broadcom.com>

Applied.
