All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
@ 2019-04-17  8:13 ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2019-04-17  8:13 UTC (permalink / raw)
  To: netdev, intel-wired-lan, linux-kernel, Jeff Kirsher
  Cc: Sasha Levin, Joseph Yasi, Aaron Brown, Alexander Duyck, e1000-devel

This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.

That change cased false-positive warning about hardware hang:

e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
   TDH                  <0>
   TDT                  <1>
   next_to_use          <1>
   next_to_clean        <0>
buffer_info[next_to_clean]:
   time_stamp           <fffba7a7>
   next_to_watch        <0>
   jiffies              <fffbb140>
   next_to_watch.status <0>
MAC Status             <40080080>
PHY Status             <7949>
PHY 1000BASE-T Status  <0>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Besides warning everything works fine.
Original issue will be fixed property in following patch.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
---
 drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7acc61e4f645..ba96e52aa8d1 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5309,13 +5309,8 @@ static void e1000_watchdog_task(struct work_struct *work)
 			/* 8000ES2LAN requires a Rx packet buffer work-around
 			 * on link down event; reset the controller to flush
 			 * the Rx packet buffer.
-			 *
-			 * If the link is lost the controller stops DMA, but
-			 * if there is queued Tx work it cannot be done.  So
-			 * reset the controller to flush the Tx packet buffers.
 			 */
-			if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
-			    e1000_desc_unused(tx_ring) + 1 < tx_ring->count)
+			if (adapter->flags & FLAG_RX_NEEDS_RESTART)
 				adapter->flags |= FLAG_RESTART_NOW;
 			else
 				pm_schedule_suspend(netdev->dev.parent,
@@ -5338,6 +5333,14 @@ static void e1000_watchdog_task(struct work_struct *work)
 	adapter->gotc_old = adapter->stats.gotc;
 	spin_unlock(&adapter->stats64_lock);
 
+	/* If the link is lost the controller stops DMA, but
+	 * if there is queued Tx work it cannot be done.  So
+	 * reset the controller to flush the Tx packet buffers.
+	 */
+	if (!netif_carrier_ok(netdev) &&
+	    (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
+		adapter->flags |= FLAG_RESTART_NOW;
+
 	/* If reset is necessary, do it outside of interrupt context. */
 	if (adapter->flags & FLAG_RESTART_NOW) {
 		schedule_work(&adapter->reset_task);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
@ 2019-04-17  8:13 ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2019-04-17  8:13 UTC (permalink / raw)
  To: intel-wired-lan

This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.

That change cased false-positive warning about hardware hang:

e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
   TDH                  <0>
   TDT                  <1>
   next_to_use          <1>
   next_to_clean        <0>
buffer_info[next_to_clean]:
   time_stamp           <fffba7a7>
   next_to_watch        <0>
   jiffies              <fffbb140>
   next_to_watch.status <0>
MAC Status             <40080080>
PHY Status             <7949>
PHY 1000BASE-T Status  <0>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Besides warning everything works fine.
Original issue will be fixed property in following patch.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
---
 drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7acc61e4f645..ba96e52aa8d1 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5309,13 +5309,8 @@ static void e1000_watchdog_task(struct work_struct *work)
 			/* 8000ES2LAN requires a Rx packet buffer work-around
 			 * on link down event; reset the controller to flush
 			 * the Rx packet buffer.
-			 *
-			 * If the link is lost the controller stops DMA, but
-			 * if there is queued Tx work it cannot be done.  So
-			 * reset the controller to flush the Tx packet buffers.
 			 */
-			if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
-			    e1000_desc_unused(tx_ring) + 1 < tx_ring->count)
+			if (adapter->flags & FLAG_RX_NEEDS_RESTART)
 				adapter->flags |= FLAG_RESTART_NOW;
 			else
 				pm_schedule_suspend(netdev->dev.parent,
@@ -5338,6 +5333,14 @@ static void e1000_watchdog_task(struct work_struct *work)
 	adapter->gotc_old = adapter->stats.gotc;
 	spin_unlock(&adapter->stats64_lock);
 
+	/* If the link is lost the controller stops DMA, but
+	 * if there is queued Tx work it cannot be done.  So
+	 * reset the controller to flush the Tx packet buffers.
+	 */
+	if (!netif_carrier_ok(netdev) &&
+	    (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
+		adapter->flags |= FLAG_RESTART_NOW;
+
 	/* If reset is necessary, do it outside of interrupt context. */
 	if (adapter->flags & FLAG_RESTART_NOW) {
 		schedule_work(&adapter->reset_task);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] e1000e: start network tx queue only when link is up
  2019-04-17  8:13 ` [Intel-wired-lan] " Konstantin Khlebnikov
@ 2019-04-17  8:13   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2019-04-17  8:13 UTC (permalink / raw)
  To: netdev, intel-wired-lan, linux-kernel, Jeff Kirsher
  Cc: Sasha Levin, Joseph Yasi, Aaron Brown, Alexander Duyck, e1000-devel

Driver does not want to keep packets in tx queue when link is lost.
But present code only reset NIC to flush them, but does not prevent
queuing new packets. Moreover reset sequence itself could generate
new packets via netconsole and NIC falls into endless reset loop.

This patch wakes tx queue only when NIC is ready to send packets.

This is proper fix for problem addressed by commit 0f9e980bf5ee
("e1000e: fix cyclic resets at link up with active tx").

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ba96e52aa8d1..fe643d66aa10 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4209,7 +4209,7 @@ void e1000e_up(struct e1000_adapter *adapter)
 		e1000_configure_msix(adapter);
 	e1000_irq_enable(adapter);
 
-	netif_start_queue(adapter->netdev);
+	/* tx queue started by watchdog timer when link is up */
 
 	e1000e_trigger_lsc(adapter);
 }
@@ -4607,6 +4607,7 @@ int e1000e_open(struct net_device *netdev)
 	pm_runtime_get_sync(&pdev->dev);
 
 	netif_carrier_off(netdev);
+	netif_stop_queue(netdev);
 
 	/* allocate transmit descriptors */
 	err = e1000e_setup_tx_resources(adapter->tx_ring);
@@ -4667,7 +4668,6 @@ int e1000e_open(struct net_device *netdev)
 	e1000_irq_enable(adapter);
 
 	adapter->tx_hang_recheck = false;
-	netif_start_queue(netdev);
 
 	hw->mac.get_link_status = true;
 	pm_runtime_put(&pdev->dev);
@@ -5289,6 +5289,7 @@ static void e1000_watchdog_task(struct work_struct *work)
 			if (phy->ops.cfg_on_link_up)
 				phy->ops.cfg_on_link_up(hw);
 
+			netif_wake_queue(netdev);
 			netif_carrier_on(netdev);
 
 			if (!test_bit(__E1000_DOWN, &adapter->state))
@@ -5302,6 +5303,7 @@ static void e1000_watchdog_task(struct work_struct *work)
 			/* Link status message must follow this format */
 			pr_info("%s NIC Link is Down\n", adapter->netdev->name);
 			netif_carrier_off(netdev);
+			netif_stop_queue(netdev);
 			if (!test_bit(__E1000_DOWN, &adapter->state))
 				mod_timer(&adapter->phy_info_timer,
 					  round_jiffies(jiffies + 2 * HZ));


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 2/2] e1000e: start network tx queue only when link is up
@ 2019-04-17  8:13   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2019-04-17  8:13 UTC (permalink / raw)
  To: intel-wired-lan

Driver does not want to keep packets in tx queue when link is lost.
But present code only reset NIC to flush them, but does not prevent
queuing new packets. Moreover reset sequence itself could generate
new packets via netconsole and NIC falls into endless reset loop.

This patch wakes tx queue only when NIC is ready to send packets.

This is proper fix for problem addressed by commit 0f9e980bf5ee
("e1000e: fix cyclic resets at link up with active tx").

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ba96e52aa8d1..fe643d66aa10 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4209,7 +4209,7 @@ void e1000e_up(struct e1000_adapter *adapter)
 		e1000_configure_msix(adapter);
 	e1000_irq_enable(adapter);
 
-	netif_start_queue(adapter->netdev);
+	/* tx queue started by watchdog timer when link is up */
 
 	e1000e_trigger_lsc(adapter);
 }
@@ -4607,6 +4607,7 @@ int e1000e_open(struct net_device *netdev)
 	pm_runtime_get_sync(&pdev->dev);
 
 	netif_carrier_off(netdev);
+	netif_stop_queue(netdev);
 
 	/* allocate transmit descriptors */
 	err = e1000e_setup_tx_resources(adapter->tx_ring);
@@ -4667,7 +4668,6 @@ int e1000e_open(struct net_device *netdev)
 	e1000_irq_enable(adapter);
 
 	adapter->tx_hang_recheck = false;
-	netif_start_queue(netdev);
 
 	hw->mac.get_link_status = true;
 	pm_runtime_put(&pdev->dev);
@@ -5289,6 +5289,7 @@ static void e1000_watchdog_task(struct work_struct *work)
 			if (phy->ops.cfg_on_link_up)
 				phy->ops.cfg_on_link_up(hw);
 
+			netif_wake_queue(netdev);
 			netif_carrier_on(netdev);
 
 			if (!test_bit(__E1000_DOWN, &adapter->state))
@@ -5302,6 +5303,7 @@ static void e1000_watchdog_task(struct work_struct *work)
 			/* Link status message must follow this format */
 			pr_info("%s NIC Link is Down\n", adapter->netdev->name);
 			netif_carrier_off(netdev);
+			netif_stop_queue(netdev);
 			if (!test_bit(__E1000_DOWN, &adapter->state))
 				mod_timer(&adapter->phy_info_timer,
 					  round_jiffies(jiffies + 2 * HZ));


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
  2019-04-17  8:13 ` [Intel-wired-lan] " Konstantin Khlebnikov
  (?)
  (?)
@ 2019-04-23  0:49 ` Joseph Yasi
  -1 siblings, 0 replies; 18+ messages in thread
From: Joseph Yasi @ 2019-04-23  0:49 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Apr 17, 2019 at 4:13 AM Konstantin Khlebnikov <
khlebnikov@yandex-team.ru> wrote:

> This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.
>
> That change cased false-positive warning about hardware hang:
>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
>    TDH                  <0>
>    TDT                  <1>
>    next_to_use          <1>
>    next_to_clean        <0>
> buffer_info[next_to_clean]:
>    time_stamp           <fffba7a7>
>    next_to_watch        <0>
>    jiffies              <fffbb140>
>    next_to_watch.status <0>
> MAC Status             <40080080>
> PHY Status             <7949>
> PHY 1000BASE-T Status  <0>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>
> Besides warning everything works fine.
> Original issue will be fixed property in following patch.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Reported-by: Joseph Yasi <joe.yasi@gmail.com>
>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7acc61e4f645..ba96e52aa8d1 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -5309,13 +5309,8 @@ static void e1000_watchdog_task(struct work_struct
> *work)
>                         /* 8000ES2LAN requires a Rx packet buffer
> work-around
>                          * on link down event; reset the controller to
> flush
>                          * the Rx packet buffer.
> -                        *
> -                        * If the link is lost the controller stops DMA,
> but
> -                        * if there is queued Tx work it cannot be done.
> So
> -                        * reset the controller to flush the Tx packet
> buffers.
>                          */
> -                       if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
> -                           e1000_desc_unused(tx_ring) + 1 <
> tx_ring->count)
> +                       if (adapter->flags & FLAG_RX_NEEDS_RESTART)
>                                 adapter->flags |= FLAG_RESTART_NOW;
>                         else
>                                 pm_schedule_suspend(netdev->dev.parent,
> @@ -5338,6 +5333,14 @@ static void e1000_watchdog_task(struct work_struct
> *work)
>         adapter->gotc_old = adapter->stats.gotc;
>         spin_unlock(&adapter->stats64_lock);
>
> +       /* If the link is lost the controller stops DMA, but
> +        * if there is queued Tx work it cannot be done.  So
> +        * reset the controller to flush the Tx packet buffers.
> +        */
> +       if (!netif_carrier_ok(netdev) &&
> +           (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
> +               adapter->flags |= FLAG_RESTART_NOW;
> +
>         /* If reset is necessary, do it outside of interrupt context. */
>         if (adapter->flags & FLAG_RESTART_NOW) {
>                 schedule_work(&adapter->reset_task);
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20190422/778e89b0/attachment-0001.html>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 2/2] e1000e: start network tx queue only when link is up
  2019-04-17  8:13   ` [Intel-wired-lan] " Konstantin Khlebnikov
  (?)
@ 2019-04-23  0:49   ` Joseph Yasi
  -1 siblings, 0 replies; 18+ messages in thread
From: Joseph Yasi @ 2019-04-23  0:49 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Apr 17, 2019 at 4:13 AM Konstantin Khlebnikov <
khlebnikov@yandex-team.ru> wrote:

> Driver does not want to keep packets in tx queue when link is lost.
> But present code only reset NIC to flush them, but does not prevent
> queuing new packets. Moreover reset sequence itself could generate
> new packets via netconsole and NIC falls into endless reset loop.
>
> This patch wakes tx queue only when NIC is ready to send packets.
>
> This is proper fix for problem addressed by commit 0f9e980bf5ee
> ("e1000e: fix cyclic resets at link up with active tx").
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>

> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ba96e52aa8d1..fe643d66aa10 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4209,7 +4209,7 @@ void e1000e_up(struct e1000_adapter *adapter)
>                 e1000_configure_msix(adapter);
>         e1000_irq_enable(adapter);
>
> -       netif_start_queue(adapter->netdev);
> +       /* tx queue started by watchdog timer when link is up */
>
>         e1000e_trigger_lsc(adapter);
>  }
> @@ -4607,6 +4607,7 @@ int e1000e_open(struct net_device *netdev)
>         pm_runtime_get_sync(&pdev->dev);
>
>         netif_carrier_off(netdev);
> +       netif_stop_queue(netdev);
>
>         /* allocate transmit descriptors */
>         err = e1000e_setup_tx_resources(adapter->tx_ring);
> @@ -4667,7 +4668,6 @@ int e1000e_open(struct net_device *netdev)
>         e1000_irq_enable(adapter);
>
>         adapter->tx_hang_recheck = false;
> -       netif_start_queue(netdev);
>
>         hw->mac.get_link_status = true;
>         pm_runtime_put(&pdev->dev);
> @@ -5289,6 +5289,7 @@ static void e1000_watchdog_task(struct work_struct
> *work)
>                         if (phy->ops.cfg_on_link_up)
>                                 phy->ops.cfg_on_link_up(hw);
>
> +                       netif_wake_queue(netdev);
>                         netif_carrier_on(netdev);
>
>                         if (!test_bit(__E1000_DOWN, &adapter->state))
> @@ -5302,6 +5303,7 @@ static void e1000_watchdog_task(struct work_struct
> *work)
>                         /* Link status message must follow this format */
>                         pr_info("%s NIC Link is Down\n",
> adapter->netdev->name);
>                         netif_carrier_off(netdev);
> +                       netif_stop_queue(netdev);
>                         if (!test_bit(__E1000_DOWN, &adapter->state))
>                                 mod_timer(&adapter->phy_info_timer,
>                                           round_jiffies(jiffies + 2 * HZ));
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20190422/8d0147c7/attachment-0001.html>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] e1000e: start network tx queue only when link is up
  2019-04-17  8:13   ` [Intel-wired-lan] " Konstantin Khlebnikov
@ 2019-04-23  0:50     ` Joseph Yasi
  -1 siblings, 0 replies; 18+ messages in thread
From: Joseph Yasi @ 2019-04-23  0:50 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: netdev, intel-wired-lan, linux-kernel, Jeff Kirsher, Sasha Levin,
	Aaron Brown, Alexander Duyck, e1000-devel

On Wed, Apr 17, 2019 at 4:13 AM Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> Driver does not want to keep packets in tx queue when link is lost.
> But present code only reset NIC to flush them, but does not prevent
> queuing new packets. Moreover reset sequence itself could generate
> new packets via netconsole and NIC falls into endless reset loop.
>
> This patch wakes tx queue only when NIC is ready to send packets.
>
> This is proper fix for problem addressed by commit 0f9e980bf5ee
> ("e1000e: fix cyclic resets at link up with active tx").
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ba96e52aa8d1..fe643d66aa10 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4209,7 +4209,7 @@ void e1000e_up(struct e1000_adapter *adapter)
>                 e1000_configure_msix(adapter);
>         e1000_irq_enable(adapter);
>
> -       netif_start_queue(adapter->netdev);
> +       /* tx queue started by watchdog timer when link is up */
>
>         e1000e_trigger_lsc(adapter);
>  }
> @@ -4607,6 +4607,7 @@ int e1000e_open(struct net_device *netdev)
>         pm_runtime_get_sync(&pdev->dev);
>
>         netif_carrier_off(netdev);
> +       netif_stop_queue(netdev);
>
>         /* allocate transmit descriptors */
>         err = e1000e_setup_tx_resources(adapter->tx_ring);
> @@ -4667,7 +4668,6 @@ int e1000e_open(struct net_device *netdev)
>         e1000_irq_enable(adapter);
>
>         adapter->tx_hang_recheck = false;
> -       netif_start_queue(netdev);
>
>         hw->mac.get_link_status = true;
>         pm_runtime_put(&pdev->dev);
> @@ -5289,6 +5289,7 @@ static void e1000_watchdog_task(struct work_struct *work)
>                         if (phy->ops.cfg_on_link_up)
>                                 phy->ops.cfg_on_link_up(hw);
>
> +                       netif_wake_queue(netdev);
>                         netif_carrier_on(netdev);
>
>                         if (!test_bit(__E1000_DOWN, &adapter->state))
> @@ -5302,6 +5303,7 @@ static void e1000_watchdog_task(struct work_struct *work)
>                         /* Link status message must follow this format */
>                         pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>                         netif_carrier_off(netdev);
> +                       netif_stop_queue(netdev);
>                         if (!test_bit(__E1000_DOWN, &adapter->state))
>                                 mod_timer(&adapter->phy_info_timer,
>                                           round_jiffies(jiffies + 2 * HZ));
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 2/2] e1000e: start network tx queue only when link is up
@ 2019-04-23  0:50     ` Joseph Yasi
  0 siblings, 0 replies; 18+ messages in thread
From: Joseph Yasi @ 2019-04-23  0:50 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Apr 17, 2019 at 4:13 AM Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> Driver does not want to keep packets in tx queue when link is lost.
> But present code only reset NIC to flush them, but does not prevent
> queuing new packets. Moreover reset sequence itself could generate
> new packets via netconsole and NIC falls into endless reset loop.
>
> This patch wakes tx queue only when NIC is ready to send packets.
>
> This is proper fix for problem addressed by commit 0f9e980bf5ee
> ("e1000e: fix cyclic resets at link up with active tx").
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ba96e52aa8d1..fe643d66aa10 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4209,7 +4209,7 @@ void e1000e_up(struct e1000_adapter *adapter)
>                 e1000_configure_msix(adapter);
>         e1000_irq_enable(adapter);
>
> -       netif_start_queue(adapter->netdev);
> +       /* tx queue started by watchdog timer when link is up */
>
>         e1000e_trigger_lsc(adapter);
>  }
> @@ -4607,6 +4607,7 @@ int e1000e_open(struct net_device *netdev)
>         pm_runtime_get_sync(&pdev->dev);
>
>         netif_carrier_off(netdev);
> +       netif_stop_queue(netdev);
>
>         /* allocate transmit descriptors */
>         err = e1000e_setup_tx_resources(adapter->tx_ring);
> @@ -4667,7 +4668,6 @@ int e1000e_open(struct net_device *netdev)
>         e1000_irq_enable(adapter);
>
>         adapter->tx_hang_recheck = false;
> -       netif_start_queue(netdev);
>
>         hw->mac.get_link_status = true;
>         pm_runtime_put(&pdev->dev);
> @@ -5289,6 +5289,7 @@ static void e1000_watchdog_task(struct work_struct *work)
>                         if (phy->ops.cfg_on_link_up)
>                                 phy->ops.cfg_on_link_up(hw);
>
> +                       netif_wake_queue(netdev);
>                         netif_carrier_on(netdev);
>
>                         if (!test_bit(__E1000_DOWN, &adapter->state))
> @@ -5302,6 +5303,7 @@ static void e1000_watchdog_task(struct work_struct *work)
>                         /* Link status message must follow this format */
>                         pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>                         netif_carrier_off(netdev);
> +                       netif_stop_queue(netdev);
>                         if (!test_bit(__E1000_DOWN, &adapter->state))
>                                 mod_timer(&adapter->phy_info_timer,
>                                           round_jiffies(jiffies + 2 * HZ));
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
  2019-04-17  8:13 ` [Intel-wired-lan] " Konstantin Khlebnikov
@ 2019-04-23  0:51   ` Joseph Yasi
  -1 siblings, 0 replies; 18+ messages in thread
From: Joseph Yasi @ 2019-04-23  0:51 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: netdev, intel-wired-lan, linux-kernel, Jeff Kirsher, Sasha Levin,
	Aaron Brown, Alexander Duyck, e1000-devel

On Wed, Apr 17, 2019 at 4:13 AM Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.
>
> That change cased false-positive warning about hardware hang:
>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
>    TDH                  <0>
>    TDT                  <1>
>    next_to_use          <1>
>    next_to_clean        <0>
> buffer_info[next_to_clean]:
>    time_stamp           <fffba7a7>
>    next_to_watch        <0>
>    jiffies              <fffbb140>
>    next_to_watch.status <0>
> MAC Status             <40080080>
> PHY Status             <7949>
> PHY 1000BASE-T Status  <0>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>
> Besides warning everything works fine.
> Original issue will be fixed property in following patch.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7acc61e4f645..ba96e52aa8d1 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -5309,13 +5309,8 @@ static void e1000_watchdog_task(struct work_struct *work)
>                         /* 8000ES2LAN requires a Rx packet buffer work-around
>                          * on link down event; reset the controller to flush
>                          * the Rx packet buffer.
> -                        *
> -                        * If the link is lost the controller stops DMA, but
> -                        * if there is queued Tx work it cannot be done.  So
> -                        * reset the controller to flush the Tx packet buffers.
>                          */
> -                       if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
> -                           e1000_desc_unused(tx_ring) + 1 < tx_ring->count)
> +                       if (adapter->flags & FLAG_RX_NEEDS_RESTART)
>                                 adapter->flags |= FLAG_RESTART_NOW;
>                         else
>                                 pm_schedule_suspend(netdev->dev.parent,
> @@ -5338,6 +5333,14 @@ static void e1000_watchdog_task(struct work_struct *work)
>         adapter->gotc_old = adapter->stats.gotc;
>         spin_unlock(&adapter->stats64_lock);
>
> +       /* If the link is lost the controller stops DMA, but
> +        * if there is queued Tx work it cannot be done.  So
> +        * reset the controller to flush the Tx packet buffers.
> +        */
> +       if (!netif_carrier_ok(netdev) &&
> +           (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
> +               adapter->flags |= FLAG_RESTART_NOW;
> +
>         /* If reset is necessary, do it outside of interrupt context. */
>         if (adapter->flags & FLAG_RESTART_NOW) {
>                 schedule_work(&adapter->reset_task);
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
@ 2019-04-23  0:51   ` Joseph Yasi
  0 siblings, 0 replies; 18+ messages in thread
From: Joseph Yasi @ 2019-04-23  0:51 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Apr 17, 2019 at 4:13 AM Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.
>
> That change cased false-positive warning about hardware hang:
>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
>    TDH                  <0>
>    TDT                  <1>
>    next_to_use          <1>
>    next_to_clean        <0>
> buffer_info[next_to_clean]:
>    time_stamp           <fffba7a7>
>    next_to_watch        <0>
>    jiffies              <fffbb140>
>    next_to_watch.status <0>
> MAC Status             <40080080>
> PHY Status             <7949>
> PHY 1000BASE-T Status  <0>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>
> Besides warning everything works fine.
> Original issue will be fixed property in following patch.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7acc61e4f645..ba96e52aa8d1 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -5309,13 +5309,8 @@ static void e1000_watchdog_task(struct work_struct *work)
>                         /* 8000ES2LAN requires a Rx packet buffer work-around
>                          * on link down event; reset the controller to flush
>                          * the Rx packet buffer.
> -                        *
> -                        * If the link is lost the controller stops DMA, but
> -                        * if there is queued Tx work it cannot be done.  So
> -                        * reset the controller to flush the Tx packet buffers.
>                          */
> -                       if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
> -                           e1000_desc_unused(tx_ring) + 1 < tx_ring->count)
> +                       if (adapter->flags & FLAG_RX_NEEDS_RESTART)
>                                 adapter->flags |= FLAG_RESTART_NOW;
>                         else
>                                 pm_schedule_suspend(netdev->dev.parent,
> @@ -5338,6 +5333,14 @@ static void e1000_watchdog_task(struct work_struct *work)
>         adapter->gotc_old = adapter->stats.gotc;
>         spin_unlock(&adapter->stats64_lock);
>
> +       /* If the link is lost the controller stops DMA, but
> +        * if there is queued Tx work it cannot be done.  So
> +        * reset the controller to flush the Tx packet buffers.
> +        */
> +       if (!netif_carrier_ok(netdev) &&
> +           (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
> +               adapter->flags |= FLAG_RESTART_NOW;
> +
>         /* If reset is necessary, do it outside of interrupt context. */
>         if (adapter->flags & FLAG_RESTART_NOW) {
>                 schedule_work(&adapter->reset_task);
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
  2019-04-17  8:13 ` [Intel-wired-lan] " Konstantin Khlebnikov
@ 2019-04-26  0:00   ` Brown, Aaron F
  -1 siblings, 0 replies; 18+ messages in thread
From: Brown, Aaron F @ 2019-04-26  0:00 UTC (permalink / raw)
  To: Konstantin Khlebnikov, netdev, intel-wired-lan, linux-kernel,
	Kirsher, Jeffrey T
  Cc: Sasha Levin, Joseph Yasi, Alexander Duyck, e1000-devel

> From: Konstantin Khlebnikov [mailto:khlebnikov@yandex-team.ru]
> Sent: Wednesday, April 17, 2019 1:13 AM
> To: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; linux-
> kernel@vger.kernel.org; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Sasha Levin <sashal@kernel.org>; Joseph Yasi <joe.yasi@gmail.com>;
> Brown, Aaron F <aaron.f.brown@intel.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; e1000-devel@lists.sourceforge.net
> Subject: [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
> 
> This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.
> 
> That change cased false-positive warning about hardware hang:
> 
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
>    TDH                  <0>
>    TDT                  <1>
>    next_to_use          <1>
>    next_to_clean        <0>
> buffer_info[next_to_clean]:
>    time_stamp           <fffba7a7>
>    next_to_watch        <0>
>    jiffies              <fffbb140>
>    next_to_watch.status <0>
> MAC Status             <40080080>
> PHY Status             <7949>
> PHY 1000BASE-T Status  <0>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> 
> Besides warning everything works fine.
> Original issue will be fixed property in following patch.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Reported-by: Joseph Yasi <joe.yasi@gmail.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>
This was more of a regression check as I never did manage to replicate the tx hang, even with seemingly the same hardware.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
@ 2019-04-26  0:00   ` Brown, Aaron F
  0 siblings, 0 replies; 18+ messages in thread
From: Brown, Aaron F @ 2019-04-26  0:00 UTC (permalink / raw)
  To: intel-wired-lan

> From: Konstantin Khlebnikov [mailto:khlebnikov at yandex-team.ru]
> Sent: Wednesday, April 17, 2019 1:13 AM
> To: netdev at vger.kernel.org; intel-wired-lan at lists.osuosl.org; linux-
> kernel at vger.kernel.org; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Sasha Levin <sashal@kernel.org>; Joseph Yasi <joe.yasi@gmail.com>;
> Brown, Aaron F <aaron.f.brown@intel.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; e1000-devel at lists.sourceforge.net
> Subject: [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
> 
> This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.
> 
> That change cased false-positive warning about hardware hang:
> 
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
>    TDH                  <0>
>    TDT                  <1>
>    next_to_use          <1>
>    next_to_clean        <0>
> buffer_info[next_to_clean]:
>    time_stamp           <fffba7a7>
>    next_to_watch        <0>
>    jiffies              <fffbb140>
>    next_to_watch.status <0>
> MAC Status             <40080080>
> PHY Status             <7949>
> PHY 1000BASE-T Status  <0>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> 
> Besides warning everything works fine.
> Original issue will be fixed property in following patch.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Reported-by: Joseph Yasi <joe.yasi@gmail.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>
This was more of a regression check as I never did manage to replicate the tx hang, even with seemingly the same hardware.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 2/2] e1000e: start network tx queue only when link is up
  2019-04-17  8:13   ` [Intel-wired-lan] " Konstantin Khlebnikov
@ 2019-04-26  0:05     ` Brown, Aaron F
  -1 siblings, 0 replies; 18+ messages in thread
From: Brown, Aaron F @ 2019-04-26  0:05 UTC (permalink / raw)
  To: Konstantin Khlebnikov, netdev, intel-wired-lan, linux-kernel,
	Kirsher, Jeffrey T
  Cc: Sasha Levin, Joseph Yasi, Alexander Duyck, e1000-devel

> From: Konstantin Khlebnikov [mailto:khlebnikov@yandex-team.ru]
> Sent: Wednesday, April 17, 2019 1:13 AM
> To: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; linux-
> kernel@vger.kernel.org; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Sasha Levin <sashal@kernel.org>; Joseph Yasi <joe.yasi@gmail.com>;
> Brown, Aaron F <aaron.f.brown@intel.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; e1000-devel@lists.sourceforge.net
> Subject: [PATCH 2/2] e1000e: start network tx queue only when link is up
> 
> Driver does not want to keep packets in tx queue when link is lost.
> But present code only reset NIC to flush them, but does not prevent
> queuing new packets. Moreover reset sequence itself could generate
> new packets via netconsole and NIC falls into endless reset loop.
> 
> This patch wakes tx queue only when NIC is ready to send packets.
> 
> This is proper fix for problem addressed by commit 0f9e980bf5ee
> ("e1000e: fix cyclic resets at link up with active tx").
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Again, more of a regression check than a test that the patch solves the problem as I did not manage to trigger the hang.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 2/2] e1000e: start network tx queue only when link is up
@ 2019-04-26  0:05     ` Brown, Aaron F
  0 siblings, 0 replies; 18+ messages in thread
From: Brown, Aaron F @ 2019-04-26  0:05 UTC (permalink / raw)
  To: intel-wired-lan

> From: Konstantin Khlebnikov [mailto:khlebnikov at yandex-team.ru]
> Sent: Wednesday, April 17, 2019 1:13 AM
> To: netdev at vger.kernel.org; intel-wired-lan at lists.osuosl.org; linux-
> kernel at vger.kernel.org; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Sasha Levin <sashal@kernel.org>; Joseph Yasi <joe.yasi@gmail.com>;
> Brown, Aaron F <aaron.f.brown@intel.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; e1000-devel at lists.sourceforge.net
> Subject: [PATCH 2/2] e1000e: start network tx queue only when link is up
> 
> Driver does not want to keep packets in tx queue when link is lost.
> But present code only reset NIC to flush them, but does not prevent
> queuing new packets. Moreover reset sequence itself could generate
> new packets via netconsole and NIC falls into endless reset loop.
> 
> This patch wakes tx queue only when NIC is ready to send packets.
> 
> This is proper fix for problem addressed by commit 0f9e980bf5ee
> ("e1000e: fix cyclic resets at link up with active tx").
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Again, more of a regression check than a test that the patch solves the problem as I did not manage to trigger the hang.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
  2019-04-17  8:13 ` [Intel-wired-lan] " Konstantin Khlebnikov
@ 2019-04-26 14:11   ` Oleksandr Natalenko
  -1 siblings, 0 replies; 18+ messages in thread
From: Oleksandr Natalenko @ 2019-04-26 14:11 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: netdev, intel-wired-lan, linux-kernel, Jeff Kirsher, Sasha Levin,
	Joseph Yasi, Aaron Brown, Alexander Duyck, e1000-devel

On Wed, Apr 17, 2019 at 11:13:16AM +0300, Konstantin Khlebnikov wrote:
> This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.
> 
> That change cased false-positive warning about hardware hang:
> 
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
>    TDH                  <0>
>    TDT                  <1>
>    next_to_use          <1>
>    next_to_clean        <0>
> buffer_info[next_to_clean]:
>    time_stamp           <fffba7a7>
>    next_to_watch        <0>
>    jiffies              <fffbb140>
>    next_to_watch.status <0>
> MAC Status             <40080080>
> PHY Status             <7949>
> PHY 1000BASE-T Status  <0>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> 
> Besides warning everything works fine.
> Original issue will be fixed property in following patch.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Reported-by: Joseph Yasi <joe.yasi@gmail.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7acc61e4f645..ba96e52aa8d1 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -5309,13 +5309,8 @@ static void e1000_watchdog_task(struct work_struct *work)
>  			/* 8000ES2LAN requires a Rx packet buffer work-around
>  			 * on link down event; reset the controller to flush
>  			 * the Rx packet buffer.
> -			 *
> -			 * If the link is lost the controller stops DMA, but
> -			 * if there is queued Tx work it cannot be done.  So
> -			 * reset the controller to flush the Tx packet buffers.
>  			 */
> -			if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
> -			    e1000_desc_unused(tx_ring) + 1 < tx_ring->count)
> +			if (adapter->flags & FLAG_RX_NEEDS_RESTART)
>  				adapter->flags |= FLAG_RESTART_NOW;
>  			else
>  				pm_schedule_suspend(netdev->dev.parent,
> @@ -5338,6 +5333,14 @@ static void e1000_watchdog_task(struct work_struct *work)
>  	adapter->gotc_old = adapter->stats.gotc;
>  	spin_unlock(&adapter->stats64_lock);
>  
> +	/* If the link is lost the controller stops DMA, but
> +	 * if there is queued Tx work it cannot be done.  So
> +	 * reset the controller to flush the Tx packet buffers.
> +	 */
> +	if (!netif_carrier_ok(netdev) &&
> +	    (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
> +		adapter->flags |= FLAG_RESTART_NOW;
> +
>  	/* If reset is necessary, do it outside of interrupt context. */
>  	if (adapter->flags & FLAG_RESTART_NOW) {
>  		schedule_work(&adapter->reset_task);
> 

Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx"
@ 2019-04-26 14:11   ` Oleksandr Natalenko
  0 siblings, 0 replies; 18+ messages in thread
From: Oleksandr Natalenko @ 2019-04-26 14:11 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Apr 17, 2019 at 11:13:16AM +0300, Konstantin Khlebnikov wrote:
> This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.
> 
> That change cased false-positive warning about hardware hang:
> 
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
>    TDH                  <0>
>    TDT                  <1>
>    next_to_use          <1>
>    next_to_clean        <0>
> buffer_info[next_to_clean]:
>    time_stamp           <fffba7a7>
>    next_to_watch        <0>
>    jiffies              <fffbb140>
>    next_to_watch.status <0>
> MAC Status             <40080080>
> PHY Status             <7949>
> PHY 1000BASE-T Status  <0>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> 
> Besides warning everything works fine.
> Original issue will be fixed property in following patch.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Reported-by: Joseph Yasi <joe.yasi@gmail.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7acc61e4f645..ba96e52aa8d1 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -5309,13 +5309,8 @@ static void e1000_watchdog_task(struct work_struct *work)
>  			/* 8000ES2LAN requires a Rx packet buffer work-around
>  			 * on link down event; reset the controller to flush
>  			 * the Rx packet buffer.
> -			 *
> -			 * If the link is lost the controller stops DMA, but
> -			 * if there is queued Tx work it cannot be done.  So
> -			 * reset the controller to flush the Tx packet buffers.
>  			 */
> -			if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
> -			    e1000_desc_unused(tx_ring) + 1 < tx_ring->count)
> +			if (adapter->flags & FLAG_RX_NEEDS_RESTART)
>  				adapter->flags |= FLAG_RESTART_NOW;
>  			else
>  				pm_schedule_suspend(netdev->dev.parent,
> @@ -5338,6 +5333,14 @@ static void e1000_watchdog_task(struct work_struct *work)
>  	adapter->gotc_old = adapter->stats.gotc;
>  	spin_unlock(&adapter->stats64_lock);
>  
> +	/* If the link is lost the controller stops DMA, but
> +	 * if there is queued Tx work it cannot be done.  So
> +	 * reset the controller to flush the Tx packet buffers.
> +	 */
> +	if (!netif_carrier_ok(netdev) &&
> +	    (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
> +		adapter->flags |= FLAG_RESTART_NOW;
> +
>  	/* If reset is necessary, do it outside of interrupt context. */
>  	if (adapter->flags & FLAG_RESTART_NOW) {
>  		schedule_work(&adapter->reset_task);
> 

Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] e1000e: start network tx queue only when link is up
  2019-04-17  8:13   ` [Intel-wired-lan] " Konstantin Khlebnikov
@ 2019-04-26 14:12     ` Oleksandr Natalenko
  -1 siblings, 0 replies; 18+ messages in thread
From: Oleksandr Natalenko @ 2019-04-26 14:12 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: netdev, intel-wired-lan, linux-kernel, Jeff Kirsher, Sasha Levin,
	Joseph Yasi, Aaron Brown, Alexander Duyck, e1000-devel

On Wed, Apr 17, 2019 at 11:13:20AM +0300, Konstantin Khlebnikov wrote:
> Driver does not want to keep packets in tx queue when link is lost.
> But present code only reset NIC to flush them, but does not prevent
> queuing new packets. Moreover reset sequence itself could generate
> new packets via netconsole and NIC falls into endless reset loop.
> 
> This patch wakes tx queue only when NIC is ready to send packets.
> 
> This is proper fix for problem addressed by commit 0f9e980bf5ee
> ("e1000e: fix cyclic resets at link up with active tx").
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ba96e52aa8d1..fe643d66aa10 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4209,7 +4209,7 @@ void e1000e_up(struct e1000_adapter *adapter)
>  		e1000_configure_msix(adapter);
>  	e1000_irq_enable(adapter);
>  
> -	netif_start_queue(adapter->netdev);
> +	/* tx queue started by watchdog timer when link is up */
>  
>  	e1000e_trigger_lsc(adapter);
>  }
> @@ -4607,6 +4607,7 @@ int e1000e_open(struct net_device *netdev)
>  	pm_runtime_get_sync(&pdev->dev);
>  
>  	netif_carrier_off(netdev);
> +	netif_stop_queue(netdev);
>  
>  	/* allocate transmit descriptors */
>  	err = e1000e_setup_tx_resources(adapter->tx_ring);
> @@ -4667,7 +4668,6 @@ int e1000e_open(struct net_device *netdev)
>  	e1000_irq_enable(adapter);
>  
>  	adapter->tx_hang_recheck = false;
> -	netif_start_queue(netdev);
>  
>  	hw->mac.get_link_status = true;
>  	pm_runtime_put(&pdev->dev);
> @@ -5289,6 +5289,7 @@ static void e1000_watchdog_task(struct work_struct *work)
>  			if (phy->ops.cfg_on_link_up)
>  				phy->ops.cfg_on_link_up(hw);
>  
> +			netif_wake_queue(netdev);
>  			netif_carrier_on(netdev);
>  
>  			if (!test_bit(__E1000_DOWN, &adapter->state))
> @@ -5302,6 +5303,7 @@ static void e1000_watchdog_task(struct work_struct *work)
>  			/* Link status message must follow this format */
>  			pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>  			netif_carrier_off(netdev);
> +			netif_stop_queue(netdev);
>  			if (!test_bit(__E1000_DOWN, &adapter->state))
>  				mod_timer(&adapter->phy_info_timer,
>  					  round_jiffies(jiffies + 2 * HZ));
> 

Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Intel-wired-lan] [PATCH 2/2] e1000e: start network tx queue only when link is up
@ 2019-04-26 14:12     ` Oleksandr Natalenko
  0 siblings, 0 replies; 18+ messages in thread
From: Oleksandr Natalenko @ 2019-04-26 14:12 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Apr 17, 2019 at 11:13:20AM +0300, Konstantin Khlebnikov wrote:
> Driver does not want to keep packets in tx queue when link is lost.
> But present code only reset NIC to flush them, but does not prevent
> queuing new packets. Moreover reset sequence itself could generate
> new packets via netconsole and NIC falls into endless reset loop.
> 
> This patch wakes tx queue only when NIC is ready to send packets.
> 
> This is proper fix for problem addressed by commit 0f9e980bf5ee
> ("e1000e: fix cyclic resets at link up with active tx").
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ba96e52aa8d1..fe643d66aa10 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4209,7 +4209,7 @@ void e1000e_up(struct e1000_adapter *adapter)
>  		e1000_configure_msix(adapter);
>  	e1000_irq_enable(adapter);
>  
> -	netif_start_queue(adapter->netdev);
> +	/* tx queue started by watchdog timer when link is up */
>  
>  	e1000e_trigger_lsc(adapter);
>  }
> @@ -4607,6 +4607,7 @@ int e1000e_open(struct net_device *netdev)
>  	pm_runtime_get_sync(&pdev->dev);
>  
>  	netif_carrier_off(netdev);
> +	netif_stop_queue(netdev);
>  
>  	/* allocate transmit descriptors */
>  	err = e1000e_setup_tx_resources(adapter->tx_ring);
> @@ -4667,7 +4668,6 @@ int e1000e_open(struct net_device *netdev)
>  	e1000_irq_enable(adapter);
>  
>  	adapter->tx_hang_recheck = false;
> -	netif_start_queue(netdev);
>  
>  	hw->mac.get_link_status = true;
>  	pm_runtime_put(&pdev->dev);
> @@ -5289,6 +5289,7 @@ static void e1000_watchdog_task(struct work_struct *work)
>  			if (phy->ops.cfg_on_link_up)
>  				phy->ops.cfg_on_link_up(hw);
>  
> +			netif_wake_queue(netdev);
>  			netif_carrier_on(netdev);
>  
>  			if (!test_bit(__E1000_DOWN, &adapter->state))
> @@ -5302,6 +5303,7 @@ static void e1000_watchdog_task(struct work_struct *work)
>  			/* Link status message must follow this format */
>  			pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>  			netif_carrier_off(netdev);
> +			netif_stop_queue(netdev);
>  			if (!test_bit(__E1000_DOWN, &adapter->state))
>  				mod_timer(&adapter->phy_info_timer,
>  					  round_jiffies(jiffies + 2 * HZ));
> 

Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-04-26 14:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-17  8:13 [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx" Konstantin Khlebnikov
2019-04-17  8:13 ` [Intel-wired-lan] " Konstantin Khlebnikov
2019-04-17  8:13 ` [PATCH 2/2] e1000e: start network tx queue only when link is up Konstantin Khlebnikov
2019-04-17  8:13   ` [Intel-wired-lan] " Konstantin Khlebnikov
2019-04-23  0:49   ` Joseph Yasi
2019-04-23  0:50   ` Joseph Yasi
2019-04-23  0:50     ` [Intel-wired-lan] " Joseph Yasi
2019-04-26  0:05   ` Brown, Aaron F
2019-04-26  0:05     ` [Intel-wired-lan] " Brown, Aaron F
2019-04-26 14:12   ` Oleksandr Natalenko
2019-04-26 14:12     ` [Intel-wired-lan] " Oleksandr Natalenko
2019-04-23  0:49 ` [Intel-wired-lan] [PATCH 1/2] Revert "e1000e: fix cyclic resets at link up with active tx" Joseph Yasi
2019-04-23  0:51 ` Joseph Yasi
2019-04-23  0:51   ` [Intel-wired-lan] " Joseph Yasi
2019-04-26  0:00 ` Brown, Aaron F
2019-04-26  0:00   ` [Intel-wired-lan] " Brown, Aaron F
2019-04-26 14:11 ` Oleksandr Natalenko
2019-04-26 14:11   ` [Intel-wired-lan] " Oleksandr Natalenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.