netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] e1000: Reset rx ring index on receive overrun
@ 2012-05-17 23:01 Samuel Thibault
  2012-05-17 23:22 ` Dave, Tushar N
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Thibault @ 2012-05-17 23:01 UTC (permalink / raw)
  To: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny,
	Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck,
	John Ronciak, David S. Miller, Jiri Pirko, Dean Nelson,
	e1000-devel, netdev
  Cc: linux-kernel

At high traffic rate, the rx ring may get completely filled before we
manage to consume it.  After it is filled, the kernel and device indexes
are not synchronized any more, so we have to reset them, otherwise the
kernel will be stuck waiting for the wrong slot to be filled.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>

---
This is just a patch suggestion, I'm not an expert in network drivers, I
leave to actual driver authors to bake a better version.

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 37caa88..77c8dbc 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3759,6 +3759,21 @@ static irqreturn_t e1000_intr(int irq, void *data)
 	if (unlikely(test_bit(__E1000_DOWN, &adapter->flags)))
 		return IRQ_HANDLED;
 
+	if (unlikely(icr & E1000_ICR_RXO)) {
+		/* Receive Overrun */
+		u32 rctl;
+		int i;
+		rctl = er32(RCTL);
+		ew32(RCTL, rctl & ~E1000_RCTL_EN);
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			memset(adapter->rx_ring[i].desc, 0, adapter->rx_ring[i].size);
+			adapter->rx_ring[i].next_to_clean = 0;
+		}
+		ew32(RDH, 0);
+		ew32(RCTL, rctl);
+		adapter->netdev->stats.rx_fifo_errors++;
+	}
+
 	if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
 		hw->get_link_status = 1;
 		/* guard against interrupt when we're going down */

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] e1000: Reset rx ring index on receive overrun
  2012-05-17 23:01 [PATCH] e1000: Reset rx ring index on receive overrun Samuel Thibault
@ 2012-05-17 23:22 ` Dave, Tushar N
  2012-05-17 23:28   ` Samuel Thibault
  0 siblings, 1 reply; 8+ messages in thread
From: Dave, Tushar N @ 2012-05-17 23:22 UTC (permalink / raw)
  To: Samuel Thibault, Kirsher, Jeffrey T, Brandeburg, Jesse, Allan,
	Bruce W, Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
	Waskiewicz Jr, Peter P, Duyck, Alexander H, Ronciak, John,
	David S. Miller, Jiri Pirko, Dean Nelson, e1000-devel, netdev
  Cc: linux-kernel

I am interested in to see if you have actual test case and more importantly test data that shows that kernel and device indexes are not synchronized any more.


-Tushar

>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Samuel Thibault
>Sent: Thursday, May 17, 2012 4:02 PM
>To: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W; Wyborny,
>Carolyn; Skidmore, Donald C; Rose, Gregory V; Waskiewicz Jr, Peter P;
>Duyck, Alexander H; Ronciak, John; David S. Miller; Jiri Pirko; Dean
>Nelson; e1000-devel@lists.sourceforge.net; netdev@vger.kernel.org
>Cc: linux-kernel@vger.kernel.org
>Subject: [PATCH] e1000: Reset rx ring index on receive overrun
>
>At high traffic rate, the rx ring may get completely filled before we
>manage to consume it.  After it is filled, the kernel and device indexes
>are not synchronized any more, so we have to reset them, otherwise the
>kernel will be stuck waiting for the wrong slot to be filled.
>
>Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
>
>---
>This is just a patch suggestion, I'm not an expert in network drivers, I
>leave to actual driver authors to bake a better version.
>
>diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
>b/drivers/net/ethernet/intel/e1000/e1000_main.c
>index 37caa88..77c8dbc 100644
>--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>@@ -3759,6 +3759,21 @@ static irqreturn_t e1000_intr(int irq, void *data)
> 	if (unlikely(test_bit(__E1000_DOWN, &adapter->flags)))
> 		return IRQ_HANDLED;
>
>+	if (unlikely(icr & E1000_ICR_RXO)) {
>+		/* Receive Overrun */
>+		u32 rctl;
>+		int i;
>+		rctl = er32(RCTL);
>+		ew32(RCTL, rctl & ~E1000_RCTL_EN);
>+		for (i = 0; i < adapter->num_rx_queues; i++) {
>+			memset(adapter->rx_ring[i].desc, 0, adapter-
>>rx_ring[i].size);
>+			adapter->rx_ring[i].next_to_clean = 0;
>+		}
>+		ew32(RDH, 0);
>+		ew32(RCTL, rctl);
>+		adapter->netdev->stats.rx_fifo_errors++;
>+	}
>+
> 	if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
> 		hw->get_link_status = 1;
> 		/* guard against interrupt when we're going down */
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in the
>body of a message to majordomo@vger.kernel.org More majordomo info at
>http://vger.kernel.org/majordomo-info.html

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] e1000: Reset rx ring index on receive overrun
  2012-05-17 23:22 ` Dave, Tushar N
@ 2012-05-17 23:28   ` Samuel Thibault
  2012-05-17 23:31     ` Samuel Thibault
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Thibault @ 2012-05-17 23:28 UTC (permalink / raw)
  To: Dave, Tushar N
  Cc: Jiri Pirko, e1000-devel, Dean Nelson, Allan, Bruce W, Brandeburg,
	Jesse, linux-kernel, Ronciak, John, netdev, David S. Miller

Dave, Tushar N, le Thu 17 May 2012 23:22:31 +0000, a écrit :
> I am interested in to see if you have actual test case and more importantly test data that shows that kernel and device indexes are not synchronized any more.

Well, it's not with an actual physical device, but with the kvm
emulation. Printing indexes from the clean_rx handler, I'm getting:

(status linux index/total size/device index)

status 7 2/256/19
status 7 3/256/19
...
status 7 18/256/19
status 7 19/256/19
status 7 20/256/19
status 7 21/256/19
...
since the status is still 7, linux continues on.

...
status 7 254/256/19
status 7 255/256/19
status 7 0/256/19
status 7 1/256/19
status 0 2/256/19

here it stops, and on next interrupts,

status 0 2/256/20

status 0 2/256/21
etc. and no progress is made any more, until

status 0 2/256/253
status 0 2/256/254

and it gets stuck there:

status 0 2/256/254
status 0 2/256/254
status 0 2/256/254

Samuel

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] e1000: Reset rx ring index on receive overrun
  2012-05-17 23:28   ` Samuel Thibault
@ 2012-05-17 23:31     ` Samuel Thibault
  2012-05-18  0:04       ` Brandeburg, Jesse
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Thibault @ 2012-05-17 23:31 UTC (permalink / raw)
  To: Dave, Tushar N, Kirsher, Jeffrey T, Brandeburg, Jesse, Allan,
	Bruce W, Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
	Waskiewicz Jr, Peter P, Duyck, Alexander H, Ronciak, John,
	David S. Miller, Jiri Pirko, Dean Nelson, e1000-devel, netdev,
	linux-kernel

Samuel Thibault, le Fri 18 May 2012 01:28:21 +0200, a écrit :
> Dave, Tushar N, le Thu 17 May 2012 23:22:31 +0000, a écrit :
> > I am interested in to see if you have actual test case and more importantly test data that shows that kernel and device indexes are not synchronized any more.
> 
> Well, it's not with an actual physical device, but with the kvm
> emulation.

BTW, it also happens easily when request_irq takes some time to
complete: since we enable E1000_TCTL_EN before that, the card can have
time to fill the ring before irqs are processed.

Samuel

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] e1000: Reset rx ring index on receive overrun
  2012-05-17 23:31     ` Samuel Thibault
@ 2012-05-18  0:04       ` Brandeburg, Jesse
  2012-05-18  0:12         ` Samuel Thibault
  0 siblings, 1 reply; 8+ messages in thread
From: Brandeburg, Jesse @ 2012-05-18  0:04 UTC (permalink / raw)
  To: Samuel Thibault
  Cc: Jiri Pirko, e1000-devel, Dean Nelson, Allan, Bruce W,
	linux-kernel, David S. Miller, Ronciak, John, netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 736 bytes --]



On Thu, 17 May 2012, Samuel Thibault wrote:

> Samuel Thibault, le Fri 18 May 2012 01:28:21 +0200, a écrit :
> > Dave, Tushar N, le Thu 17 May 2012 23:22:31 +0000, a écrit :
> > > I am interested in to see if you have actual test case and more 
> > > importantly test data that shows that kernel and device indexes are 
> > > not synchronized any more.
> > 
> > Well, it's not with an actual physical device, but with the kvm
> > emulation.
> 
> BTW, it also happens easily when request_irq takes some time to
> complete: since we enable E1000_TCTL_EN before that, the card can have
> time to fill the ring before irqs are processed.

I think there may well be a bug in the implementation in kvm.  The 
hardware doesn't have this bug.

[-- Attachment #2: Type: text/plain, Size: 395 bytes --]

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

[-- Attachment #3: Type: text/plain, Size: 257 bytes --]

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] e1000: Reset rx ring index on receive overrun
  2012-05-18  0:04       ` Brandeburg, Jesse
@ 2012-05-18  0:12         ` Samuel Thibault
  2012-05-18  0:26           ` Brandeburg, Jesse
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Thibault @ 2012-05-18  0:12 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: Jiri Pirko, e1000-devel, Dean Nelson, Allan, Bruce W,
	linux-kernel, David S. Miller, Ronciak, John, netdev

Brandeburg, Jesse, le Thu 17 May 2012 17:04:04 -0700, a écrit :
> On Thu, 17 May 2012, Samuel Thibault wrote:
> 
> > Samuel Thibault, le Fri 18 May 2012 01:28:21 +0200, a écrit :
> > > Dave, Tushar N, le Thu 17 May 2012 23:22:31 +0000, a écrit :
> > > > I am interested in to see if you have actual test case and more 
> > > > importantly test data that shows that kernel and device indexes are 
> > > > not synchronized any more.
> > > 
> > > Well, it's not with an actual physical device, but with the kvm
> > > emulation.
> > 
> > BTW, it also happens easily when request_irq takes some time to
> > complete: since we enable E1000_TCTL_EN before that, the card can have
> > time to fill the ring before irqs are processed.
> 
> I think there may well be a bug in the implementation in kvm.  The 
> hardware doesn't have this bug.

How does it avoid filling the ring?  What is the purpose of the RXO flag
if it does avoid them?

Samuel

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] e1000: Reset rx ring index on receive overrun
  2012-05-18  0:12         ` Samuel Thibault
@ 2012-05-18  0:26           ` Brandeburg, Jesse
  2012-05-18 13:51             ` e1000 rx emulation bug (Was: [PATCH] e1000: Reset rx ring index on receive overrun) Samuel Thibault
  0 siblings, 1 reply; 8+ messages in thread
From: Brandeburg, Jesse @ 2012-05-18  0:26 UTC (permalink / raw)
  To: Samuel Thibault
  Cc: Jiri Pirko, e1000-devel, Dean Nelson, Allan, Bruce W,
	linux-kernel, David S. Miller, Ronciak, John, netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1195 bytes --]



On Thu, 17 May 2012, Samuel Thibault wrote:
> Brandeburg, Jesse, le Thu 17 May 2012 17:04:04 -0700, a écrit :
> > > BTW, it also happens easily when request_irq takes some time to
> > > complete: since we enable E1000_TCTL_EN before that, the card can have
> > > time to fill the ring before irqs are processed.
> > 
> > I think there may well be a bug in the implementation in kvm.  The 
> > hardware doesn't have this bug.
> 
> How does it avoid filling the ring?  What is the purpose of the RXO flag
> if it does avoid them?

RXO is only used to let the driver know "information" that the rx fifo is 
overflowing.  As it turns out the flag isn't very useful and none of our 
drivers currently use it for anything.

If the hardware fills all the available receive *descriptors* then the 
hardware's RDH (head) register will advance all the way to the RDT (tail) 
value. The tail always points one past the rx descriptors available to 
hardware to use.  RDH==RDT means there are no software provided 
descriptors in the ring available to be used by the hardware.  Our drivers 
typically allow for 1-2 descriptors to be unused in the ring to help avoid 
any confusion.

Hope this helps,
 Jesse

[-- Attachment #2: Type: text/plain, Size: 395 bytes --]

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

[-- Attachment #3: Type: text/plain, Size: 257 bytes --]

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* e1000 rx emulation bug (Was: [PATCH] e1000: Reset rx ring index on receive overrun)
  2012-05-18  0:26           ` Brandeburg, Jesse
@ 2012-05-18 13:51             ` Samuel Thibault
  0 siblings, 0 replies; 8+ messages in thread
From: Samuel Thibault @ 2012-05-18 13:51 UTC (permalink / raw)
  To: Brandeburg, Jesse, qemu-devel, dlaor
  Cc: Jiri Pirko, e1000-devel, Dean Nelson, Allan, Bruce W,
	linux-kernel, David S. Miller, Ronciak, John, netdev

Hello,

There seems to be a bug in qemu in the e1000 emulation, which triggers
an issue with the Linux driver.

What happens in Linux is the following:

- e1000_open 
  - e1000_configure
    - e1000_setup_rctl
      - enables E1000_RCTL_EN
    - e1000_configure_rx
      - sets RDT/RDH to 0
    - alloc_rx_buf
      - pushes buffers to the ring

with bad luck, or on high traffic of small packets, what is observed is
that between setting RDT/RDH and pushing buffers, the ring fills up in
qemu. Here is what happens there on the qemu side:

- e1000_receive
  - e1000_has_rxbufs
    - total_size <= s->rxbuf_size (because it's small)
      return s->mac_reg[RDH] != s->mac_reg[RDT] || !s->check_rxov;

although RDH == RDT == 0, it returns 1, because since RDT/RDH have
just been set to 0, set_rdt has cleared check_rxov. e1000_receive
thus believes there is room, and proceeds with filling the ring.
Unfortunately, since no buffer was pushed, desc.buffer_addr is NULL, and
thus the do loop skips all these nul rx descriptors of the ring, but
marking each of them with E1000_RXD_STAT_DD, and eventually wrapping
around.  From then on, since check_rxov has been set by the do loop,
nothing more is pushed, until the linux driver pushes buffers to the
ring. qemu can then fill some descriptors, and Linux read them, but
since the whole ring was filled with E1000_RXD_STAT_DD, Linux goes on
reading, and thus gets completely desynchronized with the device.

That raises two questions:

- what is the role of the check_rxov flag?  Is hardware really allowed
  to push in some cases, even when RDH==RDT?  Removing it makes things
  work just fine.
- BTW, when skipping a descriptor because of NULL address, does
  E1000_RXD_STAT_DD have to be set?

Samuel

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-05-18 13:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-17 23:01 [PATCH] e1000: Reset rx ring index on receive overrun Samuel Thibault
2012-05-17 23:22 ` Dave, Tushar N
2012-05-17 23:28   ` Samuel Thibault
2012-05-17 23:31     ` Samuel Thibault
2012-05-18  0:04       ` Brandeburg, Jesse
2012-05-18  0:12         ` Samuel Thibault
2012-05-18  0:26           ` Brandeburg, Jesse
2012-05-18 13:51             ` e1000 rx emulation bug (Was: [PATCH] e1000: Reset rx ring index on receive overrun) Samuel Thibault

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).