All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sander Eikelenboom <linux@eikelenboom.it>
To: Wei Liu <wei.liu2@citrix.com>
Cc: annie li <annie.li@oracle.com>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	Zoltan Kiss <zoltan.kiss@citrix.com>,
	xen-devel@lists.xen.org
Subject: Re: Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
Date: Tue, 18 Mar 2014 16:21:27 +0100	[thread overview]
Message-ID: <1744594108.20140318162127@eikelenboom.it> (raw)
In-Reply-To: <20140318120401.GX16807@zion.uk.xensource.com>


Tuesday, March 18, 2014, 1:04:01 PM, you wrote:

> On Mon, Mar 17, 2014 at 11:33:18PM +0100, Sander Eikelenboom wrote:
>> 
>> Monday, March 17, 2014, 11:35:24 AM, you wrote:
>> 
>> > On Wed, Mar 12, 2014 at 05:47:29PM +0100, Sander Eikelenboom wrote:
>> >> 
>> >> Wednesday, March 12, 2014, 4:45:01 PM, you wrote:
>> >> 
>> >> > On Wed, Mar 12, 2014 at 04:20:03PM +0100, Sander Eikelenboom wrote:
>> >> > [...]
>> >> >> 
>> >> >> > Sorry, remove the trailing "S". Actually you only need to look at netback.c.
>> >> >> 
>> >> >> What producer index to compare with .. there are quite some RING_GET_REQUESTS .. and i see:
>> >> >> npo->meta_prod
>> >> >> vif->rx.sring->req_prod
>> >> >> vif->pending_prod
>> >> >> 
>> >> >> to name a few ..
>> >> >> Any particular RING_GET_REQUESTS call and particular producer index you are interested in ?
>> >> >> 
>> >> 
>> >> > There are two macros you can use
>> >> 
>> >> > RING_REQUEST_CONS_OVERFLOW and RING_REQUEST_PROD_OVERFLOW.
>> >> 
>> >> Ah i already produced my own .. diff to netback is attached ..
>> >> 
>> >> Netback:
>> >> Mar 12 17:41:26 serveerstertje kernel: [  464.778614] vif vif-7-0 vif7.0: ?!? npo->meta_prod:37 vif->rx.sring->req_prod:431006 vif->rx.req_cons:431007
>> >> Mar 12 17:41:26 serveerstertje kernel: [  464.786203] vif vif-7-0 vif7.0: ?!? npo->meta_prod:38 vif->rx.sring->req_prod:431006 vif->rx.req_cons:431008
>> 
>> > req_prod < req_cons, so there's an overflow. I'm actually curious how
>> > this could happen.
>> 
>> > Back to the code, before netback enqueues SKB to its internal queue, it
>> > will check if there's enough room in the ring. Before Paul's changeset,
>> > it checks against a static number (the possible maximum slots that can
>> > be consumed by an SKB). Paul's changeset made it check against the
>> > actual slots the incoming SKB consumes. See
>> > interface.c:xenvif_start_xmit.  Another interesting site would be when
>> > the SKB is broken down later on in internal queue. See
>> > netback.c:xenvif_rx_action. The routine to break down an SKB is
>> > xenvif_gop_skb.
>> 
>> > Although they look alright to me, but you might want to instrument them
>> > a bit more to see what triggers that overflow. It's a bit frustrating,
>> > but a bug that cannot be easily reproduced is indeed extremely hard to
>> > fix.
>> 
>> [  554.166914] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req npo->meta_prod:7 vif->rx.sring->req_prod:301750 vif->rx.req_cons:301751
>> [  571.143788] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req npo->meta_prod:37 vif->rx.sring->req_prod:364822 vif->rx.req_cons:364823
>> [  571.150762] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req npo->meta_prod:37 vif->rx.sring->req_prod:364822 vif->rx.req_cons:364823 req->gref:19791875 req->id:22
>> [  571.164691] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here 2  npo->meta_prod:38 vif->rx.sring->req_prod:364822 vif->rx.req_cons:364823 npo->copy_gref:19791875
>> [  571.179474] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 3 npo->meta_prod:38 old_meta_prod:30 vif->rx.sring->req_prod:364822 vif->rx.req_cons:364823 gso_type:1 gso_size:1448 nr_frags:1 req->gref:576 req->id:14
>> [  571.194844] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 2 before req npo->meta_prod:38 vif->rx.sring->req_prod:364822 vif->rx.req_cons:364824 gso_type:0 gso_size:0 nr_frags:0
>> [  571.211215] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 2 after req npo->meta_prod:38 vif->rx.sring->req_prod:364822 vif->rx.req_cons:364824 gso_type:0 gso_size:0 nr_frags:0 req->gref:4325379 req->id:23
>> [  571.228471] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 3 npo->meta_prod:39 old_meta_prod:38 vif->rx.sring->req_prod:364822 vif->rx.req_cons:364824 gso_type:0 gso_size:0 nr_frags:0 req->gref:4325379 req->id:23
>> [  571.246412] vif vif-7-0 vif7.0: Bad status -3 from copy to DOM7.
>> [  571.255531] vif vif-7-0 vif7.0: ?!? xenvif_check_gop status err? status:-1 i:7 nr_meta_slots:8 copy_op->status:-3 npo->copy_cons:38
>> [  571.264804] vif vif-7-0 vif7.0: ?!? xenvif_rx_action status err? status:-1 meta_slots_used:8 flags:7 size:1 ip_summed:1 is_gso:1 skb_shinfo(skb)->gso_type:1634 vif->meta[npo.meta_cons].gso_type:3 nr_frags:1 npo.copy_prod:39, npo.meta_cons:30
>> 
>> It seems in 'get_next_rx_buffer' you can get the situation that cons > prod, but the first time somehow the code after the actual RING_GET_REQUEST isn't ran (why?!? queue stop ?) .. and it doesn't lead to an immediate problem ..

> You've rate limited the output. get_next_rx_buffer won't sleep
> whatsoever. Stopping the queue won't affect a running RX kthread. You
> can think of the start_xmit and kthread run in parallel.

> get_next_rx_buffer doens't check if the ring is overflowed. It's the
> caller's fault. It might be worth checking start_new_rx_buffer

Added even more warns ...

[  297.885969] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  min_slots_needed:4 vif->rx.sring->req_prod:21764 vif->rx.req_cons:21762
[  298.760555] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  min_slots_needed:3 vif->rx.sring->req_prod:22488 vif->rx.req_cons:22486

[  306.376176] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer before req npo->meta_prod:30 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28313
[  306.376556] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  min_slots_needed:1 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28313
[  306.391863] vif vif-7-0 vif7.0: ?!? get_next_rx_buffer after req npo->meta_prod:30 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28314 req->gref:4325377 req->id:153

[  306.407599] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here 2  npo->meta_prod:31 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28314 npo->copy_gref:4325377  npo->copy_off:0  MAX_BUFFER_OFFSET:4096 bytes:640 size:640 i:4
[  306.423913] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here end npo->meta_prod:31 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28314 npo->copy_gref:4325377 npo->copy_off:640  MAX_BUFFER_OFFSET:4096 bytes:640 size:0 i:5


[  306.440941] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 4 npo->meta_prod:31 old_meta_prod:25 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28314 gso_type:1 gso_size:1448 nr_frags:1 req->gref:638 req->id:147
[  306.458334] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 2 before req npo->meta_prod:31 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28315 gso_type:0 gso_size:0 nr_frags:0
[  306.476097] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 2 after req npo->meta_prod:31 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28315 gso_type:0 gso_size:0 nr_frags:0 req->gref:4325377 req->id:154
[  306.494462] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 3 before npo->meta_prod:32 old_meta_prod:31 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28315 gso_type:0 gso_size:0 nr_frags:0 req->gref:4325377 req->id:154 j:0
[  306.513424] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here start   npo->meta_prod:32 vif->rx.sring->req_prod:28313 vif->rx.req_cons:28315 npo->copy_gref:4325377 npo->copy_off:0  MAX_BUFFER_OFFSET:4096 bytes:0 size:66 i:0
[  311.390883] net_ratelimit: 317 callbacks suppressed
[  311.400901] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  min_slots_needed:3 vif->rx.sring->req_prod:32386 vif->rx.req_cons:32322

- So in this case we are in the 3rd iteration of the loop in xenvif_gop_frag_copy ...
- Xenvif_start_xmit stop the queue since it has detected it needs one more slot which is unavailable at that time.
- The current rx thread however doesn't know and doesn't check  (neither in the loop in xenvif_gop_frag_copy nor in get_next_rx_buffer that the ring if full) .. while prod == cons now .. consumes another one ..
- That ring request leads to the bad grant references reported by the hypervisor

(XEN) [2014-03-18 15:02:58.928] grant_table.c:1857:d0v2 Bad grant reference 4325377

So should there be a check added there ... or should the callers  "xenvif_gop_frag_copy" and the caller of that one "xenvif_gop_skb" already have anticipated that what the were about
to do wasn't going to fit anyway ?

And of course .. how made Paul's change trigger this ?


>> The second time it does get to the code after the RING_GET_REQUEST in 'get_next_rx_buffer' and that leads to mayhem ...
>> 
>> So added a netdev_warn to xenvif_start_xmit for the "stop queue" case .. unfortunately it now triggers net_ratelimit at the end:
>> 
>> [  402.909693] vif vif-7-0 vif7.0: ?!? xenvif_start_xmit stopping queue !  min_slots_needed:7 vif->rx.sring->req_prod:189228 vif->rx.req_cons:189222

> I think xenvif_rx_ring_slots_available is doing its job. If req_prod -
> req_cons < needed, the queue is stopeed.

> Wei.

  reply	other threads:[~2014-03-18 15:21 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-18 21:25 Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles Sander Eikelenboom
2014-02-19 11:28 ` Wei Liu
2014-02-19 11:33   ` Sander Eikelenboom
2014-02-20  9:49 ` annie li
2014-02-20 11:18   ` Sander Eikelenboom
2014-02-21  6:32     ` annie li
2014-02-26  9:14       ` Sander Eikelenboom
2014-02-26 15:11         ` Sander Eikelenboom
2014-02-27 14:18           ` Wei Liu
2014-02-27 14:43             ` Sander Eikelenboom
2014-02-27 15:15               ` Wei Liu
2014-02-27 15:26                 ` Sander Eikelenboom
2014-02-27 15:57                   ` Wei Liu
2014-03-07 10:33                     ` Sander Eikelenboom
2014-03-07 11:19                       ` Wei Liu
2014-03-07 11:55                         ` Sander Eikelenboom
2014-03-10 23:00                           ` Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected" Sander Eikelenboom
2014-03-11 10:19                             ` Wei Liu
2014-03-11 10:21                               ` Sander Eikelenboom
2014-03-11 12:31                               ` Sander Eikelenboom
2014-03-11 12:38                                 ` Wei Liu
2014-03-11 12:56                                   ` Wei Liu
2014-03-11 13:00                                   ` Sander Eikelenboom
2014-03-11 15:36                                     ` Wei Liu
2014-03-11 16:28                                       ` Sander Eikelenboom
2014-03-12  1:42                                       ` Sander Eikelenboom
2014-03-12  1:50                                         ` Sander Eikelenboom
2014-03-12 11:35                                         ` Wei Liu
2014-03-12 11:45                                           ` Sander Eikelenboom
2014-03-12 12:04                                             ` Wei Liu
2014-03-12 14:23                                               ` Sander Eikelenboom
2014-03-12 14:48                                                 ` Wei Liu
2014-03-12 14:49                                                   ` Sander Eikelenboom
2014-03-12 14:59                                                     ` Wei Liu
2014-03-12 15:01                                                       ` Sander Eikelenboom
2014-03-12 15:04                                                         ` Wei Liu
2014-03-12 15:20                                                           ` Sander Eikelenboom
2014-03-12 15:45                                                             ` Wei Liu
2014-03-12 16:47                                                               ` Sander Eikelenboom
2014-03-14  9:53                                                                 ` Sander Eikelenboom
2014-03-17 10:35                                                                 ` Wei Liu
2014-03-17 22:33                                                                   ` Sander Eikelenboom
2014-03-18 12:04                                                                     ` Wei Liu
2014-03-18 15:21                                                                       ` Sander Eikelenboom [this message]
2014-03-18 16:04                                                                         ` Wei Liu
2014-03-18 20:14                                                                           ` Sander Eikelenboom
2014-03-18 21:18                                                                             ` Sander Eikelenboom
2014-03-18 23:11                                                                               ` Sander Eikelenboom
2014-03-19 11:35                                                                                 ` Wei Liu
2014-03-19 21:07                                                                                   ` Sander Eikelenboom
2014-03-21 16:49                                                                                     ` Wei Liu
2014-03-21 17:27                                                                                       ` Sander Eikelenboom
2014-03-22 18:28                                                                                         ` Sander Eikelenboom
2014-03-25 14:26                                                                                           ` Sander Eikelenboom
2014-03-25 15:15                                                                                           ` Wei Liu
2014-03-25 15:29                                                                                             ` Sander Eikelenboom
2014-03-26 11:11                                                                                               ` [Xen-devel] " Sander Eikelenboom
2014-03-26 14:44                                                                                                 ` Paul Durrant
2014-03-26 15:22                                                                                                   ` Sander Eikelenboom
2014-03-26 15:22                                                                                                   ` [Xen-devel] " Sander Eikelenboom
2014-03-26 15:50                                                                                                     ` Paul Durrant
2014-03-26 15:50                                                                                                     ` [Xen-devel] " Paul Durrant
2014-03-26 16:06                                                                                                       ` Sander Eikelenboom
2014-03-26 16:06                                                                                                       ` [Xen-devel] " Sander Eikelenboom
2014-03-26 16:25                                                                                                         ` Paul Durrant
2014-03-26 16:25                                                                                                         ` [Xen-devel] " Paul Durrant
2014-03-26 16:53                                                                                                           ` Sander Eikelenboom
2014-03-26 16:53                                                                                                           ` [Xen-devel] " Sander Eikelenboom
2014-03-26 17:16                                                                                                             ` Paul Durrant
2014-03-26 17:33                                                                                                               ` Sander Eikelenboom
2014-03-26 17:46                                                                                                                 ` Paul Durrant
2014-03-26 17:46                                                                                                                 ` [Xen-devel] " Paul Durrant
2014-03-26 18:07                                                                                                                   ` Sander Eikelenboom
2014-03-26 18:15                                                                                                                     ` Paul Durrant
2014-03-26 18:15                                                                                                                     ` [Xen-devel] " Paul Durrant
2014-03-26 18:42                                                                                                                       ` Paul Durrant
2014-03-26 18:42                                                                                                                       ` Paul Durrant
2014-03-26 20:17                                                                                                                       ` [Xen-devel] " Sander Eikelenboom
2014-03-27  9:54                                                                                                                         ` Paul Durrant
2014-03-27 10:05                                                                                                                           ` Sander Eikelenboom
2014-03-27 10:05                                                                                                                           ` [Xen-devel] " Sander Eikelenboom
2014-03-27  9:54                                                                                                                         ` Paul Durrant
2014-03-26 20:17                                                                                                                       ` Sander Eikelenboom
2014-03-26 18:07                                                                                                                   ` Sander Eikelenboom
2014-03-26 17:48                                                                                                                 ` [Xen-devel] " Paul Durrant
2014-03-26 19:57                                                                                                                   ` Sander Eikelenboom
2014-03-26 19:57                                                                                                                   ` [Xen-devel] " Sander Eikelenboom
2014-03-27  9:47                                                                                                                     ` Paul Durrant
2014-03-27  9:47                                                                                                                     ` [Xen-devel] " Paul Durrant
2014-03-27 10:00                                                                                                                       ` Sander Eikelenboom
2014-03-27 10:00                                                                                                                       ` [Xen-devel] " Sander Eikelenboom
2014-03-26 17:48                                                                                                                 ` Paul Durrant
2014-03-26 17:33                                                                                                               ` Sander Eikelenboom
2014-03-26 17:16                                                                                                             ` Paul Durrant
2014-03-26 14:44                                                                                                 ` Paul Durrant
2014-03-26 11:11                                                                                               ` Sander Eikelenboom
2014-03-26 15:10                                                                                               ` Paul Durrant
2014-03-12 15:03                                                       ` Sander Eikelenboom
2014-02-27 15:36             ` Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles Roger Pau Monné
2014-02-27 15:45               ` Wei Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1744594108.20140318162127@eikelenboom.it \
    --to=linux@eikelenboom.it \
    --cc=Paul.Durrant@citrix.com \
    --cc=annie.li@oracle.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    --cc=zoltan.kiss@citrix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.