* Re: soft-lockups in sunvnet
@ 2014-08-08 18:46 David Miller
  2014-08-08 18:55 ` Sowmini Varadhan
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: David Miller @ 2014-08-08 18:46 UTC (permalink / raw)
  To: sparclinux

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 8 Aug 2014 14:39:39 -0400

> The tasklet mechanism for kicking off netif_wake_queue works quite
> well, and is simple enough to do.

So you are able to successfully trigger the tasklet from vnet_event(),
and have that tasklet do the queue wakeups?
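The deferred-wakeup idea can be sketched in plain C as a toy model; this is not kernel code. The struct, the function names and the one-quarter free-space threshold are all hypothetical. model_event() stands in for the minimal work done from vnet_event(), and model_wakeup_tasklet() stands in for the tasklet that eventually calls netif_wake_queue().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one TX ring and its netdev queue (not kernel code). */
struct txq_model {
	bool stopped;        /* stands in for netif_queue_stopped() */
	bool wake_pending;   /* a scheduled tasklet that has not run yet */
	unsigned ring_size;  /* descriptors in the TX ring */
	unsigned ring_used;  /* descriptors still owned by the peer */
};

/* Event path (vnet_event() in the thread): do the minimum, i.e. reclaim
 * the ACKed descriptors and schedule the deferred wakeup. */
static void model_event(struct txq_model *q, unsigned acked)
{
	assert(acked <= q->ring_used);
	q->ring_used -= acked;
	q->wake_pending = true;   /* analogous to tasklet_schedule() */
}

/* Deferred work: wake the queue only once enough descriptors are free.
 * The one-quarter threshold is an assumption made for this sketch. */
static void model_wakeup_tasklet(struct txq_model *q)
{
	unsigned avail = q->ring_size - q->ring_used;

	q->wake_pending = false;
	if (q->stopped && avail >= q->ring_size / 4)
		q->stopped = false;   /* analogous to netif_wake_queue() */
}
```

With a full 16-entry ring, reclaiming two descriptors schedules the tasklet but leaves the queue stopped; reclaiming two more lets the tasklet wake it.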

> But once I removed the heuristic exponential backoff/retry for
> vnet_send_ack(), I'm frequently not able to send any DRING_STOPPED
> messages, and that seems to freeze all access even over the switch-port
> to the VM (even though, AFAICT, netif_stop_queue has not been called).
> 
> If we can't send the LDC ack from vnet_event, we need to reset
> this peer, but vio_conn_reset() is a no-op. Recovering from here
> is going to be quite sticky.

But removing the backoff logic from __vnet_tx_trigger() does work,
right?

I don't think vnet_walk_rx() is really able to handle any kind of real
failures from vnet_send_ack() properly.  If we send one or more
VIO_DRING_ACTIVE ACKs and then can't send the VIO_DRING_STOPPED one
out, the ring will likely be left in an inconsistent state.
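One way to picture the inconsistency is a simplified model of the sender's view of the peer. It assumes, hypothetically and for illustration only, that the sender kicks the peer with a new DRING_DATA start message only when it believes the peer has stopped walking the ring. Under that assumption, a walk that delivered ACTIVE ACKs but never its STOPPED ACK leaves the sender waiting indefinitely. All names here are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified sender-side view of the peer's RX state (not kernel code). */
struct peer_view {
	bool rx_active;   /* set by ACTIVE acks, cleared by the STOPPED ack */
};

/* Assumed rule for this sketch: the sender only issues a new DRING_DATA
 * start message when it believes the peer has stopped walking the ring. */
static bool may_send_start(const struct peer_view *v)
{
	return !v->rx_active;
}

/* Deliver the ACKs produced by one receive walk.  stopped_sent models
 * whether the final VIO_DRING_STOPPED ack made it onto the LDC channel. */
static void deliver_walk_acks(struct peer_view *v, unsigned active_acks,
			      bool stopped_sent)
{
	if (active_acks > 0)
		v->rx_active = true;
	if (stopped_sent)
		v->rx_active = false;
}
```

A walk that ends with a delivered STOPPED ACK leaves may_send_start() true; the same walk with the STOPPED ACK lost leaves it false, and in this model nothing ever clears it.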

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: soft-lockups in sunvnet
  2014-08-08 18:46 soft-lockups in sunvnet David Miller
@ 2014-08-08 18:55 ` Sowmini Varadhan
  2014-08-08 19:59 ` David Miller
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Sowmini Varadhan @ 2014-08-08 18:55 UTC (permalink / raw)
  To: sparclinux

On (08/08/14 11:46), David Miller wrote:
> Date: Fri, 08 Aug 2014 11:46:01 -0700 (PDT)
> From: David Miller <davem@davemloft.net>
> To: sowmini.varadhan@oracle.com
> Cc: david.stevens@oracle.com, karl.volz@oracle.com,
>  sparclinux@vger.kernel.org
> Subject: Re: soft-lockups in sunvnet
> X-Mailer: Mew version 6.5 on Emacs 24.1 / Mule 6.0 (HANACHIRUSATO)
> 
> From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Date: Fri, 8 Aug 2014 14:39:39 -0400
> 
> 
> So you are able to successfully trigger the tasklet from vnet_event(),
> and have that tasklet do the queue wakeups?

yes.

> But removing the backoff logic from __vnet_tx_trigger() does work,
> right?

It "works" to the extent that it recovers. You get a lot more
errors, much more easily, though, so throughput sinks.
I don't know how the heuristics were determined, but they seem to help.
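For reference, a bounded exponential backoff of the kind being discussed might look like the sketch below. It is not the sunvnet code: fake_ldc_write() is a hypothetical stand-in for an LDC write that returns -EAGAIN while the channel is full, and the retry limit and delay cap are caller-chosen parameters rather than values taken from the driver.

```c
#include <assert.h>
#include <errno.h>

static int fake_busy_polls;	/* how many more writes see a full channel */

/* Hypothetical stand-in for an LDC write. */
static int fake_ldc_write(void)
{
	if (fake_busy_polls > 0) {
		fake_busy_polls--;
		return -EAGAIN;
	}
	return 0;
}

/* Retry on -EAGAIN, doubling the wait each time up to max_delay, and give
 * up after max_tries attempts.  *waited accumulates the total delay that a
 * real driver would spend in udelay() between attempts. */
static int send_with_backoff(int (*write_fn)(void), int max_tries,
			     unsigned max_delay, unsigned *waited)
{
	unsigned delay = 1;
	int err = -EAGAIN;

	assert(max_tries > 0);
	*waited = 0;
	for (int i = 0; i < max_tries; i++) {
		err = write_fn();
		if (err != -EAGAIN || i + 1 == max_tries)
			break;
		*waited += delay;	/* a real driver would udelay(delay) here */
		delay <<= 1;
		if (delay > max_delay)
			delay = max_delay;
	}
	return err;
}
```

The cap keeps any single wait short, while max_tries bounds the total time spent. With max_tries set to 1 the function degenerates to a single attempt, which is roughly what removing the retry loop amounts to.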
 
> I don't think vnet_walk_rx() is really able to handle any kind of real
> failures from vnet_send_ack() properly.  If we send one or more
> VIO_DRING_ACTIVE ACKs and then can't send the VIO_DRING_STOPPED one
> out, the ring will likely be left in an inconsistent state.

I just found out last week that you don't actually need to set
VIO_ACK_ENABLE (and thus trigger the ACTIVE acks); evidently the protocol
is such that the STOPPED LDC message is sufficient.

So one patch that I'm working on lining up (after due testing etc.)
is to not set VIO_ACK_ENABLE in vnet_start_xmit. It also helps perf
slightly because it reduces the trips through LDC (and the potential
for filling up the LDC ring).
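A rough count of the LDC traffic this saves, under the simplifying assumption that the receiver sends one VIO_DRING_ACTIVE ack for every ack-flagged descriptor except the last one in a walk, plus one VIO_DRING_STOPPED ack to close the walk. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of LDC messages a receiver sends for one walk over n ready
 * descriptors, under the simplified ACK model described above. */
static unsigned ldc_acks_per_walk(unsigned n, bool ack_enable)
{
	unsigned msgs = 1;	/* the VIO_DRING_STOPPED ack that ends the walk */

	assert(n > 0);
	if (ack_enable)
		msgs += n - 1;	/* VIO_DRING_ACTIVE acks for all but the last */
	return msgs;
}
```

For a walk over 8 descriptors this gives 8 messages with VIO_ACK_ENABLE set and 1 without.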

--Sowmini



* Re: soft-lockups in sunvnet
  2014-08-08 18:46 soft-lockups in sunvnet David Miller
  2014-08-08 18:55 ` Sowmini Varadhan
@ 2014-08-08 19:59 ` David Miller
  2014-08-08 20:47 ` Sowmini Varadhan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2014-08-08 19:59 UTC (permalink / raw)
  To: sparclinux

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 8 Aug 2014 14:55:22 -0400

> On (08/08/14 11:46), David Miller wrote:
>> I don't think vnet_walk_rx() is really able to handle any kind of real
>> failures from vnet_send_ack() properly.  If we send one or more
>> VIO_DRING_ACTIVE ACKs and then can't send the VIO_DRING_STOPPED one
>> out, the ring will likely be left in an inconsistent state.
> 
> I just found out last week that you don't actually need to set
> VIO_ACK_ENABLE (and thus trigger the ACTIVE acks); evidently the protocol
> is such that the STOPPED LDC message is sufficient.
>
> So one patch that I'm working on lining up (after due testing etc.)
> is to not set VIO_ACK_ENABLE in vnet_start_xmit. It also helps perf
> slightly because it reduces the trips through LDC (and the potential
> for filling up the LDC ring).

This only works because we free the packet in the ->ndo_start_xmit()
method.  If we didn't do that, and we freed it up at ACK time, we'd
need to force the ACKs in order to guarantee that we release the SKB
in a finite amount of time.
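A toy model of the point about SKB lifetime. FREE_ON_ACK is a hypothetical policy, not what sunvnet does. The model also assumes the receiver stays busy for the whole burst, so no VIO_DRING_STOPPED ack arrives, and that with ACKs enabled every consumed descriptor is ACKed. Under those assumptions only free-on-ACK without forced ACKs leaves SKBs pinned.

```c
#include <assert.h>
#include <stdbool.h>

enum skb_free_policy {
	FREE_IN_XMIT,	/* copy into the ring buffer, free the skb right away */
	FREE_ON_ACK,	/* hypothetical: keep the skb until the peer ACKs it */
};

/* skbs still held by the sender after a burst of n packets, under the
 * assumptions stated above (busy receiver, no STOPPED ack in the burst). */
static unsigned skbs_held_after_burst(unsigned n, enum skb_free_policy p,
				      bool ack_enable)
{
	unsigned held = 0;

	for (unsigned i = 0; i < n; i++) {
		if (p == FREE_ON_ACK)
			held++;		/* skb parked until an ACK names it */
		if (p == FREE_ON_ACK && ack_enable)
			held--;		/* ACTIVE ack for this descriptor frees it */
	}
	return held;
}
```

skbs_held_after_burst(5, FREE_ON_ACK, false) returns 5, while the other combinations return 0.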



* Re: soft-lockups in sunvnet
  2014-08-08 18:46 soft-lockups in sunvnet David Miller
  2014-08-08 18:55 ` Sowmini Varadhan
  2014-08-08 19:59 ` David Miller
@ 2014-08-08 20:47 ` Sowmini Varadhan
  2014-08-10 19:56 ` Sowmini Varadhan
  2014-08-11 20:58 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Sowmini Varadhan @ 2014-08-08 20:47 UTC (permalink / raw)
  To: sparclinux

On (08/08/14 12:59), David Miller wrote:
> 
> This only works because we free the packet in the ->ndo_start_xmit()
> method.  If we didn't do that, and we freed it up at ACK time, we'd
> need to force the ACKs in order to guarantee that we release the SKB
> in a finite amount of time.
> 

I see. I missed that subtlety. I'll make a note in the comments for
my proposed change, in case this piece of code changes in the future.

--Sowmini



* Re: soft-lockups in sunvnet
  2014-08-08 18:46 soft-lockups in sunvnet David Miller
                   ` (2 preceding siblings ...)
  2014-08-08 20:47 ` Sowmini Varadhan
@ 2014-08-10 19:56 ` Sowmini Varadhan
  2014-08-11 20:58 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Sowmini Varadhan @ 2014-08-10 19:56 UTC (permalink / raw)
  To: sparclinux


To wrap up this thread:

I just sent out a tentative patch with the first round of fixes
to netdev. These fixes take care of the bare minimum of making
sure we don't soft-lockup when the sink does not receive packets.

I can still see at least 2 areas of improvement that I'd like
to address separately, since the changes are non-trivial and
have to be done carefully:

1. finer granularity of flow-control in vnet_start_xmit(): instead
   of doing a netif_stop_queue() when any single peer is congested,
   try to track flow-control for that peer only, and let the others
   continue Tx/Rx.

2. better recovery from vnet_send_ack() failure: I have a somewhat
   odd printk there today, just to let the admin know that help
   is needed. I've tried calling ldc_disconnect() here, but it
   doesn't really reset the peer, though a module-reload fixes it.
   So what's needed is to trigger just the unregister/register of
   the problematic port, and this will need more than a few lines
   of change (I think it has to be triggered by ds?)

I'll take a look at those two over the next few weeks, but didn't
want to hold these changes hostage while that's happening.

--Sowmini



* Re: soft-lockups in sunvnet
  2014-08-08 18:46 soft-lockups in sunvnet David Miller
                   ` (3 preceding siblings ...)
  2014-08-10 19:56 ` Sowmini Varadhan
@ 2014-08-11 20:58 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2014-08-11 20:58 UTC (permalink / raw)
  To: sparclinux

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Sun, 10 Aug 2014 15:56:22 -0400

> 1. finer granularity of flow-control in vnet_start_xmit(): instead
>    of doing a netif_stop_queue() when any single peer is congested,
>    try to track flow-control for that peer only, and let the others
>    continue Tx/Rx.

The big blocker is that you can't send to other peers if you receive
a packet destined for the stopped peer.

It's just not legal to reorder packets in this manner.
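The ordering constraint can be illustrated with a single shared queue feeding several peers. This is a simplified model; the function and its arguments are hypothetical. Because packets may only leave in queue order, the first packet aimed at a stopped peer holds back everything behind it, and skipping over it to serve other peers would reorder the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One shared TX queue feeding several peers.  dst[i] is the peer index of
 * the i-th queued packet.  Packets leave strictly in queue order, so the
 * first packet for a flow-controlled peer blocks everything behind it,
 * including packets for peers that still have room. */
static size_t drain_in_order(const int *dst, size_t n,
			     const bool *peer_stopped)
{
	size_t sent = 0;

	while (sent < n && !peer_stopped[dst[sent]])
		sent++;
	return sent;	/* the caller would stop the queue if sent < n */
}
```

With destinations {0, 2, 1, 0, 2} and peer 2 stopped, only the first packet leaves the queue, even though peers 0 and 1 could accept more.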


end of thread

Thread overview: 6+ messages
2014-08-08 18:46 soft-lockups in sunvnet David Miller
2014-08-08 18:55 ` Sowmini Varadhan
2014-08-08 19:59 ` David Miller
2014-08-08 20:47 ` Sowmini Varadhan
2014-08-10 19:56 ` Sowmini Varadhan
2014-08-11 20:58 ` David Miller
