* Re: soft-lockups in sunvnet
@ 2014-08-08 18:46 David Miller
2014-08-08 18:55 ` Sowmini Varadhan
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: David Miller @ 2014-08-08 18:46 UTC (permalink / raw)
To: sparclinux
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 8 Aug 2014 14:39:39 -0400
> The tasklet mechanism for kicking off netif_wake_queue works quite
> well, and is simple enough to do.
So you are able to successfully trigger the tasklet from vnet_event(),
and have that tasklet do the queue wakeups?
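As a rough illustration of the deferral being asked about (the event handler only requests the wakeup, and a tasklet-like step performs it later), here is a small userspace C model. It is a sketch under assumptions, not driver code: struct model_dev and the model_* functions are hypothetical names, and the booleans merely stand in for the roles that tasklet_schedule() and netif_wake_queue() play in the kernel.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct model_dev {
    bool queue_stopped;  /* stands in for the "tx queue is stopped" state */
    bool wake_pending;   /* stands in for "a tasklet has been scheduled" */
};

/* Event-handler side: it only requests a wakeup, it never performs one. */
void model_event_tx_space(struct model_dev *dev)
{
    if (dev->queue_stopped)
        dev->wake_pending = true;   /* role of tasklet_schedule() */
}

/* Deferred side: runs later, outside the event handler.
 * Returns true if this call actually restarted the queue. */
bool model_run_deferred(struct model_dev *dev)
{
    if (!dev->wake_pending)
        return false;
    dev->wake_pending = false;
    if (!dev->queue_stopped)
        return false;
    dev->queue_stopped = false;     /* role of netif_wake_queue() */
    return true;
}
```

The point of the split is that the wakeup runs in a separate deferred step, not inline in the event handler.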
> But once I removed the heuristic exponential backoff/retry for
> vnet_send_ack(), I'm frequently not able to send any DRING_STOPPED
> messages, and that seems to freeze all access even over the switch-port
> to the VM (even though, afaict, netif_stop_queue has not been called).
>
> If we can't send the LDC ack from vnet_event, we need to reset
> this peer, but vio_conn_reset() is a no-op. Recovering from here
> is going to be quite sticky.
But removing the backoff logic from __vnet_tx_trigger() does work,
right?
I don't think vnet_walk_rx() is really able to handle any kind of real
failures from vnet_send_ack() properly. If we send one or more
VIO_DRING_ACTIVE ACKs and then can't send the VIO_DRING_STOPPED one
out, the ring will likely be left in an inconsistent state.
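The failure mode described above can be modeled in a few lines of userspace C. This is only a sketch under assumptions: the model_* names are hypothetical, the channel is reduced to a counter of sends that still fit, and "consistent" is simplified to mean that the peer either saw the closing STOPPED ack or saw no ack at all. The real vnet_walk_rx() only sends ACTIVE acks for descriptors that requested one; the model acks every descriptor to keep the example short.

```c
#include <stdbool.h>

/* Hypothetical model, not the sunvnet code. */
enum model_ack { MODEL_ACK_ACTIVE, MODEL_ACK_STOPPED };

struct model_link {
    int sends_left;      /* sends that still fit before the channel is full */
    int active_acked;    /* ACTIVE acks the peer has seen */
    bool stopped_acked;  /* whether the peer has seen the final STOPPED ack */
};

/* Returns 0 on success, -1 when the modeled channel has no room. */
int model_send_ack(struct model_link *l, enum model_ack kind)
{
    if (l->sends_left <= 0)
        return -1;
    l->sends_left--;
    if (kind == MODEL_ACK_ACTIVE)
        l->active_acked++;
    else
        l->stopped_acked = true;
    return 0;
}

/* Walk n received descriptors: one ACTIVE ack each, then one STOPPED ack.
 * Returns 0 on success, -1 as soon as any ack cannot be sent. */
int model_walk_rx(struct model_link *l, int n)
{
    for (int i = 0; i < n; i++)
        if (model_send_ack(l, MODEL_ACK_ACTIVE) < 0)
            return -1;
    return model_send_ack(l, MODEL_ACK_STOPPED);
}

/* Simplified consistency rule (an assumption of this sketch). */
bool model_peer_consistent(const struct model_link *l)
{
    return l->stopped_acked || l->active_acked == 0;
}
```

With room for only three sends and three descriptors to walk, the ACTIVE acks go out but the STOPPED ack does not, which is the half-finished state the paragraph above warns about.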
* Re: soft-lockups in sunvnet
From: Sowmini Varadhan @ 2014-08-08 18:55 UTC (permalink / raw)
To: sparclinux
On (08/08/14 11:46), David Miller wrote:
> Date: Fri, 08 Aug 2014 11:46:01 -0700 (PDT)
> From: David Miller <davem@davemloft.net>
> To: sowmini.varadhan@oracle.com
> Cc: david.stevens@oracle.com, karl.volz@oracle.com,
> sparclinux@vger.kernel.org
> Subject: Re: soft-lockups in sunvnet
> X-Mailer: Mew version 6.5 on Emacs 24.1 / Mule 6.0 (HANACHIRUSATO)
>
> From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Date: Fri, 8 Aug 2014 14:39:39 -0400
>
>
> So you are able to successfully trigger the tasklet from vnet_event(),
> and have that tasklet do the queue wakeups?
yes.
> But removing the backoff logic from __vnet_tx_trigger() does work,
> right?
It "works" to the extent that it recovers. You get a lot more
errors, much more easily, though, so throughput sinks.
I don't know how the heuristics were determined, but they seem to help...
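For reference, the general shape of a bounded exponential backoff around a send that can fail transiently looks roughly like the userspace sketch below. It is not a copy of __vnet_tx_trigger(): MODEL_EAGAIN, the function names, the delay cap and the retry limit are all assumptions made for illustration, and the delay is only accumulated where kernel code would typically busy-wait (for example with udelay()).

```c
/* Hypothetical sketch of bounded exponential backoff; not driver code. */
#define MODEL_EAGAIN (-11)

typedef int (*model_send_fn)(void *ctx);

/* Retry send() while it reports MODEL_EAGAIN, doubling the delay each
 * time up to max_delay. Returns the last send() result; *total_delay
 * accumulates the time that would have been spent waiting. */
int model_send_with_backoff(model_send_fn send, void *ctx,
                            int max_delay, int max_tries, int *total_delay)
{
    int delay = 1;
    int err = MODEL_EAGAIN;

    *total_delay = 0;
    for (int i = 0; i < max_tries; i++) {
        err = send(ctx);
        if (err != MODEL_EAGAIN)
            break;
        *total_delay += delay;  /* kernel code would wait here instead */
        delay *= 2;
        if (delay > max_delay)
            delay = max_delay;
    }
    return err;
}

/* Example sender: fails with MODEL_EAGAIN until *ctx counts down to 0,
 * then reports success with a positive value. */
int model_flaky_send(void *ctx)
{
    int *fails_left = ctx;

    if (*fails_left > 0) {
        (*fails_left)--;
        return MODEL_EAGAIN;
    }
    return 1;
}
```

Removing the backoff corresponds to calling send() once and reporting the error immediately, which matches the observation that errors then show up much more easily.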
> I don't think vnet_walk_rx() is really able to handle any kind of real
> failures from vnet_send_ack() properly. If we send one or more
> VIO_DRING_ACTIVE ACKs and then can't send the VIO_DRING_STOPPED one
> out, the ring will likely be left in an inconsistent state.
I just found out last week that you don't actually need to set
VIO_ACK_ENABLE (and thus trigger the ACTIVE acks); evidently the protocol
is such that the STOPPED ldc message is sufficient.
So one patch that I'm working on lining up (after due testing etc.)
is to not set VIO_ACK_ENABLE in vnet_start_xmit. It also helps perf
slightly because it reduces the trips through ldc (and the potential
for filling up the ldc ring).
--Sowmini
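A back-of-the-envelope model of why dropping VIO_ACK_ENABLE reduces LDC traffic: count the acks the receiver sends back for one burst of descriptors. This is a sketch under assumptions, with a hypothetical function name, one ACTIVE ack per descriptor when acks are requested, and a single STOPPED ack per burst.

```c
#include <stdbool.h>

/* Hypothetical count of messages the receiver sends back for one burst
 * of n descriptors. Not derived from the driver source. */
int model_acks_for_burst(int n, bool ack_requested)
{
    int msgs = 0;

    if (n <= 0)
        return 0;
    if (ack_requested)
        msgs += n;  /* one ACTIVE ack per descriptor that asked for it */
    msgs += 1;      /* one STOPPED ack when the receiver runs out of work */
    return msgs;
}
```

Under these assumptions a burst of 8 descriptors costs 9 return messages with acks requested and 1 without, which is where the reduced pressure on the ldc ring would come from.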
* Re: soft-lockups in sunvnet
From: David Miller @ 2014-08-08 19:59 UTC (permalink / raw)
To: sparclinux
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 8 Aug 2014 14:55:22 -0400
> On (08/08/14 11:46), David Miller wrote:
>> I don't think vnet_walk_rx() is really able to handle any kind of real
>> failures from vnet_send_ack() properly. If we send one or more
>> VIO_DRING_ACTIVE ACKs and then can't send the VIO_DRING_STOPPED one
>> out, the ring will likely be left in an inconsistent state.
>
> I just found out last week that you don't actually need to set
> VIO_ACK_ENABLE (and thus trigger the ACTIVE acks); evidently the protocol
> is such that the STOPPED ldc message is sufficient.
>
> So one patch that I'm working on lining up (after due testing etc.)
> is to not set VIO_ACK_ENABLE in vnet_start_xmit. It also helps perf
> slightly because it reduces the trips through ldc (and the potential
> for filling up the ldc ring).
This only works because we free the packet in the ->ndo_start_xmit()
method. If we didn't do that, and we freed it up at ACK time, we'd
need to force the ACKs in order to guarantee that we release the SKB
in a finite amount of time.
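The dependency between buffer lifetime and ACKs can be shown with a tiny userspace model. It is a sketch under assumptions: struct model_tx and the model_* functions are hypothetical, and a "held" counter stands in for SKBs the driver still owns.

```c
#include <stdbool.h>

/* Hypothetical model of two buffer-release policies; not driver code. */
struct model_tx {
    bool free_at_xmit;  /* true: buffer is released inside the xmit path */
    int held;           /* buffers the driver still owns */
};

/* Transmit one packet. */
void model_xmit(struct model_tx *tx)
{
    if (!tx->free_at_xmit)
        tx->held++;     /* stays owned until an ack arrives */
    /* free_at_xmit: the data was copied out, nothing remains held */
}

/* Peer acknowledged n packets. */
void model_ack(struct model_tx *tx, int n)
{
    if (tx->free_at_xmit)
        return;         /* nothing depends on acks under this policy */
    tx->held -= n;
    if (tx->held < 0)
        tx->held = 0;
}
```

With free_at_xmit set, held stays at zero whether or not any ack ever arrives. With it cleared, held only drops when model_ack() runs, so an ack would have to be guaranteed for every buffer.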
* Re: soft-lockups in sunvnet
From: Sowmini Varadhan @ 2014-08-08 20:47 UTC (permalink / raw)
To: sparclinux
On (08/08/14 12:59), David Miller wrote:
>
> This only works because we free the packet in the ->ndo_start_xmit()
> method. If we didn't do that, and we freed it up at ACK time, we'd
> need to force the ACKs in order to guarantee that we release the SKB
> in a finite amount of time.
>
I see. I missed that subtlety. I'll make a note in the comments for
my proposed change, in case this piece of code changes in the future.
--Sowmini
* Re: soft-lockups in sunvnet
From: Sowmini Varadhan @ 2014-08-10 19:56 UTC (permalink / raw)
To: sparclinux
To wrap up this thread..
I just sent out a tentative patch with the first round of fixes
to netdev. These fixes take care of the bare minimum of making
sure we don't soft-lockup when the sink does not receive packets.
I can still see at least 2 areas of improvement that I'd like
to address separately, since the changes are non-trivial and
have to be done carefully:
1. finer granularity of flow-control in vnet_start_xmit(): instead
of doing a netif_stop_queue() when any single peer is congested,
try to track flow-control for that peer only, and let the others
continue Tx/Rx
2. better recovery from vnet_send_ack() failure: I have a somewhat
odd printk there today, just to let the admin know that help
is needed. I've tried calling ldc_disconnect() here, but it
doesn't really reset the peer, though a module-reload fixes it.
So what's needed is to trigger just the unregister/register of
the problematic port, and this will need more than a few lines
of change (I think it has to be triggered by ds?)
I'll take a look at those two over the next few weeks, but didn't
want to hold these changes hostage while that's happening.
--Sowmini
* Re: soft-lockups in sunvnet
From: David Miller @ 2014-08-11 20:58 UTC (permalink / raw)
To: sparclinux
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Sun, 10 Aug 2014 15:56:22 -0400
> 1. finer granularity of flow-control in vnet_start_xmit(): instead
> of doing a netif_stop_queue() when any single peer is congested,
> try to track flow-control for that peer only, and let the others
> continue Tx/Rx
The big blocker is that you can't send to other peers if you receive
a packet destined for the stopped peer.
It's just not legal to reorder packets in this manner.
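The ordering constraint can be made concrete with a small userspace model of a single FIFO transmit queue feeding several peers. This is a sketch under assumptions: the model_* names and MODEL_MAX_PEERS are hypothetical, and every destination index is assumed to be below MODEL_MAX_PEERS. The drain function sends strictly in arrival order and stops at the first packet whose peer is stopped, even if later packets are for peers that could accept them.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PEERS 4

/* Hypothetical model of one in-order transmit queue; not driver code. */
struct model_txq {
    const int *dst;   /* destination peer of each packet, arrival order */
    size_t len;       /* number of queued packets */
    size_t head;      /* index of the next packet to send */
    bool peer_stopped[MODEL_MAX_PEERS];
};

/* Send packets strictly in order. Returns how many were sent before
 * reaching a packet destined for a stopped peer (or the end of queue). */
size_t model_drain_in_order(struct model_txq *q)
{
    size_t sent = 0;

    while (q->head < q->len && !q->peer_stopped[q->dst[q->head]]) {
        q->head++;
        sent++;
    }
    return sent;
}
```

With destinations {0, 1, 0, 2} and peer 1 stopped, only the first packet goes out; the packets for peers 0 and 2 behind it must wait, because skipping ahead of the held packet would reorder the stream.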