* [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress.
@ 2014-08-01 13:50 Sowmini Varadhan
From: Sowmini Varadhan @ 2014-08-01 13:50 UTC (permalink / raw)
  To: sparclinux



The LDC handshake could have been asynchronously triggered
after ldc_bind() enables the ldc_rx() receive interrupt-handler
(and thus intercepts incoming control packets)
and before vio_port_up() calls ldc_connect(). If that is the case,
ldc_connect() should return 0 and let the state-machine
progress.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Karl Volz <karl.volz@oracle.com>
---
Resending to the correct list this time.

 arch/sparc/kernel/ldc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/kernel/ldc.c b/arch/sparc/kernel/ldc.c
index e01d75d..66dacd5 100644
--- a/arch/sparc/kernel/ldc.c
+++ b/arch/sparc/kernel/ldc.c
@@ -1336,7 +1336,7 @@ int ldc_connect(struct ldc_channel *lp)
 	if (!(lp->flags & LDC_FLAG_ALLOCED_QUEUES) ||
 	    !(lp->flags & LDC_FLAG_REGISTERED_QUEUES) ||
 	    lp->hs_state != LDC_HS_OPEN)
-		err = -EINVAL;
+		err = ((lp->hs_state > LDC_HS_OPEN) ? 0 : -EINVAL);
 	else
 		err = start_handshake(lp);
 
-- 
1.8.4.2



* Re: [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress.
From: David Miller @ 2014-08-05  3:22 UTC (permalink / raw)
  To: sparclinux

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 1 Aug 2014 09:50:40 -0400

> The LDC handshake could have been asynchronously triggered
> after ldc_bind() enables the ldc_rx() receive interrupt-handler
> (and thus intercepts incoming control packets)
> and before vio_port_up() calls ldc_connect(). If that is the case,
> ldc_connect() should return 0 and let the state-machine
> progress.
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Acked-by: Karl Volz <karl.volz@oracle.com>
> ---
> Resending to the correct list this time.

Applied and queued up for -stable, thanks.

Please, in the future, format your Subject line something like
the following:

[PATCH] $subsystem: $description

Everything after "[xxx] " goes into the commit log header line, and
putting a subsystem prefix allows people scanning the shortlog to
get a good idea what area a change touches.

In this case I added "sparc64: " for the subsystem prefix when
committing your change.

BTW, you'll probably come to notice that not a lot of work has gone
into handling hard disconnect failures properly in LDC and the drivers
that use it.  It's the one area where the LDC framework is very weak.
Unfortunately a lot of cooperation is required by the drivers
themselves, because what to do when an LDC connection goes down is
different in different drivers (networking drivers can drop packets,
but block drivers have to redo the I/O when the LDC link comes back
up, for example).




* Re: [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress.
From: Sowmini Varadhan @ 2014-08-05 13:52 UTC (permalink / raw)
  To: sparclinux

On (08/04/14 20:22), David Miller wrote:
> 
> In this case I added "sparc64: " for the subsystem prefix when
> committing your change.

Thanks, I'll keep that in mind for the future; I wasn't sure of
the taxonomy for ldc.

> BTW, you'll probably come to notice that not a lot of work has gone
> into handling hard disconnect failures properly in LDC and the drivers
> that use it.  It's the one area where the LDC framework is very weak.
> Unfortunately a lot of cooperation is required by the drivers
> themselves, because what to do when an LDC connection goes down is
> different in different drivers (networking drivers can drop packets,
> but block drivers have to redo the I/O when the LDC link comes back
> up, for example).

Noted - I've not run into this personally, but we'll add it to the
list.

FWIW, what I'm trying to sort out at the moment is the potential for
triggering a soft lockup on a peer (i.e., a DoS vector).
If a node sends a burst of data and then never drains its incoming
ldc channel, the targeted peer will eventually encounter a
tx_has_no_space_for()/EWOULDBLOCK from __vnet_tx_trigger/vnet_send_ack
(and also __vdc_tx_trigger?) - all of these are infinite loops,
and vnet_send_ack is particularly vicious because it's actually a
paradoxical blocking Tx in the Rx path (due to the vnet protocol design,
I suppose).

Carefully asserting netif_stop_queue() for __vnet_tx_trigger() might 
possibly help that case (though it flow-controls all ldc_channels, 
instead of just the blocked one), but recovering gracefully
for the other cases needs thought.

--Sowmini


* Re: [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress.
From: David Miller @ 2014-08-07  6:00 UTC (permalink / raw)
  To: sparclinux

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Tue, 5 Aug 2014 09:52:38 -0400

> FWIW, what I'm trying to sort out at the moment is the potential for 
> triggering a soft lockup on a peer (i.e., a DoS vector).
> If a node sends a burst of data and then never drains its incoming 
> ldc channel, the targeted peer will eventually encounter a
> tx_has_no_space_for()/EWOULDBLOCK from __vnet_tx_trigger/vnet_send_ack 
> (and also __vdc_tx_trigger?)  - all of these are infinite loops, 
> and vnet_send_ack is particularly vicious because it's actually a 
> paradoxical blocking Tx in the Rx path (due to the vnet protocol design, 
> I suppose).  
> 
> Carefully asserting netif_stop_queue() for __vnet_tx_trigger() might 
> possibly help that case (though it flow-controls all ldc_channels, 
> instead of just the blocked one), but recovering gracefully
> for the other cases needs thought.

Other virtualization drivers tend to have this issue.  In fact this
was just being discussed over the past week on the netdev list
wrt. the xen-netback driver.

The scheme used there basically is to mark the carrier off when
the peer stops sinking packets we are TX'ing to it, and queue
up a timer.

If the timer fires, we free up all of the packets in the transmit
queue so they don't linger forever and take up resources.

Usually there is some "event" that can let us know that data is
being sinked again, and we can use that to turn the carrier on
and start transmitting again.  Otherwise we'll have to poll
for such things in the timer.


* Re: [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress.
From: Sowmini Varadhan @ 2014-08-07 20:17 UTC (permalink / raw)
  To: sparclinux


(Maybe this discussion belongs in netdev?)

On (08/06/14 23:00), David Miller wrote:
> The scheme used there basically is to mark the carrier off when
> the peer stops sinking packets we are TX'ing to it, and queue
> up a timer.
> 
> If the timer fires, we free up all of the packets in the transmit
> queue so they don't linger forever and take up resources.
> 
> Usually there is some "event" that can let us know that data is
> being sunk again, and we can use that to turn the carrier on
> and start transmitting again.  Otherwise we'll have to poll
> for such things in the timer.

So in the case of sunvnet, there are 2 writes that can block:
either !vnet_tx_dring_avail() (i.e., no descriptor rings), or
!tx_has_space_for() (i.e., cannot send the ldc trigger).

vnet_start_xmit() already seems to have most of the smarts to 
handle both failures, the one missing piece is that __vnet_tx_trigger()
should not loop infinitely on the ldc_write, but give up with 
EWOULDBLOCK after some finite number of tries; at that point
vnet_start_xmit() recovers correctly (afaict).

But then there's another problem at the tail of vnet_event()-
if we hit the maybe_tx_wakeup() condition, we try to take the
netif_tx_lock() in the recv-interrupt-context and can deadlock
with dev_watchdog(). 

I could move the contents of maybe_tx_wakeup() into 
vnet_tx_timeout(), and that avoids deadlocks. However I think that now 
matches the description of polling-to-recover-from-blocked-tx above. 
And one side-effect of moving from the synchronous maybe_tx_wakeup()
to the reliance on watchdog-timer to unblock things is that 
things like iperf throughput can fluctuate wildly. But given that
this actually *is* a performance weakness, perhaps it's not worth
over-optimizing this case?

Another data-point in favor of not over-optimizing the wrong thing:
in separate experiments, when I try to move the data-handling code in
vnet_event off to a bh-handler, performance improves noticeably; I
rarely encounter the netif_queue_stopped(), so relying on 
vnet_tx_timeout() for recovery is not an issue with that prototype.

Unless there's a simple way to kick off netif_wake_queue()
from vnet_event(), it seems better to just make vnet_tx_timeout()
do this job.

--Sowmini


* Re: [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress.
From: David Miller @ 2014-08-08  5:26 UTC (permalink / raw)
  To: sparclinux

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 7 Aug 2014 16:17:08 -0400

> vnet_start_xmit() already seems to have most of the smarts to 
> handle both failures, the one missing piece is that __vnet_tx_trigger()
> should not loop infinitely on the ldc_write, but give up with 
> EWOULDBLOCK after some finite number of tries; at that point
> vnet_start_xmit() recovers correctly (afaict).

Actually, if we look at the vio_ldc_send() code path, it does its
own delay loop waiting for ldc_write() to stop signalling -EAGAIN.
It loops up to 1000 times, doing a udelay(1) each iteration.

So firstly the backoff based delay loop in __vnet_tx_trigger() should
probably be removed entirely.

Now, besides taking the error path in vnet_start_xmit() and dropping
the packet, we need to think about what we should do subsequently when
this condition triggers.

We probably want to take the carrier down via netif_carrier_off(), and
then when we see the head pointer advancing (I guess via
vnet_event()'s logic), the networking driver can netif_carrier_on() in
response.

> But then there's another problem at the tail of vnet_event()-
> if we hit the maybe_tx_wakeup() condition, we try to take the
> netif_tx_lock() in the recv-interrupt-context and can deadlock
> with dev_watchdog(). 
> 
> I could move the contents of maybe_tx_wakeup() into 
> vnet_tx_timeout(), and that avoids deadlocks. However I think that now 
> matches the description of polling-to-recover-from-blocked-tx above. 

Indeed, that's a bad deadlock.

You can just use a tasklet to make it execute in a software interrupt
"right now".

> Another data-point in favor of not over-optimizing the wrong thing:
> in separate experiments, when I try to move the data-handling code in
> vnet_event off to a bh-handler, performance improves noticeably; I
> rarely encounter the netif_queue_stopped(), so relying on 
> vnet_tx_timeout() for recovery is not an issue with that prototype.
> 
> Unless there's a simple way to kick off netif_wake_queue()
> from vnet_event(), it seems better to just make vnet_tx_timeout()
> do this job.

I think just the time it takes to unwind the HW interrupt handler and
enter the software interrupt code path is enough time to let the other
side of the LDC link make a bunch of progress.

And yes, hitting stop queue conditions less often will improve performance
if only because it means we have less overhead going in and out of stopped
queue state.

Anyways, see my tasklet suggestion above.

Also it's a real pain that one port backing up can head-of-line block
the entire interface.


* Re: [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress.
From: David L Stevens @ 2014-08-08 13:55 UTC (permalink / raw)
  To: sparclinux



On 08/08/2014 01:26 AM, David Miller wrote:

> Now, besides taking the error path in vnet_start_xmit() and dropping
> the packet, we need to think about what we should do subsequently when
> this condition triggers.
> 
> We probably want to take the carrier down via netif_carrier_off(), and
> then when we see the head pointer advancing (I guess via
> vnet_event()'s logic), the networking driver can netif_carrier_on() in
> response.

Aren't we talking about one ldc port (out of many) for the device here?
I'm not sure we want to drop carrier if only one port isn't making
progress, unless maybe that's the only switch port.

						+-DLS



* Re: [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress.
From: David Miller @ 2014-08-08 17:33 UTC (permalink / raw)
  To: sparclinux

From: David L Stevens <david.stevens@oracle.com>
Date: Fri, 08 Aug 2014 09:55:55 -0400

> On 08/08/2014 01:26 AM, David Miller wrote:
> 
>> Now, besides taking the error path in vnet_start_xmit() and dropping
>> the packet, we need to think about what we should do subsequently when
>> this condition triggers.
>> 
>> We probably want to take the carrier down via netif_carrier_off(), and
>> then when we see the head pointer advancing (I guess via
>> vnet_event()'s logic), the networking driver can netif_carrier_on() in
>> response.
> 
> Aren't we talking about one ldc port (out of many) for the device here?
> I'm not sure we want to drop carrier if only one port isn't making
> progress, unless maybe that's the only switch port.

Right now we already are stopping the entire TX queue of the device if
just one port backs up.

Effectively the link is down at that point, and as long as it persists
no traffic is going to pass over the device.

The problem with not taking the carrier off is that the qdisc layer is
going to spam messages into the log each time the TX watchdog timer
goes off.

