All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
@ 2014-09-02 16:20 Sowmini Varadhan
  2014-09-02 16:27 ` David L Stevens
  0 siblings, 1 reply; 10+ messages in thread
From: Sowmini Varadhan @ 2014-09-02 16:20 UTC (permalink / raw)
  To: davem, raghuram.kothakota, sowmini.varadhan; +Cc: netdev


Upon encountering the first !VIO_DESC_READY in vnet_walk_rx(),
it is frequently worthwhile to re-check the descriptor status
after a short microsecond delay, as a bursty sender could
be actively populating the descriptors, and the short udelay()
is far less expensive than rolling back to ldc_rx() and having
to wake up and read data on another LDC message.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com>
---
changes since v1: moved label `again', variable `retries' to this patchset.

 drivers/net/ethernet/sun/sunvnet.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 8f90b57..7b1f320 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -390,11 +390,28 @@ static int vnet_walk_rx(struct vnet_port *port, struct vio_dring_state *dr,
 	viodbg(DATA, "vnet_walk_rx start[%08x] end[%08x]\n", start, end);
 
 	while (start != end) {
-		int ack = 0, err = vnet_walk_rx_one(port, dr, start, &ack);
+		int retries;
+		int ack = 0, err;
+
+		retries = 0;
+again:
+		err = vnet_walk_rx_one(port, dr, start, &ack);
 		if (err == -ECONNRESET)
 			return err;
-		if (err != 0)
-			break;
+		if (err != 0)  {
+			/* The descriptor was not READY. Retry with a
+			 * small delay, in case we have a bursty sender
+			 * that is actively populating the descriptors, to
+			 * reduce the overhead of stopping and re-entering
+			 * which would involve expensive LDC messages.
+			 */
+			if (retries++ < 3) {
+				udelay(4);
+				goto again;
+			} else {
+				break;
+			}
+		}
 		if (ack_start == -1)
 			ack_start = start;
 		ack_end = start;
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
  2014-09-02 16:20 [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay() Sowmini Varadhan
@ 2014-09-02 16:27 ` David L Stevens
  2014-09-02 16:32   ` Sowmini Varadhan
  2014-09-05  5:36   ` David Miller
  0 siblings, 2 replies; 10+ messages in thread
From: David L Stevens @ 2014-09-02 16:27 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev

Sowmini,
	One of the things I found while looking at the code is that the
write memory barrier is in the wrong place. I have a patch to fix it, but
it means the last descriptor will NOT be marked as ready until the cache
is flushed, which may be delayed.
	That fix may make this one moot. Maybe not, but certainly there is
an unnecessary delay in notifying the peer because of that bug. The fix is
simply to move the "wmb()" after, instead of before, setting VIO_DESC_READY.
I mentioned it in our stand-up last week, which you couldn't attend, because
I pointed it out for the VDC driver, which also has the same problem.

	You may want to retry with that change to see if the delay still
helps.
							+-DLS

On 09/02/2014 12:20 PM, Sowmini Varadhan wrote:
> 
> Upon encountering the first !VIO_DESC_READY in vnet_walk_rx(),
> it is frequently worthwhile to re-check the descriptor status
> after a short microsecond delay, as a bursty sender could
> be actively populating the descriptors, and the short udelay()
> is far less expensive than rolling back to ldc_rx() and having
> to wake up and read data on another LDC message.
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com>
> ---
> changes since v1: moved label `again', variable `retries' to this patchset.
> 
>  drivers/net/ethernet/sun/sunvnet.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
> index 8f90b57..7b1f320 100644
> --- a/drivers/net/ethernet/sun/sunvnet.c
> +++ b/drivers/net/ethernet/sun/sunvnet.c
> @@ -390,11 +390,28 @@ static int vnet_walk_rx(struct vnet_port *port, struct vio_dring_state *dr,
>  	viodbg(DATA, "vnet_walk_rx start[%08x] end[%08x]\n", start, end);
>  
>  	while (start != end) {
> -		int ack = 0, err = vnet_walk_rx_one(port, dr, start, &ack);
> +		int retries;
> +		int ack = 0, err;
> +
> +		retries = 0;
> +again:
> +		err = vnet_walk_rx_one(port, dr, start, &ack);
>  		if (err == -ECONNRESET)
>  			return err;
> -		if (err != 0)
> -			break;
> +		if (err != 0)  {
> +			/* The descriptor was not READY. Retry with a
> +			 * small delay, in case we have a bursty sender
> +			 * that is actively populating the descriptors, to
> +			 * reduce the overhead of stopping and re-entering
> +			 * which would involve expensive LDC messages.
> +			 */
> +			if (retries++ < 3) {
> +				udelay(4);
> +				goto again;
> +			} else {
> +				break;
> +			}
> +		}
>  		if (ack_start == -1)
>  			ack_start = start;
>  		ack_end = start;
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
  2014-09-02 16:27 ` David L Stevens
@ 2014-09-02 16:32   ` Sowmini Varadhan
  2014-09-05  5:36   ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: Sowmini Varadhan @ 2014-09-02 16:32 UTC (permalink / raw)
  To: David L Stevens; +Cc: netdev

On (09/02/14 12:27), David L Stevens wrote:
> Sowmini,
> 	One of the things I found while looking at the code is that the
> write memory barrier is in the wrong place. I have a patch to fix it, but
> it means the last descriptor will NOT be marked as ready until the cache
> is flushed, which may be delayed.

I see.

I can certainly try that (could you please share that patch offline?)
But this fudge-factored delay (which, fwiw, I too cringed to add)
was actually motivated by some observations on other non-linux OS-es.

> 	That fix may make this one moot. Maybe not, but certainly there is
> an unnecessary delay in notifying the peer because of that bug. The fix is
> simply to move the "wmb()" after, instead of before, setting VIO_DESC_READY.
> I mentioned it in our stand-up last week, which you couldn't attend, because
> I pointed it out for the VDC driver, which also has the same problem.
> 
> 	You may want to retry with that change to see if the delay still
> helps.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
  2014-09-02 16:27 ` David L Stevens
  2014-09-02 16:32   ` Sowmini Varadhan
@ 2014-09-05  5:36   ` David Miller
  2014-09-05 13:47     ` Sowmini Varadhan
  1 sibling, 1 reply; 10+ messages in thread
From: David Miller @ 2014-09-05  5:36 UTC (permalink / raw)
  To: david.stevens; +Cc: sowmini.varadhan, netdev

From: David L Stevens <david.stevens@oracle.com>
Date: Tue, 02 Sep 2014 12:27:19 -0400

> 	That fix may make this one moot. Maybe not, but certainly there is
> an unnecessary delay in notifying the peer because of that bug. The fix is
> simply to move the "wmb()" after, instead of before, setting VIO_DESC_READY.
> I mentioned it in our stand-up last week, which you couldn't attend, because
> I pointed it out for the VDC driver, which also has the same problem.

The memory barrier exists in order to make sure the cookies et al. are
globally visible before the VIO_DESC_READY.  We don't want stores to
be reordered such that the VIO_DESC_READY is seen too early.

I'm having a hard time imagining that putting the wmb() afterwards
would help, except by adding a new delay in a different place.

I say that because the TX trigger is going to make a hypervisor call,
and I bet that hypervisor trap syncs the store buffer of invoking cpu
thread.

Thinking further, another issue to consider is that the wmb() might no
be necessary considering all of the above, because the remote entity
should not look at this descriptor until the tx trigger occurs.

Anyways, if the hypervisor trap to signal the tx trigger does not
flush the local cpu thread's store buffer, then yes moving the memory
barrier might make sense.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
  2014-09-05  5:36   ` David Miller
@ 2014-09-05 13:47     ` Sowmini Varadhan
  2014-09-06 21:02       ` Sowmini Varadhan
  0 siblings, 1 reply; 10+ messages in thread
From: Sowmini Varadhan @ 2014-09-05 13:47 UTC (permalink / raw)
  To: David Miller; +Cc: david.stevens, netdev

On (09/04/14 22:36), David Miller wrote:
> 
> The memory barrier exists in order to make sure the cookies et al. are
> globally visible before the VIO_DESC_READY.  We don't want stores to
> be reordered such that the VIO_DESC_READY is seen too early.

Ok, though David (dls) was just pointing out that a rmb() might
be missing in vnet_walk_rx_one() before checking for READY descriptor

> I'm having a hard time imagining that putting the wmb() afterwards
> would help, except by adding a new delay in a different place.

correct.

> I say that because the TX trigger is going to make a hypervisor call,
> and I bet that hypervisor trap syncs the store buffer of invoking cpu
> thread.
> 
> Thinking further, another issue to consider is that the wmb() might no
> be necessary considering all of the above, because the remote entity
> should not look at this descriptor until the tx trigger occurs.

So, today the consumer already reads a series of descriptors based on
a single LDC start trigger already in vnet_walk_rx() - it doesnt
actually wait for an LDC "start" per descriptor.

Which is as it should be- the first "start" trigger of a burst 
is just an async signal from producer to consumer that data is ready 
for consumption.

A wmb() (and maybe a missing rmb()) should itself synchronize the critical
state more efficiently than an ldc exchange. The last bit
about VIO_DESC_READY may get flushed at some later point, and
the delay in patch 2/2 slightly helps work around that fudge factor,
but moving the wmb() before/after the VIO_DESC_READY (in my testing)
doesn't really make a difference to correctness - the consumer first
checks for VIO_DESC_READY, and by that time, the rest of the cookie
info should have been correctly sync'ed by the memory barrier.

If we get the memory-barriers right,
the trigger-per-vnet_start_xmit is superfluous, and only 
serves to flood the ldc_channel (and make the env ripe for
!tx_has_space_for()).

(And a side-effect of this is that avoids severl other udelay() loops
that we'd have otherwise undergone as part of __vnet_tx_trigger(),
which is also a healthy thing to avoid)

> Anyways, if the hypervisor trap to signal the tx trigger does not
> flush the local cpu thread's store buffer, then yes moving the memory
> barrier might make sense.

As such, it's expensive and unnecessary to cripple ourselves down to
one LDC exchange per descriptor - the wmb() itself should ensure this
correctly in a cheaper way.

BTW, even if you set the READY state before (not after) the wmb() 
there's still a small iperf-boost to be obtained by lingering
around in vnet_walk_rx() (as in patch 2/2)- that boost is from avoiding
what would otherwise be a ldc stop/start exchange between consumer
and producer. 

But I'm not too attached to this boost (aka patch 2/2)- 
benchmark-special code is always ugly.

--Sowmini

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
  2014-09-05 13:47     ` Sowmini Varadhan
@ 2014-09-06 21:02       ` Sowmini Varadhan
       [not found]         ` <20140907181510.GA23753@oracle.com>
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Sowmini Varadhan @ 2014-09-06 21:02 UTC (permalink / raw)
  To: David Miller; +Cc: david.stevens, netdev

On (09/05/14 09:47), Sowmini Varadhan wrote:
> > The memory barrier exists in order to make sure the cookies et al. are
> > globally visible before the VIO_DESC_READY.  We don't want stores to
> > be reordered such that the VIO_DESC_READY is seen too early.
> 
> Ok, though David (dls) was just pointing out that a rmb() might
> be missing in vnet_walk_rx_one() before checking for READY descriptor

Stared at this a bit over the last two days, checked
the documentation, discussed with dls offline - looks like 
(a) the rmb() thing was mostly a red-herring/fud 
(b) we do need the wmb()

The wmb() part is working correctly as designed:

The producer will do
  /* code to set up cookies */
  wmb(); /* makes sure above changes are committed */
  d->hdr.state = VIO_DESC_READY;

the consumer will do

        if (desc->hdr.state != VIO_DESC_READY)
                return 1;
        err = vnet_rx_one(port, desc->size, desc->cookies, desc->ncookies);
                :
        desc->hdr.state = VIO_DESC_DONE;

So the vnet_rx_one() will only use valid cookie information at
all times.

This allows the code to correctly able to read multiple READY descriptors 
for a single LDC trigger, which it already does today.
(and it would be needlessly inefficient to clamp this down to
only one descriptor read per LDC-start in the vnet_rx())

So what (if any) is the outstanding question about wmb() at this 
point?

--Sowmini

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
       [not found]         ` <20140907181510.GA23753@oracle.com>
@ 2014-09-07 19:36           ` Raghuram Kothakota
  0 siblings, 0 replies; 10+ messages in thread
From: Raghuram Kothakota @ 2014-09-07 19:36 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: David Miller, david.stevens, netdev

Sorry for joining this thread late, please see my response below:

On Sep 7, 2014, at 11:15 AM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:

> 
> 
> On (09/06/14 17:02), Sowmini Varadhan wrote:
>> Date: Sat, 6 Sep 2014 17:02:53 -0400
>> From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
>> To: David Miller <davem@davemloft.net>
>> Cc: david.stevens@oracle.com, netdev@vger.kernel.org
>> Subject: Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data
>> descriptor after short udelay()
>> User-Agent: Mutt/1.5.21 (2010-09-15)
>> 
>> On (09/05/14 09:47), Sowmini Varadhan wrote:
>>>> The memory barrier exists in order to make sure the cookies et al. are
>>>> globally visible before the VIO_DESC_READY.  We don't want stores to
>>>> be reordered such that the VIO_DESC_READY is seen too early.
>>> 
>>> Ok, though David (dls) was just pointing out that a rmb() might
>>> be missing in vnet_walk_rx_one() before checking for READY descriptor
>> 
>> Stared at this a bit over the last two days, checked
>> the documentation, discussed with dls offline - looks like 
>> (a) the rmb() thing was mostly a red-herring/fud 
>> (b) we do need the wmb()
>> 
>> The wmb() part is working correctly as designed:
>> 
>> The producer will do
>>  /* code to set up cookies */
>>  wmb(); /* makes sure above changes are committed */
>>  d->hdr.state = VIO_DESC_READY;
>> 

Yeah, we need the producer memory barrier before setting
the descriptor state to ready. This ensures both data in the buffers
and other fields in the descriptor are flushed before the state
is visible to the consumer.  I do not believe the HV trap caused
by the trigger performs any sync of the store buffers.

>> the consumer will do
>> 
>>        if (desc->hdr.state != VIO_DESC_READY)
>>                return 1;
>>        err = vnet_rx_one(port, desc->size, desc->cookies, desc->ncookies);
>>                :
>>        desc->hdr.state = VIO_DESC_DONE;
>> 
>> So the vnet_rx_one() will only use valid cookie information at
>> all times.
>> 
>> This allows the code to correctly able to read multiple READY descriptors 
>> for a single LDC trigger, which it already does today.
>> (and it would be needlessly inefficient to clamp this down to
>> only one descriptor read per LDC-start in the vnet_rx())


Note, LDC message infrastructure is not a high performance infrastructure,
relying on an ldc message for each packet can greatly impact performance.
We need to reduce as many ldc messages as possible and rely on the
ring  more to pick up as many packets as possible without ldc messages.

The current implementation of vnet_start_xmit() performs all operations
such as grabbing a descriptor, copying the data into the data buffers
and updating the descriptor with in the same lock. This limits the parallelism
on the transmit path especially because the data copy operation is a bigger
operation and blocks other transmitters more than it is needed.  It is 
possible to split these operations where multiple transmitters can grab
descriptors in sequence but perform the copy operation in parallel.  When
this accomplished, there will be more possibility of racing with descriptors
getting ready quickly in a burst traffic situations. I believe the  udelay()
introduced in this patch will help even more when the transmitters are highly parallel.


-Raghuram
>> So what (if any) is the outstanding question about wmb() at this 
>> point?
>> 
>> --Sowmini
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
  2014-09-06 21:02       ` Sowmini Varadhan
       [not found]         ` <20140907181510.GA23753@oracle.com>
@ 2014-09-07 23:19         ` David Miller
  2014-09-08 13:45         ` David L Stevens
  2 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2014-09-07 23:19 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: david.stevens, netdev

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Sat, 6 Sep 2014 17:02:53 -0400

> So what (if any) is the outstanding question about wmb() at this 
> point?

I don't think there is any, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
  2014-09-06 21:02       ` Sowmini Varadhan
       [not found]         ` <20140907181510.GA23753@oracle.com>
  2014-09-07 23:19         ` David Miller
@ 2014-09-08 13:45         ` David L Stevens
  2014-09-08 14:03           ` Sowmini Varadhan
  2 siblings, 1 reply; 10+ messages in thread
From: David L Stevens @ 2014-09-08 13:45 UTC (permalink / raw)
  To: Sowmini Varadhan, David Miller; +Cc: netdev

Sowmini,

On 09/06/2014 05:02 PM, Sowmini Varadhan wrote:

> Stared at this a bit over the last two days, checked
> the documentation, discussed with dls offline - looks like 
> (a) the rmb() thing was mostly a red-herring/fud 
> (b) we do need the wmb()

	I don't think this has anything to do with your patch, but my (new) concern with the
wmb() and no matching rmb() is this text from Documentation/memory-barrier.txt:

> SMP BARRIER PAIRING
> -------------------
> 
> When dealing with CPU-CPU interactions, certain types of memory barrier should
> always be paired.  A lack of appropriate pairing is almost certainly an error.

I'm no mb expert, and I know of no symptoms, but it appears to be saying that
load reordering could result in a race where the READY flag could be set with
old data in other descriptor fields due to loading them in a different order --
something it says wmb() on another CPU explicitly does not prevent.

The particular case would be adding to the ring at the same time the other side
is removing from the ring, so no locks or LDC traffic would affect it.

So, it appears to me we have a missing rmb() that is needed and I don't know what
leads you to believe it isn't.

							+-DLS

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay()
  2014-09-08 13:45         ` David L Stevens
@ 2014-09-08 14:03           ` Sowmini Varadhan
  0 siblings, 0 replies; 10+ messages in thread
From: Sowmini Varadhan @ 2014-09-08 14:03 UTC (permalink / raw)
  To: David L Stevens; +Cc: David Miller, netdev, raghuram.kothakota

On (09/08/14 09:45), David L Stevens wrote:

First off, please don't drop email ids from the original email. 
I've re-added Raghuram Kothakota to fix that initial omission.

Second, this thread originally came up with your suggestion that
we shold move the wmb() down one line. I looked at this, and consulted
a few folks, all of whom agree that that is probably incorrect. 

As Bob Picco pointed out to me:
" Placing the wmb() after probably won't have any negative consequences
  BUT could. Suppose the compiler reorders the stores and the d->hdr.state
  store occurs before another part of the descriptor. We HV trap,
  and the trap doesn't flush the store buffer. The consumer could then
  see VIO_DESC_READY before the cookies arrray is updated. It would depend
  on how many cache lines are spanned by vio_net_desc and the store buffer
  organization."
thus I would be cautious about doing that.

> I'm no mb expert, and I know of no symptoms, but it appears to be saying that
> load reordering could result in a race where the READY flag could be set with
> old data in other descriptor fields due to loading them in a different order --
> something it says wmb() on another CPU explicitly does not prevent.

Please see the explanation that I, and Raghuram offered.
The wmb() assures that the stores of the cookie state are
ordered correctly by the consumer. The next store is for VIO_DESC_READY
and the consumer will not proceed to read cookie state until this
is visible. Thus no rmb() is needed. 

On its side, the consumer resets the hdr.state to DONE after it
consumes the cookies, and the DRING_STOPPLED announcement further
informs the producer that the descriptor is available. 

Hope that helps. If you think there is some sequence where this
is insufficient, please explain with details.

>
> The particular case would be adding to the ring at the same time the
> other side
> is removing from the ring, so no locks or LDC traffic would affect it.
> 
> So, it appears to me we have a missing rmb() that is needed and I
> don't know what
> leads you to believe it isn't.
> 
> 							+-DLS

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-09-08 14:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-02 16:20 [PATCH net-next v2 2/2] Re-check for a VIO_DESC_READY data descriptor after short udelay() Sowmini Varadhan
2014-09-02 16:27 ` David L Stevens
2014-09-02 16:32   ` Sowmini Varadhan
2014-09-05  5:36   ` David Miller
2014-09-05 13:47     ` Sowmini Varadhan
2014-09-06 21:02       ` Sowmini Varadhan
     [not found]         ` <20140907181510.GA23753@oracle.com>
2014-09-07 19:36           ` Raghuram Kothakota
2014-09-07 23:19         ` David Miller
2014-09-08 13:45         ` David L Stevens
2014-09-08 14:03           ` Sowmini Varadhan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.