* [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp
@ 2017-03-15 19:10 Kevin Traynor
  2017-03-16  6:21 ` Yuanhan Liu
  2017-03-23 15:44 ` [PATCH v2] vhost: fix virtio_net false sharing Kevin Traynor
  0 siblings, 2 replies; 9+ messages in thread
From: Kevin Traynor @ 2017-03-15 19:10 UTC (permalink / raw)
  To: yuanhan.liu, maxime.coquelin; +Cc: dev, Kevin Traynor, stable

The virtio_net structure is used in both enqueue and dequeue datapaths.
broadcast_rarp is checked with cmpset in the dequeue datapath regardless
of whether descriptors are available or not.

It is observed that, in some cases where dequeue and enqueue are performed by
different cores and no packets are available on the dequeue datapath
(i.e. uni-directional traffic), the frequent checking of broadcast_rarp
in dequeue causes performance degradation for the enqueue datapath.

In OVS the issue can cause a uni-directional performance drop of up to 15%.

Fix that by moving broadcast_rarp to a different cache line in
virtio_net struct.

Fixes: a66bcad32240 ("vhost: arrange struct fields for better cache sharing")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/librte_vhost/vhost.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 22564f1..a254328 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -156,6 +156,4 @@ struct virtio_net {
 	uint32_t		flags;
 	uint16_t		vhost_hlen;
-	/* to tell if we need broadcast rarp packet */
-	rte_atomic16_t		broadcast_rarp;
 	uint32_t		virt_qp_nb;
 	int			dequeue_zero_copy;
@@ -167,4 +165,6 @@ struct virtio_net {
 	uint64_t		log_addr;
 	struct ether_addr	mac;
+	/* to tell if we need broadcast rarp packet */
+	rte_atomic16_t		broadcast_rarp;
 
 	uint32_t		nr_guest_pages;
-- 
1.8.3.1


* Re: [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp
  2017-03-15 19:10 [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp Kevin Traynor
@ 2017-03-16  6:21 ` Yuanhan Liu
  2017-03-16 10:10   ` Kevin Traynor
  2017-03-23 15:44 ` [PATCH v2] vhost: fix virtio_net false sharing Kevin Traynor
  1 sibling, 1 reply; 9+ messages in thread
From: Yuanhan Liu @ 2017-03-16  6:21 UTC (permalink / raw)
  To: Kevin Traynor; +Cc: maxime.coquelin, dev, stable

On Wed, Mar 15, 2017 at 07:10:49PM +0000, Kevin Traynor wrote:
> The virtio_net structure is used in both enqueue and dequeue datapaths.
> broadcast_rarp is checked with cmpset in the dequeue datapath regardless
> of whether descriptors are available or not.
> 
> It is observed that, in some cases where dequeue and enqueue are performed by
> different cores and no packets are available on the dequeue datapath
> (i.e. uni-directional traffic), the frequent checking of broadcast_rarp
> in dequeue causes performance degradation for the enqueue datapath.
> 
> In OVS the issue can cause a uni-directional performance drop of up to 15%.
> 
> Fix that by moving broadcast_rarp to a different cache line in
> virtio_net struct.

Thanks, but I'm a bit confused. The drop looks like it is caused by
cache false sharing, but I don't see anything that would lead to false
sharing. I mean, there is no write in the cache line that
broadcast_rarp belongs to. Or is the "volatile" type the culprit here?

Talking about that, I had actually considered turning "broadcast_rarp"
into a simple "int" or "uint16_t" type, to make it more lightweight.
The reason I used an atomic type is to send exactly one broadcast RARP
packet once a SEND_RARP request is received. Otherwise, we may send more
than one RARP packet when MQ is involved. But I think we don't have
to be that accurate: it's tolerable if more RARPs are sent. I saw 4
SEND_RARP requests (aka 4 RARP packets) the last time I tried
vhost-user live migration, after all. I don't quite remember why
it was 4 though.

That said, I think it would also resolve the performance issue if you
change "rte_atomic16_t" to "uint16_t", without moving the field?
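
Roughly, what I have in mind is something like the sketch below. It is
just an untested sketch of the idea, not a patch; note it leaves a small
window where two queues could both see the flag set and each send a
RARP, which is the "more RARPs" case mentioned above:

/* In struct virtio_net (vhost.h), a plain flag instead of the atomic: */
	uint16_t	broadcast_rarp;	/* non-zero if a RARP should be broadcast */

/* In the dequeue path (virtio_net.c), read and clear without cmpset: */
	if (unlikely(dev->broadcast_rarp)) {
		dev->broadcast_rarp = 0;
		/* allocate and inject the RARP mbuf as today */
	}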

	--yliu


* Re: [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp
  2017-03-16  6:21 ` Yuanhan Liu
@ 2017-03-16 10:10   ` Kevin Traynor
  2017-03-17  5:47     ` Yuanhan Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Kevin Traynor @ 2017-03-16 10:10 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: maxime.coquelin, dev, stable

On 03/16/2017 06:21 AM, Yuanhan Liu wrote:
> On Wed, Mar 15, 2017 at 07:10:49PM +0000, Kevin Traynor wrote:
>> The virtio_net structure is used in both enqueue and dequeue datapaths.
>> broadcast_rarp is checked with cmpset in the dequeue datapath regardless
>> of whether descriptors are available or not.
>>
>> It is observed that, in some cases where dequeue and enqueue are performed by
>> different cores and no packets are available on the dequeue datapath
>> (i.e. uni-directional traffic), the frequent checking of broadcast_rarp
>> in dequeue causes performance degradation for the enqueue datapath.
>>
>> In OVS the issue can cause a uni-directional performance drop of up to 15%.
>>
>> Fix that by moving broadcast_rarp to a different cache line in
>> virtio_net struct.
> 
> Thanks, but I'm a bit confused. The drop looks like it is caused by
> cache false sharing, but I don't see anything that would lead to false
> sharing. I mean, there is no write in the cache line that
> broadcast_rarp belongs to. Or is the "volatile" type the culprit here?
> 

Yes, the cmpset code uses cmpxchg, and that performs a write regardless
of the result - it either writes the new value or writes back the old value.
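
To illustrate, a standalone toy along these lines (not DPDK code, just a
sketch of the effect) reproduces it: one thread updates a counter, the
other keeps issuing a CAS on an adjacent field in the same cache line.
The CAS never succeeds, yet the locked cmpxchg still takes the line
exclusively and slows the writer down; padding the fields onto separate
cache lines restores the counter rate.

/* Illustrative sketch only. Build with: gcc -O2 -pthread false_share.c */
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct shared {
	uint64_t counter;	/* written by the "enqueue" thread        */
	/* char pad[64]; */	/* uncomment to split the cache line      */
	uint16_t flag;		/* CAS'd by the "dequeue" thread, stays 0 */
} s __attribute__((aligned(64)));

static volatile int stop;

static void *enqueue_side(void *arg)
{
	(void)arg;
	while (!stop)
		__atomic_add_fetch(&s.counter, 1, __ATOMIC_RELAXED);
	return NULL;
}

static void *dequeue_side(void *arg)
{
	uint16_t exp;

	(void)arg;
	while (!stop) {
		exp = 1;	/* mirrors cmpset(&broadcast_rarp.cnt, 1, 0) */
		__atomic_compare_exchange_n(&s.flag, &exp, 0, 0,
				__ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, enqueue_side, NULL);
	pthread_create(&b, NULL, dequeue_side, NULL);
	sleep(2);
	stop = 1;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("counter = %" PRIu64 "\n", s.counter);
	return 0;
}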

> Talking about that, I had actually considered turning "broadcast_rarp"
> into a simple "int" or "uint16_t" type, to make it more lightweight.
> The reason I used an atomic type is to send exactly one broadcast RARP
> packet once a SEND_RARP request is received. Otherwise, we may send more
> than one RARP packet when MQ is involved. But I think we don't have
> to be that accurate: it's tolerable if more RARPs are sent. I saw 4
> SEND_RARP requests (aka 4 RARP packets) the last time I tried
> vhost-user live migration, after all. I don't quite remember why
> it was 4 though.
>
> That said, I think it would also resolve the performance issue if you
> change "rte_atomic16_t" to "uint16_t", without moving the field?
> 

Yes, that should work fine, with the side effect you mentioned of
possibly some more rarps - no big deal.

I tested another solution also - as it is unlikely we would need to send
the broadcast_rarp, you can first read and only do the cmpset if it is
likely to succeed. This resolved the issue too.

--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1057,7 +1057,8 @@ static inline bool __attribute__((always_inline))
         *
         * Check user_send_rarp() for more information.
         */
-       if (unlikely(rte_atomic16_cmpset((volatile uint16_t *)
+       if (unlikely(rte_atomic16_read(&dev->broadcast_rarp) &&
+                       rte_atomic16_cmpset((volatile uint16_t *)
                                         &dev->broadcast_rarp.cnt, 1, 0))) {
                rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool);
                if (rarp_mbuf == NULL) {

I chose changing the struct because the 'read && cmpset' code is
non-obvious and someone might not think to do that in the future. I did
a PVP test with testpmd and didn't see any degradation with the struct
change, so I thought it could be a good solution.

I tested the struct change with several combinations of DPDK
16.11.1/17.02/master combined with OVS 2.6/2.7/master. If you prefer one
of the other solutions, let me know and I'll perform some additional
testing.
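
FWIW, if we did go with the struct change, a rough compile-time check
along these lines could catch the layout regressing again. It is just a
sketch, not part of the patch; it assumes C11 _Static_assert, that the
struct itself is cache line aligned, and that vhost_hlen is one of the
fields the enqueue path reads:

/* e.g. somewhere that already includes vhost.h */
#include <stddef.h>
#include <rte_memory.h>		/* RTE_CACHE_LINE_SIZE */
#include "vhost.h"

_Static_assert(offsetof(struct virtio_net, broadcast_rarp) /
		RTE_CACHE_LINE_SIZE !=
		offsetof(struct virtio_net, vhost_hlen) /
		RTE_CACHE_LINE_SIZE,
		"broadcast_rarp shares a cache line with enqueue-side fields");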

Kevin.

> 	--yliu
> 


* Re: [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp
  2017-03-16 10:10   ` Kevin Traynor
@ 2017-03-17  5:47     ` Yuanhan Liu
  2017-03-17 10:01       ` Maxime Coquelin
  0 siblings, 1 reply; 9+ messages in thread
From: Yuanhan Liu @ 2017-03-17  5:47 UTC (permalink / raw)
  To: Kevin Traynor; +Cc: maxime.coquelin, dev, stable

On Thu, Mar 16, 2017 at 10:10:05AM +0000, Kevin Traynor wrote:
> On 03/16/2017 06:21 AM, Yuanhan Liu wrote:
> > On Wed, Mar 15, 2017 at 07:10:49PM +0000, Kevin Traynor wrote:
> >> The virtio_net structure is used in both enqueue and dequeue datapaths.
> >> broadcast_rarp is checked with cmpset in the dequeue datapath regardless
> >> of whether descriptors are available or not.
> >>
> >> It is observed that, in some cases where dequeue and enqueue are performed by
> >> different cores and no packets are available on the dequeue datapath
> >> (i.e. uni-directional traffic), the frequent checking of broadcast_rarp
> >> in dequeue causes performance degradation for the enqueue datapath.
> >>
> >> In OVS the issue can cause a uni-directional performance drop of up to 15%.
> >>
> >> Fix that by moving broadcast_rarp to a different cache line in
> >> virtio_net struct.
> > 
> > Thanks, but I'm a bit confused. The drop looks like it is caused by
> > cache false sharing, but I don't see anything that would lead to false
> > sharing. I mean, there is no write in the cache line that
> > broadcast_rarp belongs to. Or is the "volatile" type the culprit here?
> >
>
> Yes, the cmpset code uses cmpxchg, and that performs a write regardless
> of the result - it either writes the new value or writes back the old value.

Oh, right, I missed this part!

> > Talking about that, I had actually considered turning "broadcast_rarp"
> > into a simple "int" or "uint16_t" type, to make it more lightweight.
> > The reason I used an atomic type is to send exactly one broadcast RARP
> > packet once a SEND_RARP request is received. Otherwise, we may send more
> > than one RARP packet when MQ is involved. But I think we don't have
> > to be that accurate: it's tolerable if more RARPs are sent. I saw 4
> > SEND_RARP requests (aka 4 RARP packets) the last time I tried
> > vhost-user live migration, after all. I don't quite remember why
> > it was 4 though.
> >
> > That said, I think it would also resolve the performance issue if you
> > change "rte_atomic16_t" to "uint16_t", without moving the field?
> > 
> 
> Yes, that should work fine, with the side effect you mentioned of
> possibly some more rarps - no big deal.
> 
> I tested another solution also - as it is unlikely we would need to send
> the broadcast_rarp, you can first read and only do the cmpset if it is
> likely to succeed. This resolved the issue too.
> 
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -1057,7 +1057,8 @@ static inline bool __attribute__((always_inline))
>          *
>          * Check user_send_rarp() for more information.
>          */
> -       if (unlikely(rte_atomic16_cmpset((volatile uint16_t *)
> +       if (unlikely(rte_atomic16_read(&dev->broadcast_rarp) &&
> +                       rte_atomic16_cmpset((volatile uint16_t *)
>                                          &dev->broadcast_rarp.cnt, 1, 0))) {
>                 rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool);
>                 if (rarp_mbuf == NULL) {

I'm okay with this one. It's simple and clean enough that it could
be picked up for a stable release. Later, I'd like to send another patch
to turn it into a "uint16_t". Since it changes the behaviour a bit, it
is not a good candidate for a stable release.

BTW, would you please include the root cause (false sharing) in
your commit log?

	--yliu
> 
> I chose changing the struct because the 'read && cmpset' code is
> non-obvious and someone might not think to do that in the future. I did
> a PVP test with testpmd and didn't see any degradation with the struct
> change, so I thought it could be a good solution.
> 
> I tested the struct change with several combinations of DPDK
> 16.11.1/17.02/master combined with OVS 2.6/2.7/master. If you prefer one
> of the other solutions, let me know and I'll perform some additional
> testing.
> 
> Kevin.
> 
> > 	--yliu
> > 


* Re: [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp
  2017-03-17  5:47     ` Yuanhan Liu
@ 2017-03-17 10:01       ` Maxime Coquelin
  2017-03-20 11:13         ` Kevin Traynor
  0 siblings, 1 reply; 9+ messages in thread
From: Maxime Coquelin @ 2017-03-17 10:01 UTC (permalink / raw)
  To: Yuanhan Liu, Kevin Traynor; +Cc: dev, stable



On 03/17/2017 06:47 AM, Yuanhan Liu wrote:
> On Thu, Mar 16, 2017 at 10:10:05AM +0000, Kevin Traynor wrote:
>> On 03/16/2017 06:21 AM, Yuanhan Liu wrote:
>>> On Wed, Mar 15, 2017 at 07:10:49PM +0000, Kevin Traynor wrote:
>>>> The virtio_net structure is used in both enqueue and dequeue datapaths.
>>>> broadcast_rarp is checked with cmpset in the dequeue datapath regardless
>>>> of whether descriptors are available or not.
>>>>
>>>> It is observed that, in some cases where dequeue and enqueue are performed by
>>>> different cores and no packets are available on the dequeue datapath
>>>> (i.e. uni-directional traffic), the frequent checking of broadcast_rarp
>>>> in dequeue causes performance degradation for the enqueue datapath.
>>>>
>>>> In OVS the issue can cause a uni-directional performance drop of up to 15%.
>>>>
>>>> Fix that by moving broadcast_rarp to a different cache line in
>>>> virtio_net struct.
>>>
>>> Thanks, but I'm a bit confused. The drop looks like it is caused by
>>> cache false sharing, but I don't see anything that would lead to false
>>> sharing. I mean, there is no write in the cache line that
>>> broadcast_rarp belongs to. Or is the "volatile" type the culprit here?
>>>
>>
>> Yes, the cmpset code uses cmpxchg, and that performs a write regardless
>> of the result - it either writes the new value or writes back the old value.
>
> Oh, right, I missed this part!
>
>>> Talking about that, I had actually considered turning "broadcast_rarp"
>>> into a simple "int" or "uint16_t" type, to make it more lightweight.
>>> The reason I used an atomic type is to send exactly one broadcast RARP
>>> packet once a SEND_RARP request is received. Otherwise, we may send more
>>> than one RARP packet when MQ is involved. But I think we don't have
>>> to be that accurate: it's tolerable if more RARPs are sent. I saw 4
>>> SEND_RARP requests (aka 4 RARP packets) the last time I tried
>>> vhost-user live migration, after all. I don't quite remember why
>>> it was 4 though.
>>>
>>> That said, I think it would also resolve the performance issue if you
>>> change "rte_atomic16_t" to "uint16_t", without moving the field?
>>>
>>
>> Yes, that should work fine, with the side effect you mentioned of
>> possibly some more rarps - no big deal.
>>
>> I tested another solution also - as it is unlikely we would need to send
>> the broadcast_rarp, you can first read and only do the cmpset if it is
>> likely to succeed. This resolved the issue too.
>>
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -1057,7 +1057,8 @@ static inline bool __attribute__((always_inline))
>>          *
>>          * Check user_send_rarp() for more information.
>>          */
>> -       if (unlikely(rte_atomic16_cmpset((volatile uint16_t *)
>> +       if (unlikely(rte_atomic16_read(&dev->broadcast_rarp) &&
>> +                       rte_atomic16_cmpset((volatile uint16_t *)
>>                                          &dev->broadcast_rarp.cnt, 1, 0))) {
>>                 rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool);
>>                 if (rarp_mbuf == NULL) {
>
> I'm okay with this one. It's simple and clean enough that it could
> be picked up for a stable release. Later, I'd like to send another patch
> to turn it into a "uint16_t". Since it changes the behaviour a bit, it
> is not a good candidate for a stable release.
>
> BTW, would you please include the root cause (false sharing) in
> your commit log?
And maybe also add the info to the comment just above?
It will help people wondering why we read before cmpset.

Maxime


* Re: [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp
  2017-03-17 10:01       ` Maxime Coquelin
@ 2017-03-20 11:13         ` Kevin Traynor
  0 siblings, 0 replies; 9+ messages in thread
From: Kevin Traynor @ 2017-03-20 11:13 UTC (permalink / raw)
  To: Maxime Coquelin, Yuanhan Liu; +Cc: dev, stable

On 03/17/2017 10:01 AM, Maxime Coquelin wrote:
> 
> 
> On 03/17/2017 06:47 AM, Yuanhan Liu wrote:
>> On Thu, Mar 16, 2017 at 10:10:05AM +0000, Kevin Traynor wrote:
>>> On 03/16/2017 06:21 AM, Yuanhan Liu wrote:
>>>> On Wed, Mar 15, 2017 at 07:10:49PM +0000, Kevin Traynor wrote:
>>>>> The virtio_net structure is used in both enqueue and dequeue
>>>>> datapaths.
>>>>> broadcast_rarp is checked with cmpset in the dequeue datapath
>>>>> regardless
>>>>> of whether descriptors are available or not.
>>>>>
>>>>> It is observed that, in some cases where dequeue and enqueue are
>>>>> performed by
>>>>> different cores and no packets are available on the dequeue datapath
>>>>> (i.e. uni-directional traffic), the frequent checking of
>>>>> broadcast_rarp
>>>>> in dequeue causes performance degradation for the enqueue datapath.
>>>>>
>>>>> In OVS the issue can cause a uni-directional performance drop of up
>>>>> to 15%.
>>>>>
>>>>> Fix that by moving broadcast_rarp to a different cache line in
>>>>> virtio_net struct.
>>>>
>>>> Thanks, but I'm a bit confused. The drop looks like it is caused by
>>>> cache false sharing, but I don't see anything that would lead to false
>>>> sharing. I mean, there is no write in the cache line that
>>>> broadcast_rarp belongs to. Or is the "volatile" type the culprit here?
>>>>
>>>
>>> Yes, the cmpset code uses cmpxchg, and that performs a write regardless
>>> of the result - it either writes the new value or writes back the old value.
>>
>> Oh, right, I missed this part!
>>
>>>> Talking about that, I had actually considered turning "broadcast_rarp"
>>>> into a simple "int" or "uint16_t" type, to make it more lightweight.
>>>> The reason I used an atomic type is to send exactly one broadcast RARP
>>>> packet once a SEND_RARP request is received. Otherwise, we may send more
>>>> than one RARP packet when MQ is involved. But I think we don't have
>>>> to be that accurate: it's tolerable if more RARPs are sent. I saw 4
>>>> SEND_RARP requests (aka 4 RARP packets) the last time I tried
>>>> vhost-user live migration, after all. I don't quite remember why
>>>> it was 4 though.
>>>>
>>>> That said, I think it would also resolve the performance issue if you
>>>> change "rte_atomic16_t" to "uint16_t", without moving the field?
>>>>
>>>
>>> Yes, that should work fine, with the side effect you mentioned of
>>> possibly some more rarps - no big deal.
>>>
>>> I tested another solution also - as it is unlikely we would need to send
>>> the broadcast_rarp, you can first read and only do the cmpset if it is
>>> likely to succeed. This resolved the issue too.
>>>
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -1057,7 +1057,8 @@ static inline bool __attribute__((always_inline))
>>>          *
>>>          * Check user_send_rarp() for more information.
>>>          */
>>> -       if (unlikely(rte_atomic16_cmpset((volatile uint16_t *)
>>> +       if (unlikely(rte_atomic16_read(&dev->broadcast_rarp) &&
>>> +                       rte_atomic16_cmpset((volatile uint16_t *)
>>>                                          &dev->broadcast_rarp.cnt, 1,
>>> 0))) {
>>>                 rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool);
>>>                 if (rarp_mbuf == NULL) {
>>
>> I'm okay with this one. It's simple and clean enough that it could
>> be picked up for a stable release. Later, I'd like to send another patch
>> to turn it into a "uint16_t". Since it changes the behaviour a bit, it
>> is not a good candidate for a stable release.
>>
>> BTW, would you please include the root cause (false sharing) in
>> your commit log?
> And maybe also add the info to the comment just above?
> It will help people wondering why we read before cmpset.
> 

Sure, I will re-spin, do some testing and submit a v2.

> Maxime


* [PATCH v2] vhost: fix virtio_net false sharing
  2017-03-15 19:10 [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp Kevin Traynor
  2017-03-16  6:21 ` Yuanhan Liu
@ 2017-03-23 15:44 ` Kevin Traynor
  2017-03-27  7:34   ` Maxime Coquelin
  1 sibling, 1 reply; 9+ messages in thread
From: Kevin Traynor @ 2017-03-23 15:44 UTC (permalink / raw)
  To: yuanhan.liu, maxime.coquelin; +Cc: dev, Kevin Traynor, stable

The broadcast_rarp field in the virtio_net struct is checked in the
dequeue datapath regardless of whether descriptors are available or not.

As it is checked with cmpset leading to a write, false sharing on the
virtio_net struct can happen between enqueue and dequeue datapaths
regardless of whether a RARP is requested. In OVS, the issue can cause
a uni-directional performance drop of up to 15%.

Fix that by only performing the cmpset if a read of broadcast_rarp
indicates that the cmpset is likely to succeed.

Fixes: a66bcad32240 ("vhost: arrange struct fields for better cache sharing")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---

V2:
Change from fixing by moving broadcast_rarp location in virtio_net struct
to performing a read before cmpset.

 lib/librte_vhost/virtio_net.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 337470d..d0a3b11 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1057,7 +1057,19 @@ static inline bool __attribute__((always_inline))
 	 *
 	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * rte_atomic16_cmpset() causes a write if using cmpxchg. This could
+	 * result in false sharing between enqueue and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing cmpset if the read indicates it is likely to
+	 * be set.
 	 */
-	if (unlikely(rte_atomic16_cmpset((volatile uint16_t *)
-					 &dev->broadcast_rarp.cnt, 1, 0))) {
+
+	if (unlikely(rte_atomic16_read(&dev->broadcast_rarp) &&
+			rte_atomic16_cmpset((volatile uint16_t *)
+				&dev->broadcast_rarp.cnt, 1, 0))) {
+
 		rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool);
 		if (rarp_mbuf == NULL) {
-- 
1.8.3.1


* Re: [PATCH v2] vhost: fix virtio_net false sharing
  2017-03-23 15:44 ` [PATCH v2] vhost: fix virtio_net false sharing Kevin Traynor
@ 2017-03-27  7:34   ` Maxime Coquelin
  2017-03-27  8:33     ` Yuanhan Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Maxime Coquelin @ 2017-03-27  7:34 UTC (permalink / raw)
  To: Kevin Traynor, yuanhan.liu; +Cc: dev, stable



On 03/23/2017 04:44 PM, Kevin Traynor wrote:
> The broadcast_rarp field in the virtio_net struct is checked in the
> dequeue datapath regardless of whether descriptors are available or not.
>
> As it is checked with cmpset leading to a write, false sharing on the
> virtio_net struct can happen between enqueue and dequeue datapaths
> regardless of whether a RARP is requested. In OVS, the issue can cause
> a uni-directional performance drop of up to 15%.
>
> Fix that by only performing the cmpset if a read of broadcast_rarp
> indicates that the cmpset is likely to succeed.
>
> Fixes: a66bcad32240 ("vhost: arrange struct fields for better cache sharing")
> Cc: stable@dpdk.org
>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> ---
>
> V2:
> Change from fixing by moving broadcast_rarp location in virtio_net struct
> to performing a read before cmpset.
>
>  lib/librte_vhost/virtio_net.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)

Nice!
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

I'll try to benchmark it with testpmd only to see if we measure the
same gain without OVS.

Thanks,
Maxime


* Re: [PATCH v2] vhost: fix virtio_net false sharing
  2017-03-27  7:34   ` Maxime Coquelin
@ 2017-03-27  8:33     ` Yuanhan Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Yuanhan Liu @ 2017-03-27  8:33 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: Kevin Traynor, dev, stable

On Mon, Mar 27, 2017 at 09:34:19AM +0200, Maxime Coquelin wrote:
> 
> 
> On 03/23/2017 04:44 PM, Kevin Traynor wrote:
> >The broadcast_rarp field in the virtio_net struct is checked in the
> >dequeue datapath regardless of whether descriptors are available or not.
> >
> >As it is checked with cmpset leading to a write, false sharing on the
> >virtio_net struct can happen between enqueue and dequeue datapaths
> >regardless of whether a RARP is requested. In OVS, the issue can cause
> >a uni-directional performance drop of up to 15%.
> >
> >Fix that by only performing the cmpset if a read of broadcast_rarp
> >indicates that the cmpset is likely to succeed.
> >
> >Fixes: a66bcad32240 ("vhost: arrange struct fields for better cache sharing")
> >Cc: stable@dpdk.org
> >
> >Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> >---
> >
> >V2:
> >Change from fixing by moving broadcast_rarp location in virtio_net struct
> >to performing a read before cmpset.
> >
> > lib/librte_vhost/virtio_net.c | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> 
> Nice!
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Applied to dpdk-next-virtio.

Thanks.

	--yliu
> 
> I'll try to benchmark it with testpmd only to see if we measure the
> same gain without OVS.
> 
> Thanks,
> Maxime

