Netdev Archive on lore.kernel.org
* [PATCH] net: optimize cmpxchg in ip_idents_reserve
@ 2020-01-15  3:23 Shaokun Zhang
  2020-01-16 12:27 ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Shaokun Zhang @ 2020-01-15  3:23 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Yuqi Jin, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Eric Dumazet, Yang Guo, Shaokun Zhang

From: Yuqi Jin <jinyuqi@huawei.com>

atomic_try_cmpxchg() is called instead of atomic_cmpxchg(), which reduces
the number of accesses to the global variable @p_id in the retry loop.
Let's optimize it for performance.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yang Guo <guoyang2@huawei.com>
Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
 net/ipv4/route.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 87e979f2b74a..7e28c7121c20 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -496,10 +496,10 @@ u32 ip_idents_reserve(u32 hash, int segs)
 		delta = prandom_u32_max(now - old);
 
 	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
+	old = (u32)atomic_read(p_id);
 	do {
-		old = (u32)atomic_read(p_id);
 		new = old + delta + segs;
-	} while (atomic_cmpxchg(p_id, old, new) != old);
+	} while (!atomic_try_cmpxchg(p_id, &old, new));
 
 	return new - segs;
 }
-- 
2.7.4
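
The point of the conversion: on failure, atomic_try_cmpxchg() writes the
value it actually observed back through its @old pointer, so the retry
loop no longer needs to re-read @p_id at the top of every iteration. The
generic fallback behaves roughly like the sketch below (simplified; the
real definition lives in the kernel's atomic headers and varies by
version):

	static inline bool atomic_try_cmpxchg_sketch(atomic_t *v, int *old, int new)
	{
		int r = atomic_cmpxchg(v, *old, new);	/* one atomic RMW */

		if (likely(r == *old))
			return true;	/* exchange happened */
		*old = r;		/* failed: keep the observed value for the retry */
		return false;
	}

On success there is one atomic operation either way; the saving is one
atomic_read() per failed attempt, which only matters under contention.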



* Re: [PATCH] net: optimize cmpxchg in ip_idents_reserve
  2020-01-15  3:23 [PATCH] net: optimize cmpxchg in ip_idents_reserve Shaokun Zhang
@ 2020-01-16 12:27 ` David Miller
  2020-01-16 14:05   ` Shaokun Zhang
  2020-01-16 15:12   ` Eric Dumazet
  0 siblings, 2 replies; 7+ messages in thread
From: David Miller @ 2020-01-16 12:27 UTC (permalink / raw)
  To: zhangshaokun
  Cc: netdev, linux-kernel, jinyuqi, kuznet, yoshfuji, edumazet, guoyang2

From: Shaokun Zhang <zhangshaokun@hisilicon.com>
Date: Wed, 15 Jan 2020 11:23:40 +0800

> From: Yuqi Jin <jinyuqi@huawei.com>
> 
> atomic_try_cmpxchg() is called instead of atomic_cmpxchg(), which reduces
> the number of accesses to the global variable @p_id in the retry loop.
> Let's optimize it for performance.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Yang Guo <guoyang2@huawei.com>
> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>

I doubt this makes any measurable improvement in performance.

If you can document a specific measurable improvement under
a useful set of circumstances for real usage, then put those
details into the commit message and resubmit.

Otherwise, I'm not applying this, sorry.


* Re: [PATCH] net: optimize cmpxchg in ip_idents_reserve
  2020-01-16 12:27 ` David Miller
@ 2020-01-16 14:05   ` Shaokun Zhang
  2020-01-16 15:12   ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: Shaokun Zhang @ 2020-01-16 14:05 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-kernel, jinyuqi, kuznet, yoshfuji, edumazet, guoyang2

Hi David,

On 2020/1/16 20:27, David Miller wrote:
> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
> Date: Wed, 15 Jan 2020 11:23:40 +0800
> 
>> From: Yuqi Jin <jinyuqi@huawei.com>
>>
>> atomic_try_cmpxchg() is called instead of atomic_cmpxchg(), which reduces
>> the number of accesses to the global variable @p_id in the retry loop.
>> Let's optimize it for performance.
>>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Yang Guo <guoyang2@huawei.com>
>> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
> 
> I doubt this makes any measurable improvement in performance.
> 
> If you can document a specific measurable improvement under
> a useful set of circumstances for real usage, then put those
> details into the commit message and resubmit.

Ok, I will do it and resubmit.

Thanks for your reply,
Shaokun

> 
> Otherwise, I'm not applying this, sorry.
> 



* Re: [PATCH] net: optimize cmpxchg in ip_idents_reserve
  2020-01-16 12:27 ` David Miller
  2020-01-16 14:05   ` Shaokun Zhang
@ 2020-01-16 15:12   ` Eric Dumazet
  2020-01-16 15:19     ` Eric Dumazet
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2020-01-16 15:12 UTC (permalink / raw)
  To: David Miller, zhangshaokun
  Cc: netdev, linux-kernel, jinyuqi, kuznet, yoshfuji, edumazet, guoyang2



On 1/16/20 4:27 AM, David Miller wrote:
> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
> Date: Wed, 15 Jan 2020 11:23:40 +0800
> 
>> From: Yuqi Jin <jinyuqi@huawei.com>
>>
>> atomic_try_cmpxchg() is called instead of atomic_cmpxchg(), which reduces
>> the number of accesses to the global variable @p_id in the retry loop.
>> Let's optimize it for performance.
>>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Yang Guo <guoyang2@huawei.com>
>> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
> 
> I doubt this makes any measurable improvement in performance.
> 
> If you can document a specific measurable improvement under
> a useful set of circumstances for real usage, then put those
> details into the commit message and resubmit.
> 
> Otherwise, I'm not applying this, sorry.
> 


The real difference that could be made here is to only use this
cmpxchg() dance when CONFIG_UBSAN is enabled.

When CONFIG_UBSAN is not set, atomic_add_return() is just fine.

(Supposedly UBSAN should not warn about that either, but this depends on compiler version)
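
For background on what that comment alludes to: atomic_t arithmetic is
done on a signed int, and this ID counter is expected to wrap. In C terms
a signed wrap is undefined behaviour, which is what older UBSAN versions
flagged for atomic_add_return() here. A minimal illustration, with
hypothetical values near the top of the range:

	atomic_t id = ATOMIC_INIT(INT_MAX);
	int v;

	v = atomic_add_return(1, &id);	/* wraps to INT_MIN: signed overflow
					 * as far as C is concerned, though
					 * the CPU instruction wraps cleanly */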

 


* Re: [PATCH] net: optimize cmpxchg in ip_idents_reserve
  2020-01-16 15:12   ` Eric Dumazet
@ 2020-01-16 15:19     ` Eric Dumazet
  2020-01-17  6:54       ` Shaokun Zhang
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2020-01-16 15:19 UTC (permalink / raw)
  To: David Miller, zhangshaokun
  Cc: netdev, linux-kernel, jinyuqi, kuznet, yoshfuji, edumazet, guoyang2



On 1/16/20 7:12 AM, Eric Dumazet wrote:
> 
> 
> On 1/16/20 4:27 AM, David Miller wrote:
>> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
>> Date: Wed, 15 Jan 2020 11:23:40 +0800
>>
>>> From: Yuqi Jin <jinyuqi@huawei.com>
>>>
>>> atomic_try_cmpxchg() is called instead of atomic_cmpxchg(), which reduces
>>> the number of accesses to the global variable @p_id in the retry loop.
>>> Let's optimize it for performance.
>>>
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Yang Guo <guoyang2@huawei.com>
>>> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
>>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
>>
>> I doubt this makes any measurable improvement in performance.
>>
>> If you can document a specific measurable improvement under
>> a useful set of circumstances for real usage, then put those
>> details into the commit message and resubmit.
>>
>> Otherwise, I'm not applying this, sorry.
>>
> 
> 
> The real difference that could be made here is to only use this
> cmpxchg() dance when CONFIG_UBSAN is enabled.
>
> When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
> 
> (Supposedly UBSAN should not warn about that either, but this depends on compiler version)

I will test something like:

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2010888e68ca96ae880481973a6d808d6c5612c5..e2fa972f5c78f2aefc801db6a45b2a81141c3028 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -495,11 +495,15 @@ u32 ip_idents_reserve(u32 hash, int segs)
        if (old != now && cmpxchg(p_tstamp, old, now) == old)
                delta = prandom_u32_max(now - old);
 
-       /* Do not use atomic_add_return() as it makes UBSAN unhappy */
+#ifdef CONFIG_UBSAN
+       /* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
        do {
                old = (u32)atomic_read(p_id);
                new = old + delta + segs;
        } while (atomic_cmpxchg(p_id, old, new) != old);
+#else
+       new = atomic_add_return(segs + delta, p_id);
+#endif
 
        return new - segs;
 }
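
For context, ip_idents_reserve() hands out a block of @segs consecutive
IP IDs and returns the first one, which is why it returns new - segs. A
hypothetical caller sketch (not the actual __ip_select_ident() code):

	u32 first_id = ip_idents_reserve(hash, segs);

	/* IDs first_id .. first_id + segs - 1 now belong to this sender;
	 * one ID is consumed per segment. */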



* Re: [PATCH] net: optimize cmpxchg in ip_idents_reserve
  2020-01-16 15:19     ` Eric Dumazet
@ 2020-01-17  6:54       ` Shaokun Zhang
  2020-01-17 12:32         ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Shaokun Zhang @ 2020-01-17  6:54 UTC (permalink / raw)
  To: Eric Dumazet, David Miller
  Cc: netdev, linux-kernel, jinyuqi, kuznet, yoshfuji, edumazet,
	guoyang2, Will Deacon

+Cc Will Deacon,

On 2020/1/16 23:19, Eric Dumazet wrote:
> 
> 
> On 1/16/20 7:12 AM, Eric Dumazet wrote:
>>
>>
>> On 1/16/20 4:27 AM, David Miller wrote:
>>> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
>>> Date: Wed, 15 Jan 2020 11:23:40 +0800
>>>
>>>> From: Yuqi Jin <jinyuqi@huawei.com>
>>>>
>>>> atomic_try_cmpxchg() is called instead of atomic_cmpxchg(), which reduces
>>>> the number of accesses to the global variable @p_id in the retry loop.
>>>> Let's optimize it for performance.
>>>>
>>>> Cc: "David S. Miller" <davem@davemloft.net>
>>>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>>>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>> Cc: Yang Guo <guoyang2@huawei.com>
>>>> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
>>>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
>>>
>>> I doubt this makes any measurable improvement in performance.
>>>
>>> If you can document a specific measurable improvement under
>>> a useful set of circumstances for real usage, then put those
>>> details into the commit message and resubmit.
>>>
>>> Otherwise, I'm not applying this, sorry.
>>>
>>
>>
>> The real difference that could be made here is to only use this
>> cmpxchg() dance when CONFIG_UBSAN is enabled.
>>
>> When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
>>
>> (Supposedly UBSAN should not warn about that either, but this depends on compiler version)
> 
> I will test something like:
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 2010888e68ca96ae880481973a6d808d6c5612c5..e2fa972f5c78f2aefc801db6a45b2a81141c3028 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -495,11 +495,15 @@ u32 ip_idents_reserve(u32 hash, int segs)
>         if (old != now && cmpxchg(p_tstamp, old, now) == old)
>                 delta = prandom_u32_max(now - old);
>  
> -       /* Do not use atomic_add_return() as it makes UBSAN unhappy */
> +#ifdef CONFIG_UBSAN

I appreciate Eric's idea.

> +       /* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
>         do {
>                 old = (u32)atomic_read(p_id);
>                 new = old + delta + segs;
>         } while (atomic_cmpxchg(p_id, old, new) != old);

But about this, I still think that we can use atomic_try_cmpxchg() instead
of atomic_cmpxchg(), like refcount_add_not_zero() in include/linux/refcount.h;
a sketch of that pattern follows.
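
For reference, the refcount.h pattern referred to above looks roughly like
this (a simplified sketch of refcount_add_not_zero(); the real function's
saturation handling is omitted):

	static inline bool add_not_zero_sketch(int i, atomic_t *r)
	{
		int old = atomic_read(r);	/* single read, before the loop */

		do {
			if (!old)
				return false;	/* count already hit zero */
		} while (!atomic_try_cmpxchg_relaxed(r, &old, old + i));

		return true;
	}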

I abstracted the model as follows:
/* Variant 1: atomic64_cmpxchg(), re-reading the counter on every retry */
while (get_cycles() <= (time_temp + t_cnt)) {
	do {
		old_temp = atomic64_read(&wk_in->num);
		oldt = atomic64_cmpxchg(&wk_in->num, old_temp, old_temp + 1);
	} while (oldt != old_temp);

	myndelay(delay_o);
	w_cnt += 1;
}

/* Variant 2: atomic64_try_cmpxchg(), counter read once up front; a
 * failed try_cmpxchg refreshes @val with the value it observed */
val = atomic64_read(&wk_in->num);
while (get_cycles() <= (time_temp + t_cnt)) {
	do {
		new_val = val + 1;
	} while (!atomic64_try_cmpxchg(&wk_in->num, &val, new_val));

	myndelay(delay_o);
	w_cnt += 1;
}

If we take the access to the global variable out of the loop, the
performance becomes better on both x86 and arm64, as follows:

Kunpeng 920, 24 cores calling it (successful operations per second):
atomic64_read in loop      atomic64_read out of the loop
6291034                    8751962

Intel 6248, 20 cores calling it (successful operations per second):
atomic64_read in loop      atomic64_read out of the loop
8934792                    9978998

So how about this? ;-)

                delta = prandom_u32_max(now - old);

+#ifdef CONFIG_UBSAN
        /* Do not use atomic_add_return() as it makes UBSAN unhappy */
+       old = (u32)atomic_read(p_id);
        do {
-               old = (u32)atomic_read(p_id);
                new = old + delta + segs;
-       } while (atomic_cmpxchg(p_id, old, new) != old);
+       } while (!atomic_try_cmpxchg(p_id, &old, new));
+#else
+       new = atomic_add_return(segs + delta, p_id);
+#endif

Thanks,
Shaokun


> +#else
> +       new = atomic_add_return(segs + delta, p_id);
> +#endif
>  
>         return new - segs;
>  }
> 
> 



* Re: [PATCH] net: optimize cmpxchg in ip_idents_reserve
  2020-01-17  6:54       ` Shaokun Zhang
@ 2020-01-17 12:32         ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2020-01-17 12:32 UTC (permalink / raw)
  To: Shaokun Zhang
  Cc: Eric Dumazet, David Miller, netdev, linux-kernel, jinyuqi,
	kuznet, yoshfuji, edumazet, guoyang2, Will Deacon

On Fri, Jan 17, 2020 at 02:54:03PM +0800, Shaokun Zhang wrote:
> So how about this? ;-)
> 
>                 delta = prandom_u32_max(now - old);
> 
> +#ifdef CONFIG_UBSAN
>         /* Do not use atomic_add_return() as it makes UBSAN unhappy */
> +       old = (u32)atomic_read(p_id);
>         do {
> -               old = (u32)atomic_read(p_id);
>                 new = old + delta + segs;
> -       } while (atomic_cmpxchg(p_id, old, new) != old);
> +       } while (!atomic_try_cmpxchg(p_id, &old, new));
> +#else
> +       new = atomic_add_return(segs + delta, p_id);
> +#endif

That's crazy, just accept that UBSAN is taking bonghits and ignore it.
Use atomic_add_return() unconditionally.
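
Taking that suggestion at face value, the whole dance would collapse to a
single read-modify-write (a sketch of the idea, not a committed patch):

	/* Wrap-around of the 32-bit ID space is intended and harmless;
	 * the cast to u32 keeps the subsequent arithmetic unsigned. */
	new = (u32)atomic_add_return(segs + delta, p_id);

	return new - segs;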
