* [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack
@ 2023-02-18 6:53 Eddy Tao
2023-02-19 13:54 ` Simon Horman
0 siblings, 1 reply; 8+ messages in thread
From: Eddy Tao @ 2023-02-18 6:53 UTC (permalink / raw)
To: netdev
Cc: Eddy Tao, Pravin B Shelar, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, dev, linux-kernel
Add two performance revisions to ovs_packet_cmd_execute:
1. Store the main body of sw_flow (600+ bytes) on the stack.
Benefit: avoids the kmem-cache alloc/free performed by ovs_flow_alloc/free.
2. Define sw_flow_without_stats_init to initialize the main body of
struct sw_flow without providing memory for sw_flow_stats.
Reason: ovs_execute_actions does not touch sw_flow_stats.
Benefit: less memory zeroing; each 'sw_flow_stats *' pointer takes 4/8
bytes, so on systems with 20 to 128 logical CPUs this is a good deal.
Signed-off-by: Eddy Tao <taoyuan_eddy@hotmail.com>
---
net/openvswitch/datapath.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index fcee6012293b..337947d34355 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -589,6 +589,12 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
return err;
}
+static void sw_flow_without_stats_init(struct sw_flow *flow)
+{
+ memset(flow, 0, sizeof(*flow));
+ flow->stats_last_writer = -1;
+}
+
static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
{
struct ovs_header *ovs_header = info->userhdr;
@@ -596,7 +602,8 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
struct nlattr **a = info->attrs;
struct sw_flow_actions *acts;
struct sk_buff *packet;
- struct sw_flow *flow;
+ struct sw_flow f;
+ struct sw_flow *flow = &f;
struct sw_flow_actions *sf_acts;
struct datapath *dp;
struct vport *input_vport;
@@ -636,20 +643,18 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
}
/* Build an sw_flow for sending this packet. */
- flow = ovs_flow_alloc();
- err = PTR_ERR(flow);
- if (IS_ERR(flow))
- goto err_kfree_skb;
+ /* This flow has no sw_flow_stats */
+ sw_flow_without_stats_init(flow);
err = ovs_flow_key_extract_userspace(net, a[OVS_PACKET_ATTR_KEY],
packet, &flow->key, log);
if (err)
- goto err_flow_free;
+ goto err_kfree_skb;
err = ovs_nla_copy_actions(net, a[OVS_PACKET_ATTR_ACTIONS],
&flow->key, &acts, log);
if (err)
- goto err_flow_free;
+ goto err_kfree_skb;
rcu_assign_pointer(flow->sf_acts, acts);
packet->priority = flow->key.phy.priority;
@@ -677,13 +682,10 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
local_bh_enable();
rcu_read_unlock();
- ovs_flow_free(flow, false);
return err;
err_unlock:
rcu_read_unlock();
-err_flow_free:
- ovs_flow_free(flow, false);
err_kfree_skb:
kfree_skb(packet);
err:
--
2.27.0
* Re: [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack
2023-02-18 6:53 [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack Eddy Tao
@ 2023-02-19 13:54 ` Simon Horman
2023-02-19 14:46 ` Eddy Tao
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Simon Horman @ 2023-02-19 13:54 UTC (permalink / raw)
To: Eddy Tao
Cc: netdev, Pravin B Shelar, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, dev, linux-kernel
On Sat, Feb 18, 2023 at 02:53:29PM +0800, Eddy Tao wrote:
> Add two performance revisions to ovs_packet_cmd_execute:
I think that in general it's nicer to do one change per patch:
i.e. split this into two patches.
> 1. Store the main body of sw_flow (600+ bytes) on the stack.
> Benefit: avoids the kmem-cache alloc/free performed by ovs_flow_alloc/free.
Perhaps I am wrong, but 600 bytes seems like a lot of stack memory to consume.
And thus probably needs a strong justification.
Do you have some performance numbers showing a benefit of this change?
> 2. Define sw_flow_without_stats_init to initialize the main body of
> struct sw_flow without providing memory for sw_flow_stats.
> Reason: ovs_execute_actions does not touch sw_flow_stats.
Are there other code-paths that would also benefit from this change?
> Benefit: less memory zeroing; each 'sw_flow_stats *' pointer takes 4/8
> bytes, so on systems with 20 to 128 logical CPUs this is a good deal.
Less is more :)
Do you have some performance numbers showing a benefit of this change?
> Signed-off-by: Eddy Tao <taoyuan_eddy@hotmail.com>
> ---
> net/openvswitch/datapath.c | 22 ++++++++++++----------
> 1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index fcee6012293b..337947d34355 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -589,6 +589,12 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
> return err;
> }
>
> +static void sw_flow_without_stats_init(struct sw_flow *flow)
> +{
> + memset(flow, 0, sizeof(*flow));
> + flow->stats_last_writer = -1;
> +}
> +
> static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
> {
> struct ovs_header *ovs_header = info->userhdr;
> @@ -596,7 +602,8 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
> struct nlattr **a = info->attrs;
> struct sw_flow_actions *acts;
> struct sk_buff *packet;
> - struct sw_flow *flow;
> + struct sw_flow f;
> + struct sw_flow *flow = &f;
I'm not sure it's really useful to have both f and flow.
Could we just have the following?
struct sw_flow *flow;
Also, it would be nice to move towards rather than away from
reverse xmas tree - longest line to shortest line - arrangement of local
variables in OVS code.
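For illustration, a reverse-xmas-tree arrangement simply orders the
declarations from the longest line to the shortest, roughly like this
(a sketch of the ordering only, not a suggested final form):

	struct sw_flow_actions *sf_acts;	/* longest line first */
	struct sw_flow_actions *acts;
	struct vport *input_vport;
	struct sk_buff *packet;
	struct datapath *dp;
	int err;				/* shortest line last */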
> struct sw_flow_actions *sf_acts;
> struct datapath *dp;
> struct vport *input_vport;
> @@ -636,20 +643,18 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
> }
>
> /* Build an sw_flow for sending this packet. */
> - flow = ovs_flow_alloc();
> - err = PTR_ERR(flow);
> - if (IS_ERR(flow))
> - goto err_kfree_skb;
> + /* This flow has no sw_flow_stats */
> + sw_flow_without_stats_init(flow);
>
> err = ovs_flow_key_extract_userspace(net, a[OVS_PACKET_ATTR_KEY],
> packet, &flow->key, log);
> if (err)
> - goto err_flow_free;
> + goto err_kfree_skb;
>
> err = ovs_nla_copy_actions(net, a[OVS_PACKET_ATTR_ACTIONS],
> &flow->key, &acts, log);
> if (err)
> - goto err_flow_free;
> + goto err_kfree_skb;
>
> rcu_assign_pointer(flow->sf_acts, acts);
> packet->priority = flow->key.phy.priority;
> @@ -677,13 +682,10 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
> local_bh_enable();
> rcu_read_unlock();
>
> - ovs_flow_free(flow, false);
> return err;
>
> err_unlock:
> rcu_read_unlock();
> -err_flow_free:
> - ovs_flow_free(flow, false);
> err_kfree_skb:
> kfree_skb(packet);
> err:
> --
> 2.27.0
>
* Re: [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack
2023-02-19 13:54 ` Simon Horman
@ 2023-02-19 14:46 ` Eddy Tao
2023-02-19 14:51 ` Eddy Tao
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Eddy Tao @ 2023-02-19 14:46 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, Pravin B Shelar, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, dev, linux-kernel
Hi, Simon:
Thanks for looking into this.
The revisions I proposed are complementary, serve the same purpose, and
reside in the same code segment.
I named them two items to clarify the details. Maybe it would be better
to call them two steps of the same revision to avoid confusion.
And yes, I do have some performance results below.
Testing topology
            |-----|
     nic1 --|     |-- nic1
     nic2 --|     |-- nic2
VM1(16cpus) | ovs | VM2(16 cpus)
     nic3 --|     |-- nic3
     nic4 --|     |-- nic4
            |-----|
2 netperf client threads on each vnic
netperf -H $peer -p $((port+$i)) -t UDP_RR -l 60 -- -R 1 -r 8K,8K
netperf -H $peer -p $((port+$i)) -t TCP_RR -l 60 -- -R 1 -r 120,240
netperf -H $peer -p $((port+$i)) -t TCP_CRR -l 60 -- -R 1 -r 120,240
Mode      Iterations   Variance   Average
UDP_RR    10           %1.33      48472     ==> before the change
UDP_RR    10           %2.13      49130     ==> after the change
TCP_RR    10           %4.56      79686     ==> before the change
TCP_RR    10           %3.42      79833     ==> after the change
TCP_CRR   10           %0.16      20596     ==> before the change
TCP_CRR   10           %0.11      21179     ==> after the change
Thanks
eddy
* Re: [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack
2023-02-19 13:54 ` Simon Horman
2023-02-19 14:46 ` Eddy Tao
@ 2023-02-19 14:51 ` Eddy Tao
2023-02-20 3:04 ` Eddy Tao
2023-02-20 7:11 ` Eddy Tao
3 siblings, 0 replies; 8+ messages in thread
From: Eddy Tao @ 2023-02-19 14:51 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, Pravin B Shelar, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, dev, linux-kernel
>> Are there other code-paths that would also benefit from this change?
The change is focused on packets going from user space to the datapath;
I do not see another code path that could benefit from this change.
eddy
* Re: [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack
2023-02-19 13:54 ` Simon Horman
2023-02-19 14:46 ` Eddy Tao
2023-02-19 14:51 ` Eddy Tao
@ 2023-02-20 3:04 ` Eddy Tao
2023-02-20 3:44 ` [ovs-dev] " Eddy Tao
2023-02-20 7:11 ` Eddy Tao
3 siblings, 1 reply; 8+ messages in thread
From: Eddy Tao @ 2023-02-20 3:04 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, Pravin B Shelar, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, dev, linux-kernel
Hi, Simon:
To have better visibility into the effect of the patch, I did another
test below.
I disabled datapath flow installation to steer traffic to the slow path
only, so that I can observe performance on the slow path, where
ovs_packet_cmd_execute is used extensively.
Testing topology
            |-----|
     nic1 --|     |-- nic1
     nic2 --|     |-- nic2
VM1(16cpus) | ovs | VM2(16 cpus)
     nic3 --|     |-- nic3
     nic4 --|     |-- nic4
            |-----|
2 netperf client threads on each vnic
netperf -H $peer -p $((port+$i)) -t TCP_STREAM -l 60
netperf -H $peer -p $((port+$i)) -t TCP_RR -l 60 -- -R 1 -r 120,240
netperf -H $peer -p $((port+$i)) -t TCP_CRR -l 60 -- -R 1 -r 120,240
Mode         Iterations   Variance   Average
TCP_STREAM   10           %3.83      1433      ==> before the change
TCP_STREAM   10           %3.39      1504      ==> after the change
TCP_RR       10           %2.35      45145     ==> before the change
TCP_RR       10           %1.06      47250     ==> after the change
TCP_CRR      10           %0.54      11310     ==> before the change
TCP_CRR      10           %2.64      12741     ==> after the change
Considering the size and simplicity of the patch, I would say the
performance benefit is decent.
Thanks
eddy
* Re: [ovs-dev] [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack
2023-02-20 3:04 ` Eddy Tao
@ 2023-02-20 3:44 ` Eddy Tao
0 siblings, 0 replies; 8+ messages in thread
From: Eddy Tao @ 2023-02-20 3:44 UTC (permalink / raw)
To: Simon Horman
Cc: dev, netdev, linux-kernel, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, David S. Miller
More explanation of the meaning of the performance data:

>> Mode  Iterations  Variance  Average

Iterations: the number of executions of the same test case (each
iteration produces one performance value, perf[i]).

Average: the sum over all iterations divided by 'iterations'. Pseudo code:

    for (sum = 0, i = 0; i < iterations; i++)
            sum += perf[i];
    average = sum / iterations;

Variance: a percentage value describing the overall deviation of the
performance results from the average. Pseudo code:

    for (deviation = 0, i = 0; i < iterations; i++)
            deviation += square(perf[i] - average);
    variance = (square_root(deviation / iterations) * 100) / average;
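
Putting the same definitions into a small self-contained C sketch (the
function names and sample values here are mine, for illustration only):

    #include <math.h>
    #include <stdio.h>

    /* Average = mean of the per-iteration results; "Variance" here is the
     * standard deviation expressed as a percentage of that average.
     */
    static void report(const double *perf, int iterations)
    {
            double sum = 0.0, deviation = 0.0, average, variance;
            int i;

            for (i = 0; i < iterations; i++)
                    sum += perf[i];
            average = sum / iterations;

            for (i = 0; i < iterations; i++)
                    deviation += (perf[i] - average) * (perf[i] - average);
            variance = sqrt(deviation / iterations) * 100.0 / average;

            printf("Average %.0f, Variance %%%.2f\n", average, variance);
    }

    int main(void)
    {
            double perf[] = { 48472, 49130, 48810 };   /* example values */

            report(perf, sizeof(perf) / sizeof(perf[0]));
            return 0;
    }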
On 2023/2/20 11:04, Eddy Tao wrote:
> Hi, Simon:
>
> To have better visibility into the effect of the patch, I did another
> test below.
>
> I disabled datapath flow installation to steer traffic to the slow
> path only, so that I can observe performance on the slow path, where
> ovs_packet_cmd_execute is used extensively.
>
>
> Testing topology
>
>             |-----|
>      nic1 --|     |-- nic1
>      nic2 --|     |-- nic2
> VM1(16cpus) | ovs | VM2(16 cpus)
>      nic3 --|     |-- nic3
>      nic4 --|     |-- nic4
>             |-----|
> 2 netperf client threads on each vnic
>
> netperf -H $peer -p $((port+$i)) -t TCP_STREAM -l 60
> netperf -H $peer -p $((port+$i)) -t TCP_RR -l 60 -- -R 1 -r 120,240
> netperf -H $peer -p $((port+$i)) -t TCP_CRR -l 60 -- -R 1 -r 120,240
>
> Mode         Iterations   Variance   Average
>
> TCP_STREAM   10           %3.83      1433      ==> before the change
> TCP_STREAM   10           %3.39      1504      ==> after the change
>
> TCP_RR       10           %2.35      45145     ==> before the change
> TCP_RR       10           %1.06      47250     ==> after the change
>
> TCP_CRR      10           %0.54      11310     ==> before the change
> TCP_CRR      10           %2.64      12741     ==> after the change
>
>
> Considering the size and simplicity of the patch, I would say the
> performance benefit is decent.
>
> Thanks
>
> eddy
>
>
* Re: [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack
2023-02-19 13:54 ` Simon Horman
` (2 preceding siblings ...)
2023-02-20 3:04 ` Eddy Tao
@ 2023-02-20 7:11 ` Eddy Tao
2023-02-20 10:37 ` Simon Horman
3 siblings, 1 reply; 8+ messages in thread
From: Eddy Tao @ 2023-02-20 7:11 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, Pravin B Shelar, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, dev, linux-kernel
Hi, Simon:
Regarding your concern about the stack size, it points to more room for
improvement.
I will file a new version which will have a smaller stack footprint and
better performance.
The new revision is inspired by existing examples of putting structs on
the stack in the same file, net/openvswitch/datapath.c:
struct sw_flow_actions *get_flow_actions(..)
{
        struct sw_flow_key masked_key;   ==> sizeof(struct sw_flow_key) is 464 bytes

static noinline_for_stack int
ovs_nla_init_match_and_action(..)
{
        struct sw_flow_mask mask;        ==> sizeof(struct sw_flow_mask) is 496 bytes
The first example reminded me: revisiting the code in
ovs_packet_cmd_execute, the sw_flow basically serves as a container for
the sw_flow_actions and the sw_flow_key only.
We do not need the bulk of the tunnel-info memory in sw_flow, which
saves us a further 200+ bytes -- less is more.
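As a rough sketch of the idea (illustration only, not the actual v2
patch; it reuses the variables already present in ovs_packet_cmd_execute
and omits the actions lifetime handling), the on-stack state for this
path could shrink to just the key plus an actions pointer, since
ovs_execute_actions() already takes the actions and the key separately:

	/* Illustration only -- not the v2 patch. */
	struct sw_flow_actions *acts;
	struct sw_flow_key key;

	memset(&key, 0, sizeof(key));
	err = ovs_flow_key_extract_userspace(net, a[OVS_PACKET_ATTR_KEY],
					     packet, &key, log);
	if (err)
		goto err_kfree_skb;

	err = ovs_nla_copy_actions(net, a[OVS_PACKET_ATTR_ACTIONS],
				   &key, &acts, log);
	if (err)
		goto err_kfree_skb;
	/* ... */
	err = ovs_execute_actions(dp, packet, acts, &key);
	/* acts would still need to be released on every path. */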
The new revision will be presented shortly, after some sanity testing and
benchmarking.
eddy
* Re: [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack
2023-02-20 7:11 ` Eddy Tao
@ 2023-02-20 10:37 ` Simon Horman
0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2023-02-20 10:37 UTC (permalink / raw)
To: Eddy Tao
Cc: netdev, Pravin B Shelar, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, dev, linux-kernel
On Mon, Feb 20, 2023 at 03:11:17PM +0800, Eddy Tao wrote:
> Hi, Simon:
>
>
> Regarding your concern about the stack size, it points to more room for
> improvement.
>
> I will file a new version which will have a smaller stack footprint and
> better performance.
>
>
> The new revision is inspired by existing examples of putting structs on
> the stack in the same file, net/openvswitch/datapath.c:
>
> struct sw_flow_actions *get_flow_actions(..)
> {
>         struct sw_flow_key masked_key;   ==> sizeof(struct sw_flow_key) is 464 bytes
>
> static noinline_for_stack int
> ovs_nla_init_match_and_action(..)
> {
>         struct sw_flow_mask mask;        ==> sizeof(struct sw_flow_mask) is 496 bytes
>
>
> The first example reminded me: revisiting the code in
> ovs_packet_cmd_execute, the sw_flow basically serves as a container for
> the sw_flow_actions and the sw_flow_key only.
>
> We do not need the bulk of the tunnel-info memory in sw_flow, which
> saves us a further 200+ bytes -- less is more.
>
>
> The new revision will be presented shortly, after some sanity testing
> and benchmarking.
Thanks for addressing my review.
It is the stack size issue that is my greatest concern.
Please consider including performance results
in the patch description of v2.
Kind regards,
Simon