All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joe Stringer <joe@ovn.org>
To: Andy Zhou <azhou@ovn.org>
Cc: netdev <netdev@vger.kernel.org>, pravin shelar <pshelar@ovn.org>
Subject: Re: [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases
Date: Thu, 9 Mar 2017 11:46:00 -0800	[thread overview]
Message-ID: <CAPWQB7F0XkHWkJ15N-ru83XBUkr2474yjmk0C_H1WErhELr+Ew@mail.gmail.com> (raw)
In-Reply-To: <1488932137-120383-4-git-send-email-azhou@ovn.org>

On 7 March 2017 at 16:15, Andy Zhou <azhou@ovn.org> wrote:
> With the introduction of open flow 'clone' action, the OVS user space
> can now translate the 'clone' action into kernel datapath 'sample'
> action, with 100% probability, to ensure that the clone semantics,
> which is that the packet seen by the clone action is the same as the
> packet seen by the action after clone, is faithfully carried out
> in the datapath.
>
> While the sample action in the datpath has the matching semantics,
> its implementation is only optimized for its original use.
> Specifically, there are two limitation: First, there is a 3 level of
> nesting restriction, enforced at the flow downloading time. This
> limit turns out to be too restrictive for the 'clone' use case.
> Second, the implementation avoid recursive call only if the sample
> action list has a single userspace action.
>
> The optimization implemented in this series removes the static
> nesting limit check, instead, implement the run time recursion limit
> check, and recursion avoidance similar to that of the 'recirc' action.
> This optimization solve both #1 and #2 issues above.
>
> Another optimization implemented is to avoid coping flow key as

*copying

> long as the actions enclosed does not change the flow key. The
> detection is performed only once at the flow downloading time.
>
> The third optimization implemented is to rewrite the action list
> at flow downloading time in order to save the fast path from parsing
> the sample action list in its original form repeatedly.

Whenever there is an enumeration (1, 2, 3; ..another..., third thing
implemented) in a commit message, I have to ask whether each "another
change..." should be a separate patch. It certainly makes it easier to
review.

I ran this through the OVS kernel tests and it's working correctly
from that point of view, but I didn't get a chance to dig in and
ensure for example, runtime behaviour of several nested
sample(actions(sample(actions(sample(actions(output)))))) handles
reasonably when it runs out of stack and deferred actions space. At a
high level though, this seems pretty straightforward.

Several comments below, thanks.

>
> Signed-off-by: Andy Zhou <azhou@ovn.org>
> ---
>  net/openvswitch/actions.c      | 106 ++++++++++++++++++------------------
>  net/openvswitch/datapath.h     |   7 +++
>  net/openvswitch/flow_netlink.c | 118 ++++++++++++++++++++++++++++-------------
>  3 files changed, 140 insertions(+), 91 deletions(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 259aea9..2e8c372 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>  }
>
>  static int sample(struct datapath *dp, struct sk_buff *skb,
> -                 struct sw_flow_key *key, const struct nlattr *attr,
> -                 const struct nlattr *actions, int actions_len)
> +                 struct sw_flow_key *key, const struct nlattr *attr)
>  {
> -       const struct nlattr *acts_list = NULL;
> -       const struct nlattr *a;
> -       int rem;
> -       u32 cutlen = 0;
> -
> -       for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
> -                a = nla_next(a, &rem)) {
> -               u32 probability;
> -
> -               switch (nla_type(a)) {
> -               case OVS_SAMPLE_ATTR_PROBABILITY:
> -                       probability = nla_get_u32(a);
> -                       if (!probability || prandom_u32() > probability)
> -                               return 0;
> -                       break;
> -
> -               case OVS_SAMPLE_ATTR_ACTIONS:
> -                       acts_list = a;
> -                       break;
> -               }
> -       }
> +       struct nlattr *actions;
> +       struct nlattr *sample_arg;
> +       struct sw_flow_key *orig = key;
> +       int rem = nla_len(attr);
> +       int err = 0;
> +       const struct sample_arg *arg;
>
> -       rem = nla_len(acts_list);
> -       a = nla_data(acts_list);
> +       /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */

Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.

> +       sample_arg = nla_data(attr);

We could do this in the parent call, like several other actions do.

<snip>

> @@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>                         err = execute_masked_set_action(skb, key, nla_data(a));
>                         break;
>
> -               case OVS_ACTION_ATTR_SAMPLE:
> -                       err = sample(dp, skb, key, a, attr, len);
> +               case OVS_ACTION_ATTR_SAMPLE: {
> +                       bool last = nla_is_last(a, rem);
> +                       struct sk_buff *clone_skb;
> +
> +                       clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC);
> +
> +                       if (!clone_skb)
> +                               /* Out of memory, skip this sample action.
> +                                */

Should we ratelimited complain in this case?

> +                               break;
> +
> +                       err = sample(dp, clone_skb, key, a);
> +                       if (last)
> +                               return err;

Given that sample() doesn't guarantee it will free 'clone_skb', I
don't think this is safe.

> +
> +                       pr_err("executing sample");

Useful for debugging, but don't forget to drop for next version.

>                         break;
> +                       }

This bracket is typically aligned with the character of the beginning
of the 'case' that it applies to.

>
>                 case OVS_ACTION_ATTR_CT:
>                         if (!is_flow_key_valid(key)) {
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index 1c6e937..d7a6e803 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -220,4 +220,11 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>         if (logging_allowed && net_ratelimit())                 \
>                 pr_info("netlink: " fmt "\n", ##__VA_ARGS__);   \
>  } while (0)
> +
> +#define OVS_SAMPLE_ATTR_ARG  (OVS_ACTION_ATTR_MAX + 1)

This should be defined in include/uapi/linux/openvswitch.h, protected
by #ifdef __KERNEL__ like the other internal formats. While I don't
think it actually conflicts with any other internal action format,
these should be in one place so we don't end up accidentally using the
same enum twice.

> +struct sample_arg {
> +       bool exec;
> +       u32  probability;
> +};

Perhaps we should move this to either net/openvswitch/flow_netlink.h
or include/linux/openvswitch.h since it's only for consumption
internally by the kernel module?

> +
>  #endif /* datapath.h */
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 6f5fa50..0e7cf13 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2007-2014 Nicira, Inc.
> + * Copyright (c) 2007-2017 Nicira, Inc.
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of version 2 of the GNU General Public
> @@ -59,6 +59,39 @@ struct ovs_len_tbl {
>  #define OVS_ATTR_NESTED -1
>  #define OVS_ATTR_VARIABLE -2
>
> +static bool actions_may_change_flow(const struct nlattr *actions)
> +{
> +       struct nlattr *nla;
> +       int rem;
> +
> +       nla_for_each_nested(nla, actions, rem) {
> +               u16 action = nla_type(nla);
> +
> +               switch (action) {
> +               case OVS_ACTION_ATTR_OUTPUT:
> +               case OVS_ACTION_ATTR_RECIRC:
> +               case OVS_ACTION_ATTR_USERSPACE:
> +               case OVS_ACTION_ATTR_SAMPLE:
> +               case OVS_ACTION_ATTR_TRUNC:

Trunc doesn't change the flow, but if there's something like
sample(output,trunc),output() then I wouldn't expect the final output
to be truncated.

<snip>

> @@ -2621,39 +2670,36 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>         return err;
>  }
>
> -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
> +static int sample_action_to_attr(const struct nlattr *attr,
> +                                struct sk_buff *skb)
>  {
> -       const struct nlattr *a;
> -       struct nlattr *start;
> -       int err = 0, rem;
> +       struct nlattr *start, *ac_start, *sample_arg;
> +       int err = 0, rem = nla_len(attr);
> +       const struct sample_arg *arg;
> +       struct nlattr *actions;
>
>         start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE);
>         if (!start)
>                 return -EMSGSIZE;
>
> -       nla_for_each_nested(a, attr, rem) {
> -               int type = nla_type(a);
> -               struct nlattr *st_sample;
> +       sample_arg = nla_data(attr);
> +       arg = nla_data(sample_arg);
> +       actions = nla_next(sample_arg, &rem);
>
> -               switch (type) {
> -               case OVS_SAMPLE_ATTR_PROBABILITY:
> -                       if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY,
> -                                   sizeof(u32), nla_data(a)))
> -                               return -EMSGSIZE;
> -                       break;
> -               case OVS_SAMPLE_ATTR_ACTIONS:
> -                       st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
> -                       if (!st_sample)
> -                               return -EMSGSIZE;
> -                       err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
> -                       if (err)
> -                               return err;
> -                       nla_nest_end(skb, st_sample);
> -                       break;
> -               }
> -       }
> +       if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability))
> +               return -EMSGSIZE;
>
> +       ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
> +       if (!ac_start)
> +               return -EMSGSIZE;
> +
> +       err = ovs_nla_put_actions(actions, rem, skb);
> +       if (err)
> +               return err;

Shouldn't we nla_nest_cancel() when something fails in the middle of
nesting nla attributes? For example I believe that upcall will attempt
to serialize actions, but if actions don't fit in the buffer then
it'll just skip the actions and forward everything else.

> +
> +       nla_nest_end(skb, ac_start);
>         nla_nest_end(skb, start);
> +
>         return err;
>  }
>
> --
> 1.8.3.1
>

  reply	other threads:[~2017-03-09 19:46 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-08  0:15 [RFC net-next sample action optimization 0/3] Andy Zhou
2017-03-08  0:15 ` [RFC net-next sample action optimization 1/3] openvswitch: deferred fifo api change Andy Zhou
2017-03-09 19:08   ` Joe Stringer
2017-03-08  0:15 ` [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou
2017-03-09 19:11   ` Joe Stringer
2017-03-10 21:44     ` Andy Zhou
2017-03-08  0:15 ` [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou
2017-03-09 19:46   ` Joe Stringer [this message]
2017-03-10 22:07     ` Andy Zhou
2017-03-10 23:12       ` Joe Stringer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPWQB7F0XkHWkJ15N-ru83XBUkr2474yjmk0C_H1WErhELr+Ew@mail.gmail.com \
    --to=joe@ovn.org \
    --cc=azhou@ovn.org \
    --cc=netdev@vger.kernel.org \
    --cc=pshelar@ovn.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.