From: John Hurley <>
To: Vlad Buslov <>
Cc: Jiri Pirko <>
Subject: Re: [RFC net-next 0/2] prevent sync issues with hw offload of flower
Date: Thu, 3 Oct 2019 17:59:50 +0100
Message-ID: <>
In-Reply-To: <>

On Thu, Oct 3, 2019 at 5:26 PM Vlad Buslov <> wrote:
> On Thu 03 Oct 2019 at 02:14, John Hurley <> wrote:
> > Hi,
> >
> > Putting this out as an RFC built on net-next. It fixes some issues
> > discovered in testing when using the TC API of OvS to generate flower
> > rules and subsequently offloading them to HW. Rules seen contain the same
> > match fields or may be rule modifications run as a delete plus an add.
> > We're seeing race conditions whereby the rules present in kernel flower
> > are out of sync with those offloaded. Note that there are some issues
> > that will need to be fixed in the RFC before it becomes a patch, such as
> > potential races between releasing locks and re-taking them. However, I'm
> > putting this out for comments or potential alternative solutions.
> >
> > The main cause of the races seems to be in the chain table of cls_api. If
> > a tcf_proto is destroyed then it is removed from its chain. If a new
> > filter is then added to the same chain with the same priority and protocol
> > a new tcf_proto will be created - this may happen before the first is
> > fully removed and the hw offload message sent to the driver. In cls_flower
> > this means that the fl_ht_insert_unique() function can pass as its
> > hashtable is associated with the tcf_proto. We are then in a position
> > where the 'delete' and the 'add' are in a race to get offloaded. We also
> > noticed that doing an offload add, then checking if a tcf_proto is
> > concurrently deleting, then removing the offload if it is, can extend the
> > out of order messages. Drivers do not expect to get duplicate rules.
> > However, in the kernel TC datapath they are not duplicates, so we can get out
> > of sync here.
> >
> > The RFC fixes this by adding a pre_destroy hook to cls_api that is called
> > when a tcf_proto is signaled to be destroyed but before it is removed from
> > its chain (which is essentially the lock for allowing duplicates in
> > flower). Flower then uses this new hook to send the hw delete messages
> > from tcf_proto destroys, preventing them racing with duplicate adds. It
> > also moves the check for 'deleting' to before sending the hw add
> > message.
> >
> > John Hurley (2):
> >   net: sched: add tp_op for pre_destroy
> >   net: sched: fix tp destroy race conditions in flower
> >
> >  include/net/sch_generic.h |  3 +++
> >  net/sched/cls_api.c       | 29 ++++++++++++++++++++++++-
> >  net/sched/cls_flower.c    | 55 ++++++++++++++++++++++++++---------------------
> >  3 files changed, 61 insertions(+), 26 deletions(-)
> Hi John,
> Thanks for working on this!
> Are there any other sources for race conditions described in this
> letter? When you describe tcf_proto deletion you say "main cause" but
> don't provide any others. If tcf_proto is the only problematic part,

Hi Vlad,
Thanks for the input.
The tcf_proto deletion was the cause in the tests we ran. That's not
to say there aren't others that I missed in my analysis.

> then it might be worth looking into alternative ways to force concurrent
> users to wait for proto deletion/destruction to be properly finished.
> Maybe having some table that maps chain id + prio to completion would be
> a simpler approach? With such infra tcf_proto_create() can wait for
> previous proto with same prio and chain to be fully destroyed (including
> offloads) before creating a new one.

I think a problem with this is that the chain removal functions call
tcf_proto_put() (which calls destroy when ref is 0) so, if other
concurrent processes (like a dump) have references to the tcf_proto
then we may not get the hw offload even by the time the chain deletion
function has finished. We would need to make sure this was tracked -
say after the tcf_proto_destroy function has completed.
How would you suggest doing the wait? With a replay flag as happens in
some other places?
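
To make the refcount concern concrete, here is a minimal userspace
sketch. It is not kernel code: every model_* name is hypothetical and
only mimics the behaviour described above, where the put drops a
reference and destroy (and with it the hw delete) runs only when the
count hits 0. It assumes a dump holds one extra reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tcf_proto; not the kernel structure. */
struct model_proto {
	int refcnt;          /* references held by the chain, dumps, ... */
	bool in_chain;       /* still linked into its chain? */
	bool hw_delete_sent; /* has the driver been told to drop the rules? */
};

/* Models destroy: in this sketch the hw delete only goes out here. */
static void model_proto_destroy(struct model_proto *tp)
{
	tp->hw_delete_sent = true;
}

static void model_proto_get(struct model_proto *tp)
{
	tp->refcnt++;
}

/* Models the put: destroy runs only when the last reference drops. */
static void model_proto_put(struct model_proto *tp)
{
	if (--tp->refcnt == 0)
		model_proto_destroy(tp);
}

/* Models chain removal: unlink first, then drop the chain's reference. */
static void model_chain_remove(struct model_proto *tp)
{
	tp->in_chain = false;
	model_proto_put(tp);
}

/* Returns 0 if the hw delete is deferred until the dump's put. */
static int model_demo_deferred_delete(void)
{
	struct model_proto tp = { .refcnt = 1, .in_chain = true,
				  .hw_delete_sent = false };

	model_proto_get(&tp);    /* a concurrent dump takes a reference */
	model_chain_remove(&tp); /* chain removal completes */

	/* Unlinked from the chain, yet nothing has reached the driver. */
	assert(!tp.in_chain);
	assert(!tp.hw_delete_sent);

	model_proto_put(&tp);    /* the dump finishes; destroy runs now */
	return tp.hw_delete_sent ? 0 : -1;
}
```

In this model the window between model_chain_remove() returning and
the final model_proto_put() is where a new add for the same chain and
prio could overtake the delete.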

To me it seems the main problem is that the tcf_proto being in a chain
almost acts like the lock to prevent duplicate filters getting to the
driver. We need some mechanism to ensure a delete has made it to HW
before we release this 'lock'.
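
As a rough illustration of that ordering, the sketch below models a
chain as one slot per prio and logs the messages a driver would see.
It is a simplified userspace model, not the RFC's code: the model_*
names are hypothetical, each slot stands for a single tcf_proto with a
single rule, and an occupied slot stands in for the duplicate check.
The hw delete is sent while the slot is still held, and the slot is
released only afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PRIO 8
#define MODEL_LOG_LEN  8

enum model_hw_msg { MODEL_HW_ADD, MODEL_HW_DEL };

/* Hypothetical chain: one slot per prio plus a log of hw messages. */
struct model_chain {
	bool occupied[MODEL_MAX_PRIO];
	enum model_hw_msg log[MODEL_LOG_LEN];
	size_t nlog;
};

static void model_hw_send(struct model_chain *c, enum model_hw_msg m)
{
	if (c->nlog < MODEL_LOG_LEN)
		c->log[c->nlog++] = m;
}

/* An add is rejected while the slot is held, so no duplicate reaches hw. */
static bool model_filter_add(struct model_chain *c, unsigned int prio)
{
	if (prio >= MODEL_MAX_PRIO || c->occupied[prio])
		return false;
	c->occupied[prio] = true;
	model_hw_send(c, MODEL_HW_ADD);
	return true;
}

/* Step 1 of teardown: send the hw delete while the slot is still held. */
static void model_pre_destroy(struct model_chain *c, unsigned int prio)
{
	if (prio < MODEL_MAX_PRIO && c->occupied[prio])
		model_hw_send(c, MODEL_HW_DEL);
}

/* Step 2 of teardown: release the slot (the 'lock'). */
static void model_chain_unlink(struct model_chain *c, unsigned int prio)
{
	if (prio < MODEL_MAX_PRIO)
		c->occupied[prio] = false;
}

/* Returns 0 if the driver sees ADD, DEL, ADD in that order. */
static int model_demo_ordering(void)
{
	struct model_chain c = {0};

	assert(model_filter_add(&c, 1));
	model_pre_destroy(&c, 1);
	assert(!model_filter_add(&c, 1)); /* slot still held: add rejected */
	model_chain_unlink(&c, 1);
	assert(model_filter_add(&c, 1));  /* slot released: add goes through */

	return (c.nlog == 3 &&
		c.log[0] == MODEL_HW_ADD &&
		c.log[1] == MODEL_HW_DEL &&
		c.log[2] == MODEL_HW_ADD) ? 0 : -1;
}
```

If model_chain_unlink() ran before model_pre_destroy() instead, the
second add could be logged ahead of the delete, which is the
out-of-sync case described in the cover letter.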

> Regards,
> Vlad
