All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
@ 2018-09-03  7:06 Vlad Buslov
  2018-09-03 18:50 ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2018-09-03  7:06 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Action API was changed to work with actions and action_idr in concurrency
safe manner, however tcf_del_walker() still uses actions without taking
reference to them first and deletes them directly, disregarding possible
concurrent delete.

Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
caller to hold reference to action and accepts action id as argument,
instead of direct action pointer.

Modify tcf_action_delete_index() to return ACT_P_DELETED instead of 0 when
action was deleted successfully. This is necessary to allow
tcf_del_walker() to count deleted actions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/act_api.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 398c752ff529..d593114e7d2f 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -246,6 +246,8 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	goto done;
 }
 
+static int tcf_idr_delete_index(struct tcf_idrinfo *idrinfo, u32 index);
+
 static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 			  const struct tc_action_ops *ops)
 {
@@ -263,13 +265,11 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 		goto nla_put_failure;
 
 	idr_for_each_entry_ul(idr, p, id) {
-		ret = __tcf_idr_release(p, false, true);
-		if (ret == ACT_P_DELETED) {
-			module_put(ops->owner);
+		ret = tcf_idr_delete_index(idrinfo, id);
+		if (ret == ACT_P_DELETED)
 			n_i++;
-		} else if (ret < 0) {
+		else if (ret < 0)
 			goto nla_put_failure;
-		}
 	}
 	if (nla_put_u32(skb, TCA_FCNT, n_i))
 		goto nla_put_failure;
@@ -343,7 +343,7 @@ static int tcf_idr_delete_index(struct tcf_idrinfo *idrinfo, u32 index)
 
 			tcf_action_cleanup(p);
 			module_put(owner);
-			return 0;
+			return ACT_P_DELETED;
 		}
 		ret = 0;
 	} else {
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
  2018-09-03  7:06 [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete Vlad Buslov
@ 2018-09-03 18:50 ` Cong Wang
  2018-09-03 20:33   ` Vlad Buslov
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2018-09-03 18:50 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Action API was changed to work with actions and action_idr in concurrency
> safe manner, however tcf_del_walker() still uses actions without taking
> reference to them first and deletes them directly, disregarding possible
> concurrent delete.
>
> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
> caller to hold reference to action and accepts action id as argument,
> instead of direct action pointer.

Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
tcf_dump_walker() already does.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
  2018-09-03 18:50 ` Cong Wang
@ 2018-09-03 20:33   ` Vlad Buslov
  2018-09-04 22:41     ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2018-09-03 20:33 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Mon 03 Sep 2018 at 18:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Action API was changed to work with actions and action_idr in concurrency
>> safe manner, however tcf_del_walker() still uses actions without taking
>> reference to them first and deletes them directly, disregarding possible
>> concurrent delete.
>>
>> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
>> caller to hold reference to action and accepts action id as argument,
>> instead of direct action pointer.
>
> Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
> tcf_dump_walker() already does.

Because tcf_del_walker() calls __tcf_idr_release(), which take
idrinfo->lock itself (deadlock). It also calls sleeping functions like
tcf_action_goto_chain_fini(), so just implementing function that
releases action without taking idrinfo->lock is not enough.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
  2018-09-03 20:33   ` Vlad Buslov
@ 2018-09-04 22:41     ` Cong Wang
  2018-09-05  7:03       ` Vlad Buslov
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2018-09-04 22:41 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Mon 03 Sep 2018 at 18:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> Action API was changed to work with actions and action_idr in concurrency
> >> safe manner, however tcf_del_walker() still uses actions without taking
> >> reference to them first and deletes them directly, disregarding possible
> >> concurrent delete.
> >>
> >> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
> >> caller to hold reference to action and accepts action id as argument,
> >> instead of direct action pointer.
> >
> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
> > tcf_dump_walker() already does.
>
> Because tcf_del_walker() calls __tcf_idr_release(), which take
> idrinfo->lock itself (deadlock). It also calls sleeping functions like

Deadlock can be easily resolved by moving the lock out.


> tcf_action_goto_chain_fini(), so just implementing function that
> releases action without taking idrinfo->lock is not enough.

Sleeping can be resolved either by making it atomic or
deferring it to a work queue.

None of your arguments here is a blocker to locking
idrinfo->lock. You really should focus on if it is really
necessary to lock idrinfo->lock in tcf_del_walker(), rather
than these details.

For me, if you need idrinfo->lock for dump walker, you must
need it for delete walker too, because deletion is a writer
which should require stronger protection than the dumper,
which merely a reader.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
  2018-09-04 22:41     ` Cong Wang
@ 2018-09-05  7:03       ` Vlad Buslov
  2018-09-05 20:32         ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2018-09-05  7:03 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Tue 04 Sep 2018 at 22:41, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Mon 03 Sep 2018 at 18:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> Action API was changed to work with actions and action_idr in concurrency
>> >> safe manner, however tcf_del_walker() still uses actions without taking
>> >> reference to them first and deletes them directly, disregarding possible
>> >> concurrent delete.
>> >>
>> >> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
>> >> caller to hold reference to action and accepts action id as argument,
>> >> instead of direct action pointer.
>> >
>> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
>> > tcf_dump_walker() already does.
>>
>> Because tcf_del_walker() calls __tcf_idr_release(), which take
>> idrinfo->lock itself (deadlock). It also calls sleeping functions like
>
> Deadlock can be easily resolved by moving the lock out.
>
>
>> tcf_action_goto_chain_fini(), so just implementing function that
>> releases action without taking idrinfo->lock is not enough.
>
> Sleeping can be resolved either by making it atomic or
> deferring it to a work queue.
>
> None of your arguments here is a blocker to locking
> idrinfo->lock. You really should focus on if it is really
> necessary to lock idrinfo->lock in tcf_del_walker(), rather
> than these details.
>
> For me, if you need idrinfo->lock for dump walker, you must
> need it for delete walker too, because deletion is a writer
> which should require stronger protection than the dumper,
> which merely a reader.

I don't get how it is necessary. Dump walker uses pointers to actions
directly, and in order to be concurrency-safe it must either hold the
lock or obtain reference to action. Note that del walker doesn't use the
action pointer, it only passed action id to tcf_idr_delete_index()
function, which does all the necessary locking and can deal with
potential concurrency issues (concurrent delete, etc.). This approach
also benefits from code reuse from other code paths that delete actions,
instead of implementing its own.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
  2018-09-05  7:03       ` Vlad Buslov
@ 2018-09-05 20:32         ` Cong Wang
  2018-09-06 11:14           ` Vlad Buslov
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2018-09-05 20:32 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Wed, Sep 5, 2018 at 12:05 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Tue 04 Sep 2018 at 22:41, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >>
> >> On Mon 03 Sep 2018 at 18:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> >>
> >> >> Action API was changed to work with actions and action_idr in concurrency
> >> >> safe manner, however tcf_del_walker() still uses actions without taking
> >> >> reference to them first and deletes them directly, disregarding possible
> >> >> concurrent delete.
> >> >>
> >> >> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
> >> >> caller to hold reference to action and accepts action id as argument,
> >> >> instead of direct action pointer.
> >> >
> >> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
> >> > tcf_dump_walker() already does.
> >>
> >> Because tcf_del_walker() calls __tcf_idr_release(), which take
> >> idrinfo->lock itself (deadlock). It also calls sleeping functions like
> >
> > Deadlock can be easily resolved by moving the lock out.
> >
> >
> >> tcf_action_goto_chain_fini(), so just implementing function that
> >> releases action without taking idrinfo->lock is not enough.
> >
> > Sleeping can be resolved either by making it atomic or
> > deferring it to a work queue.
> >
> > None of your arguments here is a blocker to locking
> > idrinfo->lock. You really should focus on if it is really
> > necessary to lock idrinfo->lock in tcf_del_walker(), rather
> > than these details.
> >
> > For me, if you need idrinfo->lock for dump walker, you must
> > need it for delete walker too, because deletion is a writer
> > which should require stronger protection than the dumper,
> > which merely a reader.
>
> I don't get how it is necessary. Dump walker uses pointers to actions
> directly, and in order to be concurrency-safe it must either hold the

It uses the pointer in a read-only way, what you said doesn't change
the fact that it is a reader. And, like other readers, it may not need
to lock at all, which is a different topic.


> lock or obtain reference to action. Note that del walker doesn't use the
> action pointer, it only passed action id to tcf_idr_delete_index()
> function, which does all the necessary locking and can deal with
> potential concurrency issues (concurrent delete, etc.). This approach
> also benefits from code reuse from other code paths that delete actions,
> instead of implementing its own.

Look at the difference below.

With your change:

idr_for_each_entry_ul{
   spin_lock(&idrinfo->lock);
   idr_remove();
   spin_unlock(&idrinfo->lock);
}

With what I suggest:

spin_lock(&idrinfo->lock);
idr_for_each_entry_ul{
   idr_remove();
}
spin_unlock(&idrinfo->lock);

Isn't a concurrent tcf_idr_check_alloc() able to livelock here with
your change?

idr_for_each_entry_ul{
   spin_lock(&idrinfo->lock);
   idr_remove();
   spin_unlock(&idrinfo->lock);
      // tcf_idr_check_alloc() jumps in,
     // allocates next ID which can be found
      // by idr_get_next_ul()
} // the whole loop goes _literately_ infinite...


Also, idr_for_each_entry_ul() is supposed to be protected either
by RCU or idrinfo->lock, no? With your change or without any change,
it doesn't even have any lock after removing RTNL?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
  2018-09-05 20:32         ` Cong Wang
@ 2018-09-06 11:14           ` Vlad Buslov
  2018-09-06 19:58             ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2018-09-06 11:14 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Wed 05 Sep 2018 at 20:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Sep 5, 2018 at 12:05 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Tue 04 Sep 2018 at 22:41, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >>
>> >> On Mon 03 Sep 2018 at 18:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >> >>
>> >> >> Action API was changed to work with actions and action_idr in concurrency
>> >> >> safe manner, however tcf_del_walker() still uses actions without taking
>> >> >> reference to them first and deletes them directly, disregarding possible
>> >> >> concurrent delete.
>> >> >>
>> >> >> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
>> >> >> caller to hold reference to action and accepts action id as argument,
>> >> >> instead of direct action pointer.
>> >> >
>> >> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
>> >> > tcf_dump_walker() already does.
>> >>
>> >> Because tcf_del_walker() calls __tcf_idr_release(), which take
>> >> idrinfo->lock itself (deadlock). It also calls sleeping functions like
>> >
>> > Deadlock can be easily resolved by moving the lock out.
>> >
>> >
>> >> tcf_action_goto_chain_fini(), so just implementing function that
>> >> releases action without taking idrinfo->lock is not enough.
>> >
>> > Sleeping can be resolved either by making it atomic or
>> > deferring it to a work queue.
>> >
>> > None of your arguments here is a blocker to locking
>> > idrinfo->lock. You really should focus on if it is really
>> > necessary to lock idrinfo->lock in tcf_del_walker(), rather
>> > than these details.
>> >
>> > For me, if you need idrinfo->lock for dump walker, you must
>> > need it for delete walker too, because deletion is a writer
>> > which should require stronger protection than the dumper,
>> > which merely a reader.
>>
>> I don't get how it is necessary. Dump walker uses pointers to actions
>> directly, and in order to be concurrency-safe it must either hold the
>
> It uses the pointer in a read-only way, what you said doesn't change
> the fact that it is a reader. And, like other readers, it may not need
> to lock at all, which is a different topic.
>
>
>> lock or obtain reference to action. Note that del walker doesn't use the
>> action pointer, it only passed action id to tcf_idr_delete_index()
>> function, which does all the necessary locking and can deal with
>> potential concurrency issues (concurrent delete, etc.). This approach
>> also benefits from code reuse from other code paths that delete actions,
>> instead of implementing its own.
>
> Look at the difference below.
>
> With your change:
>
> idr_for_each_entry_ul{
>    spin_lock(&idrinfo->lock);
>    idr_remove();
>    spin_unlock(&idrinfo->lock);
> }
>
> With what I suggest:
>
> spin_lock(&idrinfo->lock);
> idr_for_each_entry_ul{
>    idr_remove();
> }
> spin_unlock(&idrinfo->lock);
>
> Isn't a concurrent tcf_idr_check_alloc() able to livelock here with
> your change?
>
> idr_for_each_entry_ul{
>    spin_lock(&idrinfo->lock);
>    idr_remove();
>    spin_unlock(&idrinfo->lock);
>       // tcf_idr_check_alloc() jumps in,
>      // allocates next ID which can be found
>       // by idr_get_next_ul()
> } // the whole loop goes _literately_ infinite...

idr_for_each_entry_ul traverses idr entries with ascending order of
identifiers, so infinite livelock like this is not possible because it
never goes back to newly added entries with id<current_id.

>
>
> Also, idr_for_each_entry_ul() is supposed to be protected either
> by RCU or idrinfo->lock, no? With your change or without any change,
> it doesn't even have any lock after removing RTNL?

After reading this comment I checked actual idr implementation and I
think you are right. Even though idr_for_each_entry_ul() macro (and
function idr_get_next_ul() that it uses to iterate over idr entries)
doesn't specify any locking requirements in comment description (that is
why this patch doesn't use any), its implementation seems to require
external synchronization.

You suggest I should just hold idrinfo->lock for whole del_walker loop
duration, or play nicely with potential concurrent users and
take/release it per action?

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
  2018-09-06 11:14           ` Vlad Buslov
@ 2018-09-06 19:58             ` Cong Wang
  2018-09-07 13:51               ` [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock Vlad Buslov
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2018-09-06 19:58 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Thu, Sep 6, 2018 at 4:14 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> > Isn't a concurrent tcf_idr_check_alloc() able to livelock here with
> > your change?
> >
> > idr_for_each_entry_ul{
> >    spin_lock(&idrinfo->lock);
> >    idr_remove();
> >    spin_unlock(&idrinfo->lock);
> >       // tcf_idr_check_alloc() jumps in,
> >      // allocates next ID which can be found
> >       // by idr_get_next_ul()
> > } // the whole loop goes _literately_ infinite...
>
> idr_for_each_entry_ul traverses idr entries with ascending order of
> identifiers, so infinite livelock like this is not possible because it
> never goes back to newly added entries with id<current_id.

I said "literately infinite", it could go from 1 to UINT_MAX,
sufficient to prove my point of livelock.


> >
> > Also, idr_for_each_entry_ul() is supposed to be protected either
> > by RCU or idrinfo->lock, no? With your change or without any change,
> > it doesn't even have any lock after removing RTNL?
>
> After reading this comment I checked actual idr implementation and I
> think you are right. Even though idr_for_each_entry_ul() macro (and
> function idr_get_next_ul() that it uses to iterate over idr entries)
> doesn't specify any locking requirements in comment description (that is
> why this patch doesn't use any), its implementation seems to require
> external synchronization.

Yeah, it is also a reader, so either a reader lock like RCU or a writer lock
like idrinfo->lock.

>
> You suggest I should just hold idrinfo->lock for whole del_walker loop
> duration, or play nicely with potential concurrent users and
> take/release it per action?

My suggestion is pretty clear, you just missed it, let me copy-n-paste:

With what I suggest:

spin_lock(&idrinfo->lock);
idr_for_each_entry_ul{
   idr_remove();
}
spin_unlock(&idrinfo->lock);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock
  2018-09-06 19:58             ` Cong Wang
@ 2018-09-07 13:51               ` Vlad Buslov
  2018-09-07 19:12                 ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2018-09-07 13:51 UTC (permalink / raw)
  To: netdev, xiyou.wangcong; +Cc: jhs, jiri, davem, Vlad Buslov

Action API was changed to work with actions and action_idr in concurrency
safe manner, however tcf_del_walker() still uses actions without taking a
reference or idrinfo->lock first, and deletes them directly, disregarding
possible concurrent delete.

Add tc_action_wq workqueue to action API. Implement
tcf_idr_release_unsafe() that assumes external synchronization by caller
and delays blocking action cleanup part to tc_action_wq workqueue. Extend
tcf_action_cleanup() with 'async' argument to indicate that function should
free action asynchronously.

Change tcf_del_walker() to take idrinfo->lock while iterating over actions
and use new tcf_idr_release_unsafe() to release them while holding the
lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 include/net/act_api.h |  1 +
 net/sched/act_api.c   | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c6f195b3c706..4c5117bc4afb 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -38,6 +38,7 @@ struct tc_action {
 	struct gnet_stats_queue __percpu *cpu_qstats;
 	struct tc_cookie	__rcu *act_cookie;
 	struct tcf_chain	*goto_chain;
+	struct work_struct	work;
 };
 #define tcf_index	common.tcfa_index
 #define tcf_refcnt	common.tcfa_refcnt
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 6f118d62c731..4ad9062c34b3 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -90,13 +90,38 @@ static void free_tcf(struct tc_action *p)
 	kfree(p);
 }
 
-static void tcf_action_cleanup(struct tc_action *p)
+static void tcf_action_free(struct tc_action *p)
+{
+	gen_kill_estimator(&p->tcfa_rate_est);
+	free_tcf(p);
+}
+
+static void tcf_action_free_work(struct work_struct *work)
+{
+	struct tc_action *p = container_of(work,
+					   struct tc_action,
+					   work);
+
+	tcf_action_free(p);
+}
+
+static struct workqueue_struct *tc_action_wq;
+
+static bool tcf_action_queue_work(struct work_struct *work, work_func_t func)
+{
+	INIT_WORK(work, func);
+	return queue_work(tc_action_wq, work);
+}
+
+static void tcf_action_cleanup(struct tc_action *p, bool async)
 {
 	if (p->ops->cleanup)
 		p->ops->cleanup(p);
 
-	gen_kill_estimator(&p->tcfa_rate_est);
-	free_tcf(p);
+	if (async)
+		tcf_action_queue_work(&p->work, tcf_action_free_work);
+	else
+		tcf_action_free(p);
 }
 
 static int __tcf_action_put(struct tc_action *p, bool bind)
@@ -109,7 +134,7 @@ static int __tcf_action_put(struct tc_action *p, bool bind)
 		idr_remove(&idrinfo->action_idr, p->tcfa_index);
 		spin_unlock(&idrinfo->lock);
 
-		tcf_action_cleanup(p);
+		tcf_action_cleanup(p, false);
 		return 1;
 	}
 
@@ -147,6 +172,24 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
 }
 EXPORT_SYMBOL(__tcf_idr_release);
 
+/* Release idr without obtaining idrinfo->lock. Caller must prevent any
+ * concurrent modifications of idrinfo->action_idr!
+ */
+
+static int tcf_idr_release_unsafe(struct tc_action *p)
+{
+	if (atomic_read(&p->tcfa_bindcnt) > 0)
+		return -EPERM;
+
+	if (refcount_dec_and_test(&p->tcfa_refcnt)) {
+		idr_remove(&p->idrinfo->action_idr, p->tcfa_index);
+		tcf_action_cleanup(p, true);
+		return ACT_P_DELETED;
+	}
+
+	return 0;
+}
+
 static size_t tcf_action_shared_attrs_size(const struct tc_action *act)
 {
 	struct tc_cookie *act_cookie;
@@ -262,20 +305,25 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	if (nla_put_string(skb, TCA_KIND, ops->kind))
 		goto nla_put_failure;
 
+	spin_lock(&idrinfo->lock);
 	idr_for_each_entry_ul(idr, p, id) {
-		ret = __tcf_idr_release(p, false, true);
+		ret = tcf_idr_release_unsafe(p);
 		if (ret == ACT_P_DELETED) {
 			module_put(ops->owner);
 			n_i++;
 		} else if (ret < 0) {
-			goto nla_put_failure;
+			goto nla_put_failure_locked;
 		}
 	}
+	spin_unlock(&idrinfo->lock);
+
 	if (nla_put_u32(skb, TCA_FCNT, n_i))
 		goto nla_put_failure;
 	nla_nest_end(skb, nest);
 
 	return n_i;
+nla_put_failure_locked:
+	spin_unlock(&idrinfo->lock);
 nla_put_failure:
 	nla_nest_cancel(skb, nest);
 	return ret;
@@ -341,7 +389,7 @@ static int tcf_idr_delete_index(struct tcf_idrinfo *idrinfo, u32 index)
 						p->tcfa_index));
 			spin_unlock(&idrinfo->lock);
 
-			tcf_action_cleanup(p);
+			tcf_action_cleanup(p, false);
 			module_put(owner);
 			return 0;
 		}
@@ -1713,16 +1761,23 @@ static int __init tc_action_init(void)
 {
 	int err;
 
+	tc_action_wq = alloc_ordered_workqueue("tc_action_workqueue", 0);
+	if (!tc_action_wq)
+		return -ENOMEM;
+
 	err = register_pernet_subsys(&tcf_action_net_ops);
 	if (err)
-		return err;
+		goto err_register_pernet_subsys;
 
 	rtnl_register(PF_UNSPEC, RTM_NEWACTION, tc_ctl_action, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_DELACTION, tc_ctl_action, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETACTION, tc_ctl_action, tc_dump_action,
 		      0);
+	return err;
 
-	return 0;
+err_register_pernet_subsys:
+	destroy_workqueue(tc_action_wq);
+	return err;
 }
 
 subsys_initcall(tc_action_init);
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock
  2018-09-07 13:51               ` [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock Vlad Buslov
@ 2018-09-07 19:12                 ` Cong Wang
  2018-09-12  8:50                   ` Vlad Buslov
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2018-09-07 19:12 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Fri, Sep 7, 2018 at 6:52 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Action API was changed to work with actions and action_idr in concurrency
> safe manner, however tcf_del_walker() still uses actions without taking a
> reference or idrinfo->lock first, and deletes them directly, disregarding
> possible concurrent delete.
>
> Add tc_action_wq workqueue to action API. Implement
> tcf_idr_release_unsafe() that assumes external synchronization by caller
> and delays blocking action cleanup part to tc_action_wq workqueue. Extend
> tcf_action_cleanup() with 'async' argument to indicate that function should
> free action asynchronously.

Where exactly is blocking in tcf_action_cleanup()?

>From your code, it looks like free_tcf(), but from my observation,
the only blocking function inside is tcf_action_goto_chain_fini()
which calls __tcf_chain_put(). But, __tcf_chain_put() is blocking
_ONLY_ when tc_chain_notify() is called, for tc action it is never
called.

So, what else is blocking?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock
  2018-09-07 19:12                 ` Cong Wang
@ 2018-09-12  8:50                   ` Vlad Buslov
  2018-09-13 17:13                     ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2018-09-12  8:50 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Fri 07 Sep 2018 at 19:12, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Sep 7, 2018 at 6:52 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Action API was changed to work with actions and action_idr in concurrency
>> safe manner, however tcf_del_walker() still uses actions without taking a
>> reference or idrinfo->lock first, and deletes them directly, disregarding
>> possible concurrent delete.
>>
>> Add tc_action_wq workqueue to action API. Implement
>> tcf_idr_release_unsafe() that assumes external synchronization by caller
>> and delays blocking action cleanup part to tc_action_wq workqueue. Extend
>> tcf_action_cleanup() with 'async' argument to indicate that function should
>> free action asynchronously.
>
> Where exactly is blocking in tcf_action_cleanup()?
>
> From your code, it looks like free_tcf(), but from my observation,
> the only blocking function inside is tcf_action_goto_chain_fini()
> which calls __tcf_chain_put(). But, __tcf_chain_put() is blocking
> _ONLY_ when tc_chain_notify() is called, for tc action it is never
> called.
>
> So, what else is blocking?

__tcf_chain_put() calls tc_chain_tmplt_del(), which calls
ops->tmplt_destroy(). This last function uses hw offload API, which is
blocking.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock
  2018-09-12  8:50                   ` Vlad Buslov
@ 2018-09-13 17:13                     ` Cong Wang
  2018-09-14 10:46                       ` Vlad Buslov
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2018-09-13 17:13 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Wed, Sep 12, 2018 at 1:51 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Fri 07 Sep 2018 at 19:12, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Fri, Sep 7, 2018 at 6:52 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> Action API was changed to work with actions and action_idr in concurrency
> >> safe manner, however tcf_del_walker() still uses actions without taking a
> >> reference or idrinfo->lock first, and deletes them directly, disregarding
> >> possible concurrent delete.
> >>
> >> Add tc_action_wq workqueue to action API. Implement
> >> tcf_idr_release_unsafe() that assumes external synchronization by caller
> >> and delays blocking action cleanup part to tc_action_wq workqueue. Extend
> >> tcf_action_cleanup() with 'async' argument to indicate that function should
> >> free action asynchronously.
> >
> > Where exactly is blocking in tcf_action_cleanup()?
> >
> > From your code, it looks like free_tcf(), but from my observation,
> > the only blocking function inside is tcf_action_goto_chain_fini()
> > which calls __tcf_chain_put(). But, __tcf_chain_put() is blocking
> > _ONLY_ when tc_chain_notify() is called, for tc action it is never
> > called.
> >
> > So, what else is blocking?
>
> __tcf_chain_put() calls tc_chain_tmplt_del(), which calls
> ops->tmplt_destroy(). This last function uses hw offload API, which is
> blocking.

Good to know.

Can we just make ops->tmplt_destroy() to use workqueue?
Making tc action to workqueue seems overkill, for me.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock
  2018-09-13 17:13                     ` Cong Wang
@ 2018-09-14 10:46                       ` Vlad Buslov
  2018-09-14 20:53                         ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2018-09-14 10:46 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Thu 13 Sep 2018 at 17:13, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Sep 12, 2018 at 1:51 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Fri 07 Sep 2018 at 19:12, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Fri, Sep 7, 2018 at 6:52 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> Action API was changed to work with actions and action_idr in concurrency
>> >> safe manner, however tcf_del_walker() still uses actions without taking a
>> >> reference or idrinfo->lock first, and deletes them directly, disregarding
>> >> possible concurrent delete.
>> >>
>> >> Add tc_action_wq workqueue to action API. Implement
>> >> tcf_idr_release_unsafe() that assumes external synchronization by caller
>> >> and delays blocking action cleanup part to tc_action_wq workqueue. Extend
>> >> tcf_action_cleanup() with 'async' argument to indicate that function should
>> >> free action asynchronously.
>> >
>> > Where exactly is blocking in tcf_action_cleanup()?
>> >
>> > From your code, it looks like free_tcf(), but from my observation,
>> > the only blocking function inside is tcf_action_goto_chain_fini()
>> > which calls __tcf_chain_put(). But, __tcf_chain_put() is blocking
>> > _ONLY_ when tc_chain_notify() is called, for tc action it is never
>> > called.
>> >
>> > So, what else is blocking?
>>
>> __tcf_chain_put() calls tc_chain_tmplt_del(), which calls
>> ops->tmplt_destroy(). This last function uses hw offload API, which is
>> blocking.
>
> Good to know.
>
> Can we just make ops->tmplt_destroy() to use workqueue?
> Making tc action to workqueue seems overkill, for me.

How about changing tcf_chain_put_by_act() to use tc_filter_wq, instead
of directly calling __tcf_chain_put()? IMO it is a better solution
because it benefits all classifiers, instead of requiring every
classifier with templates support to implement non-blocking
ops->tmplt_destroy().

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock
  2018-09-14 10:46                       ` Vlad Buslov
@ 2018-09-14 20:53                         ` Cong Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Cong Wang @ 2018-09-14 20:53 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Fri, Sep 14, 2018 at 3:46 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Thu 13 Sep 2018 at 17:13, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Sep 12, 2018 at 1:51 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >>
> >> On Fri 07 Sep 2018 at 19:12, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Fri, Sep 7, 2018 at 6:52 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> >>
> >> >> Action API was changed to work with actions and action_idr in concurrency
> >> >> safe manner, however tcf_del_walker() still uses actions without taking a
> >> >> reference or idrinfo->lock first, and deletes them directly, disregarding
> >> >> possible concurrent delete.
> >> >>
> >> >> Add tc_action_wq workqueue to action API. Implement
> >> >> tcf_idr_release_unsafe() that assumes external synchronization by caller
> >> >> and delays blocking action cleanup part to tc_action_wq workqueue. Extend
> >> >> tcf_action_cleanup() with 'async' argument to indicate that function should
> >> >> free action asynchronously.
> >> >
> >> > Where exactly is blocking in tcf_action_cleanup()?
> >> >
> >> > From your code, it looks like free_tcf(), but from my observation,
> >> > the only blocking function inside is tcf_action_goto_chain_fini()
> >> > which calls __tcf_chain_put(). But, __tcf_chain_put() is blocking
> >> > _ONLY_ when tc_chain_notify() is called, for tc action it is never
> >> > called.
> >> >
> >> > So, what else is blocking?
> >>
> >> __tcf_chain_put() calls tc_chain_tmplt_del(), which calls
> >> ops->tmplt_destroy(). This last function uses hw offload API, which is
> >> blocking.
> >
> > Good to know.
> >
> > Can we just make ops->tmplt_destroy() to use workqueue?
> > Making tc action to workqueue seems overkill, for me.
>
> How about changing tcf_chain_put_by_act() to use tc_filter_wq, instead
> of directly calling __tcf_chain_put()? IMO it is a better solution
> because it benefits all classifiers, instead of requiring every
> classifier with templates support to implement non-blocking
> ops->tmplt_destroy().

My point is, there is only one filter implements ops->tmplt_destroy
so far, so there is no reason to just make all filters to adjusted
for this single one. Not to mention actions, actions are innocent
here.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-09-15  2:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-03  7:06 [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete Vlad Buslov
2018-09-03 18:50 ` Cong Wang
2018-09-03 20:33   ` Vlad Buslov
2018-09-04 22:41     ` Cong Wang
2018-09-05  7:03       ` Vlad Buslov
2018-09-05 20:32         ` Cong Wang
2018-09-06 11:14           ` Vlad Buslov
2018-09-06 19:58             ` Cong Wang
2018-09-07 13:51               ` [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock Vlad Buslov
2018-09-07 19:12                 ` Cong Wang
2018-09-12  8:50                   ` Vlad Buslov
2018-09-13 17:13                     ` Cong Wang
2018-09-14 10:46                       ` Vlad Buslov
2018-09-14 20:53                         ` Cong Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.