All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlad Buslov <vladbu@nvidia.com>
To: Peilin Ye <yepeilin.cs@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Pedro Tammela <pctammela@mojatatu.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Peilin Ye <peilin.ye@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	"Hillf Danton" <hdanton@sina.com>, <netdev@vger.kernel.org>,
	Cong Wang <cong.wang@bytedance.com>
Subject: Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
Date: Thu, 8 Jun 2023 10:48:12 +0300	[thread overview]
Message-ID: <87ttviwfoy.fsf@nvidia.com> (raw)
In-Reply-To: <ZIEqAosXPn8hB1hK@C02FL77VMD6R.googleapis.com>

On Wed 07 Jun 2023 at 18:08, Peilin Ye <yepeilin.cs@gmail.com> wrote:
> On Wed, Jun 07, 2023 at 11:18:32AM +0300, Vlad Buslov wrote:
>> > I also thought about adding the new DELETED-REJECT-NEW-FILTERS flag to
>> > ::state2, but not sure if it's okay to extend it for our purpose.
>>
>> As you described above qdisc->flags is already used to interact with cls
>> api (including changing it dynamically), so I don't see why not.
>
> Sorry, I don't follow, I meant qdisc->state2:
>
>   enum qdisc_state2_t {
>           /* Only for !TCQ_F_NOLOCK qdisc. Never access it directly.
>            * Use qdisc_run_begin/end() or qdisc_is_running() instead.
>            */
>           __QDISC_STATE2_RUNNING,
>   };

Sorry, I misunderstood what you were suggesting. Got it now.

>
> NVM, I think using qdisc->flags after making it atomic sounds better.

Agree.

>
> On Wed, Jun 07, 2023 at 11:18:32AM +0300, Vlad Buslov wrote:
>> > 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
>> > 			      flags, extack);
>> > 	if (err == 0) {
>> > 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
>> > 			       RTM_NEWTFILTER, false, rtnl_held, extack);
>> > 		tfilter_put(tp, fh);
>> > 		/* q pointer is NULL for shared blocks */
>> > 		if (q)
>> > 			q->flags &= ~TCQ_F_CAN_BYPASS;
>> > 	}               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >
>> > TCQ_F_CAN_BYPASS is cleared after e.g. adding a filter to the Qdisc, and it
>> > isn't atomic [1].
>> 
>> Yeah, I see we have already got such behavior in 3f05e6886a59
>> ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters").
>> 
>> > We also have this:
>> >
>> >   ->dequeue()
>> >     htb_dequeue()
>> >       htb_dequeue_tree()
>> >         qdisc_warn_nonwc():
>> >
>> >   void qdisc_warn_nonwc(const char *txt, struct Qdisc *qdisc)
>> >   {
>> >           if (!(qdisc->flags & TCQ_F_WARN_NONWC)) {
>> >                   pr_warn("%s: %s qdisc %X: is non-work-conserving?\n",
>> >                           txt, qdisc->ops->id, qdisc->handle >> 16);
>> >                   qdisc->flags |= TCQ_F_WARN_NONWC;
>> >           }       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >   }
>> >   EXPORT_SYMBOL(qdisc_warn_nonwc);
>> >
>> > Also non-atomic; isn't it possible for the above 2 underlined statements to
>> > race with each other?  If true, I think we need to change Qdisc::flags to
>> > use atomic bitops, just like what we're doing for Qdisc::state and
>> > ::state2.  It feels like a separate TODO, however.
>> 
>> It looks like even though 3f05e6886a59 ("net_sched: unset
>> TCQ_F_CAN_BYPASS when adding filters") was introduced after cls api
>> unlock by now we have these in exactly the same list of supported
>> kernels (5.4 LTS and newer). Considering this, the conversion to the
>> atomic bitops can be done as a standalone fix for cited commit and after
>> it will have been accepted and backported the qdisc fix can just assume
>> that qdisc->flags is an atomic bitops field in all target kernels and
>> use it as-is. WDYT?
>
> Sounds great, how about:
>
>   1. I'll post the non-replay version of the fix (after updating the commit
>      message), and we apply that first, as suggested by Jamal

From my side there are no objections to any of the proposed approaches
since we have never had any users with legitimate use-case where they
need to replace/delete a qdisc concurrently with a filter update, so
returning -EBUSY (or -EAGAIN) to the user in such case would work as
either temporary or the final fix. However, Jakub had reservations with
such approach so don't know where we stand now regarding this.

>
>   2. Make qdisc->flags atomic
>
>   3. Make the fix better by replaying and using the (now atomic)
>      IS-DESTROYING flag with test_and_set_bit() and friends
>
> ?

Again, no objections from my side. Ping me if you need help with any of
these.


  reply	other threads:[~2023-06-08  7:56 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
2023-05-24  1:17 ` [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
2023-05-24 15:37   ` Pedro Tammela
2023-05-24 15:57   ` Jamal Hadi Salim
2023-05-24  1:18 ` [PATCH v5 net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT Peilin Ye
2023-05-24 15:38   ` Pedro Tammela
2023-05-24 15:58     ` Jamal Hadi Salim
2023-05-24  1:19 ` [PATCH v5 net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Peilin Ye
2023-05-24 15:38   ` Pedro Tammela
2023-05-24  1:19 ` [PATCH v5 net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs Peilin Ye
2023-05-24 15:38   ` Pedro Tammela
2023-05-24  1:20 ` [PATCH v5 net 5/6] net/sched: Refactor qdisc_graft() for ingress and " Peilin Ye
2023-05-24  1:20 ` [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Peilin Ye
2023-05-24 15:39   ` Pedro Tammela
2023-05-24 16:09     ` Jamal Hadi Salim
2023-05-25  9:25       ` Paolo Abeni
2023-05-26 12:19         ` Jamal Hadi Salim
2023-05-26 12:20     ` Jamal Hadi Salim
2023-05-26 19:47       ` Jamal Hadi Salim
2023-05-26 20:21         ` Pedro Tammela
2023-05-26 23:09           ` Peilin Ye
2023-05-27  2:33             ` Jakub Kicinski
2023-05-27  8:23               ` Peilin Ye
2023-05-28 18:54                 ` Jamal Hadi Salim
2023-05-29 11:50                   ` Vlad Buslov
2023-05-29 12:58                     ` Vlad Buslov
2023-05-30  1:03                       ` Jakub Kicinski
2023-05-30  9:11                       ` Peilin Ye
2023-05-30 12:18                         ` Vlad Buslov
2023-05-31  0:29                           ` Peilin Ye
2023-06-01  3:57                           ` Peilin Ye
2023-06-01  6:20                             ` Vlad Buslov
2023-06-07  0:57                               ` Peilin Ye
2023-06-07  8:18                                 ` Vlad Buslov
2023-06-08  1:08                                   ` Peilin Ye
2023-06-08  7:48                                     ` Vlad Buslov [this message]
2023-06-11  3:25                                       ` Peilin Ye
2023-06-08  0:39                               ` Peilin Ye
2023-06-08  9:17                                 ` Vlad Buslov
2023-06-10  0:20                                   ` Peilin Ye
2023-06-01 13:03                             ` Pedro Tammela
2023-06-07  4:25                               ` Peilin Ye
2023-05-29 13:55                     ` Jamal Hadi Salim
2023-05-29 19:14                       ` Peilin Ye
2023-05-25 17:16 ` [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Vlad Buslov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ttviwfoy.fsf@nvidia.com \
    --to=vladbu@nvidia.com \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hdanton@sina.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pctammela@mojatatu.com \
    --cc=peilin.ye@bytedance.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yepeilin.cs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.