Re: [PATCH net-next] net:sched: add gkprio scheduler

From: Jamal Hadi Salim <jhs@mojatatu.com>
To: Michel Machado <michel@digirati.com.br>,
	Cong Wang <xiyou.wangcong@gmail.com>
Cc: Nishanth Devarajan <ndev2021@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>, David Miller <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Cody Doucette <doucette@bu.edu>
Subject: Re: [PATCH net-next] net:sched: add gkprio scheduler
Date: Sat, 12 May 2018 10:48:59 -0400	[thread overview]
Message-ID: <d429f4a6-4141-2cb2-f67e-036553a9a3dd@mojatatu.com> (raw)
In-Reply-To: <32b58e4e-c06c-0266-e4d9-caca365e46ef@digirati.com.br>

Sorry for the latency..

On 09/05/18 01:37 PM, Michel Machado wrote:
> On 05/09/2018 10:43 AM, Jamal Hadi Salim wrote:
>> On 08/05/18 10:27 PM, Cong Wang wrote:
>>> On Tue, May 8, 2018 at 6:29 AM, Jamal Hadi Salim <jhs@mojatatu.com> 
>>> wrote:

> 
> I like the suggestion of extending skbmod to mark skbprio based on ds. 
> Given that DSprio would no longer depend on the DS field, would you have 
> a name suggestion for this new queue discipline since the name "prio" is 
> currently in use?
> 

Not sure what to call it.
My struggle is still with the intended end goal of the qdisc.
It looks like prio qdisc except for the enqueue part which attempts
to use a shared global queue size for all prios. I would have
pointed to other approaches which use global priority queue pool
which do early congestion detection like RED or variants like GRED but
those use average values of the queue lengths not instantenous values 
such as you do.
I am tempted to say - based on my current understanding - that you dont
need a new qdisc; rather you need to map your dsfields to skbprio
(via skbmod) and stick with prio qdisc. I also think the skbmod
mapping is useful regardless of this need.

> What should be the range of priorities that this new queue discipline 
> would accept? skb->prioriry is of type __u32, but supporting 2^32 
> priorities would require too large of an array to index packets by 
> priority; the DS field is only 6 bits long. Do you have a use case in 
> mind to guide us here?
>

Look at the priomap or prio2band arrangement on prio qdisc
or pfifo_fast qdisc. You take an skbprio as an index into the array
and retrieve a queue to enqueue to. The size of the array is 16.
In the past this was based IIRC on ip precedence + 1 bit. Those map
similarly to DS fields (calls selectors, assured forwarding etc). So
no need to even increase the array beyond current 16.

>> I find the cleverness in changing the highest/low prios confusing.
>> It looks error-prone (I guess that is why there is a BUG check)
>> To the authors: Is there a document/paper on the theory of this thing
>> as to why no explicit queues are "faster"?
> 
> The priority orientation in GKprio is due to two factors: failing safe 
> and elegance. If zero were the highest priority, any operational mistake 
> that leads not-classified packets through GKprio would potentially 
> disrupt the system. We are humans, we'll make mistakes. The elegance 
> aspect comes from the fact that the assigned priority is not massaged to 
> fit the DS field. We find it helpful while inspecting packets on the wire.
> 
> The reason for us to avoid explicit queues in GKprio, which could change 
> the behavior within a given priority, is to closely abide to the 
> expected behavior assumed to prove Theorem 4.1 in the paper "Portcullis: 
> Protecting Connection Setup from Denial-of-Capability Attacks":
> 
> https://dl.acm.org/citation.cfm?id=1282413
> 

Paper seems to be under paywall. Googling didnt help.
My concern is still the science behind this; if you had written up
some test setup which shows how you concluded this was a better
approach at DOS prevention and showed some numbers it would have
helped greatly clarify.

>> 1) I agree that using multiple queues as in prio qdisc would make it
>> more manageable; does not necessarily need to be classful if you
>> use implicit skbprio classification. i.e on equeue use a priority
>> map to select a queue; on dequeue always dequeu from highest prio
>> until it has no more packets to send.
> 
> In my reply to Cong, I point out that there is a technical limitation in 
> the interface of queue disciplines that forbids GKprio to have explicit 
> sub-queues:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg234201.html
> 
>> 2) Dropping already enqueued packets will not work well for
>> local feedback (__NET_XMIT_BYPASS return code is about the
>> packet that has been dropped from earlier enqueueing because
>> it is lower priority - it does not  signify anything with
>> current skb to which actually just got enqueud).
>> Perhaps (off top of my head) is to always enqueue packets on
>> high priority when their limit is exceeded as long as lower prio has
>> some space. Means youd have to increment low prio accounting if their
>> space is used.
> 
> I don't understand the point you are making here. Could you develop it 
> further?
> 

Sorry - I was meaning NET_XMIT_CN
If you drop an already enqueued packet - it makes sense to signify as
such using NET_XMIT_CN
this does not make sense for forwarded packets but it does
for locally sourced packets.

cheers,
jamal