* Intro into qdisc writing?
@ 2021-08-10  3:17 Thorsten Glaser
  2021-08-10  8:34 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Thorsten Glaser @ 2021-08-10  3:17 UTC
  To: netdev

Hi,

I hope this is the right place to ask this kind of question,
and not just to send patches ☺

I’m currently working with a… network simulator of sorts, which
has so far mostly used htb, netem, dualpi2 and fq_codel to do the
various tricks needed for whatever they do, but now I have rather
specific change requests (one of which I already implemented).

The next things on my list basically involve delaying all traffic
or a subset of traffic for a certain amount of time (in the one‑ to
two-digit millisecond ballpark, so rather long, in CPU time). I’ve
seen the netem source use qdisc_watchdog_schedule_ns for this, but,
unlike the functions I used in my earlier module changes, I cannot
find any documentation for this.

Similarly, is there an intro of sorts to qdisc writing: the things
to know, concepts, locking, whatever is needed?

My background is that of a multi-decade low-level programmer, but so far
only in userland, libc variants and bootloaders, not the kernel, and what bit
of kernel I have touched so far was in BSD land, so any pointers are welcome.

If it helps: while this is for a customer project, so far everything
coming out of it is published under OSS licences; mostly at
https://github.com/tarent/sch_jens/tree/master/sch_jens as regards
the kernel module (and ../jens/ for the relayfs client example) but
https://github.com/tarent/ECN-Bits has a related userspace project.

Thanks in advance,
//mirabilos
-- 
Infrastrukturexperte • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Telephon +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg



* Re: Intro into qdisc writing?
  2021-08-10  3:17 Intro into qdisc writing? Thorsten Glaser
@ 2021-08-10  8:34 ` Eric Dumazet
  2021-08-10 10:22   ` Jesper Dangaard Brouer
  2021-10-18 18:06   ` Thorsten Glaser
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2021-08-10  8:34 UTC
  To: Thorsten Glaser, netdev



Instead of writing a new qdisc, you could simply use the FQ packet scheduler
and an eBPF program that adjusts skb->tstamp depending on your needs.

https://legacy.netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF
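
A minimal userspace sketch of the earliest-departure-time (EDT) idea, assuming a per-flow byte rate and an optional fixed extra delay. The names `edt_state` and `edt_stamp` are hypothetical and do not come from the talk or from any kernel API; in a real setup, a BPF program would roughly compute such a value and store it in skb->tstamp, and FQ would hold the packet until that time.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-flow pacing state; a BPF program would keep this in a map. */
struct edt_state {
	uint64_t rate_bytes_per_sec; /* target rate, must be non-zero */
	uint64_t next_tstamp_ns;     /* earliest departure time of the next packet */
};

/*
 * Return the departure time for a packet of `len` bytes seen at `now_ns`.
 * Packets are spaced by their serialisation time at the target rate;
 * `extra_delay_ns` shifts only this packet's timestamp further out.
 */
uint64_t edt_stamp(struct edt_state *st, uint64_t now_ns, uint32_t len,
		   uint64_t extra_delay_ns)
{
	/* Never schedule into the past. */
	uint64_t t = st->next_tstamp_ns > now_ns ? st->next_tstamp_ns : now_ns;
	uint64_t tx_ns = (uint64_t)len * NSEC_PER_SEC / st->rate_bytes_per_sec;

	st->next_tstamp_ns = t + tx_ns;
	return t + extra_delay_ns;
}
```

In this model, two back-to-back 1000-byte packets at 1,000,000 bytes/s get departure times 1 ms apart, and a non-zero `extra_delay_ns` moves one packet's timestamp without changing the pacing state used for the packets behind it.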


* Re: Intro into qdisc writing?
  2021-08-10  8:34 ` Eric Dumazet
@ 2021-08-10 10:22   ` Jesper Dangaard Brouer
  2021-08-10 16:36     ` Thorsten Glaser
  2021-10-18 18:06   ` Thorsten Glaser
  1 sibling, 1 reply; 5+ messages in thread
From: Jesper Dangaard Brouer @ 2021-08-10 10:22 UTC
  To: Eric Dumazet, Thorsten Glaser, netdev; +Cc: brouer


On 10/08/2021 10.34, Eric Dumazet wrote:
> Instead of writing a new qdisc, you could simply use the FQ packet scheduler
> and an eBPF program that adjusts skb->tstamp depending on your needs.
>
> https://legacy.netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF


Good link and reference.

If you want to see some code doing this via BPF see: 
https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/

I've unfortunately not had time to document the 'traffic-pacing-edt' 
code, as this was done under time pressure, for solving a production 
problem at an ISP. They needed packet pacing or transmission smoothing 
due to switches with too small buffers to handle bursts, but as close to 
1Gbit/s as possible as they sold 1G to their customers.

The comments in the code and scripts should hopefully be enough for you 
to understand the concept. Eric's slides describe the overall concept 
and background.


The main code you want to look at is in 'edt_pacer_vlan.c' [1], but 
note that it assumes it has lockless access to the data structure. 
This assumption is only true because the XDP-prog in 'xdp_cpumap_qinq.c' 
[2] moves packets associated with the data structure to the right CPU 
(and invokes/starts the normal network stack on that CPU).


[1] 
https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/edt_pacer_vlan.c

[2] 
https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/xdp_cpumap_qinq.c


--Jesper




* Re: Intro into qdisc writing?
  2021-08-10 10:22   ` Jesper Dangaard Brouer
@ 2021-08-10 16:36     ` Thorsten Glaser
  0 siblings, 0 replies; 5+ messages in thread
From: Thorsten Glaser @ 2021-08-10 16:36 UTC
  To: Jesper Dangaard Brouer; +Cc: Eric Dumazet, netdev, brouer

On Tue, 10 Aug 2021, Jesper Dangaard Brouer wrote:

> > Instead of writing a new qdisc, you could simply use the FQ packet scheduler
> > and an eBPF program that adjusts skb->tstamp depending on your needs.

Hmm, this opens another magnitude of complexity I’m not sure I’m ready
to tackle right now. So let me explain the specific scenarios more (I
had hoped to get more general advice first):

There are two operation modes. One uses netem to limit bandwidth and
introduce latency. The other, which I’ve been working on, uses htb to
limit bandwidth (not my part until now) with a sub-qdisc fq_codel to
do ECN CE marking. (As you might have seen from the other link, those
ECN markings are what we’re actually after, not so much the traffic
behaviour.) I forked fq_codel into something else to change the way it
does the ECN marking, to match the scenario we’re modelling more closely.

There’s a controlling application running on the router which sets up
these qdiscs. It also changes the htb qdisc to increase or reduce the
bandwidth available to the fq_codel fork, currently every second or so,
but the goal is to do this every 10 ms or so (that’s a ton of tc(8)
invocations, I know…); this is, again, not my department.

There are two more specific delays to be introduced now. I think they
need to be introduced to both scenarios (and I was told netem doesn’t
play with fq_codel; I’m not sure if the netem scenario also uses htb to
control bandwidth or not).

One is that “on command” all traffic needs to stop for, say, 20 to 30 ms.
I was thinking of adding a flag to htb that, when set on tc, does this
(oneshot), since tc is called to reconfigure htb often enough anyway.
This doesn’t happen often, maybe once every few minutes.
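
The one-shot stop could be modelled as a deadline that the dequeue path checks. Below is a minimal userspace sketch, assuming nanosecond timestamps; `pause_gate`, `gate_pause` and `gate_may_send` are hypothetical names and not part of htb or tc. In a qdisc, the `false` branch is where dequeue would return NULL and arm the watchdog for `wake_at_ns`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-shot "all traffic stops" gate. */
struct pause_gate {
	uint64_t blocked_until_ns; /* 0 means no pause is pending */
};

/* Called when the controlling application requests a pause. */
void gate_pause(struct pause_gate *g, uint64_t now_ns, uint64_t duration_ns)
{
	g->blocked_until_ns = now_ns + duration_ns;
}

/*
 * Called from the dequeue path. Returns true if packets may be sent now.
 * Otherwise returns false and stores the time at which to try again.
 */
bool gate_may_send(const struct pause_gate *g, uint64_t now_ns,
		   uint64_t *wake_at_ns)
{
	if (now_ns >= g->blocked_until_ns)
		return true;
	*wake_at_ns = g->blocked_until_ns;
	return false;
}
```

Because the gate only stores a deadline, it expires by itself once the clock passes `blocked_until_ns`, and no second command is needed to resume traffic.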

The other is a running counter over the packets sent out, and every
n-th packet must be delayed by an additional x ms. Every n*n-th packet
by 2*x ms even. I was considering putting those aside, sending out the
next packets that arrive, or if none, returning NULL from the dequeue
function, but when I do that I need to be called again in x ms, so I
need the watchdog, right?
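
The counter rule itself can be written as a small pure function. This is a minimal userspace sketch, assuming the packet counter starts at 1 and that n*n does not overflow; `extra_delay_ns` is a hypothetical helper, not an existing kernel function. A qdisc would add the result to the current time to get the packet's due time, park the packet, and arm the watchdog for the earliest due time among the parked packets.

```c
#include <stdint.h>

/*
 * Extra delay for the packet with running number `seq` (1-based):
 * every (n*n)-th packet gets 2*x, every other n-th packet gets x,
 * all remaining packets get no extra delay.
 */
uint64_t extra_delay_ns(uint64_t seq, uint64_t n, uint64_t x_ns)
{
	if (n == 0)
		return 0; /* rule disabled */
	if (seq % (n * n) == 0)
		return 2 * x_ns;
	if (seq % n == 0)
		return x_ns;
	return 0;
}
```

With n = 10 and x = 5 ms, packets 10, 20, ... 90 are delayed by 5 ms and packet 100 by 10 ms. The n*n check has to come first, since every multiple of n*n is also a multiple of n.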

So this is all very static, and I’m familiar enough with C to implement
things there, but not with BPF let alone Linux’ eBPF. I’m also not sure
how to get the “on command” thing done there.

Perhaps the entire “playlist” of network behaviour could be moved there,
but there is, again, much more involved than I am even familiar with or
told about. There is, at least, limiting bandwidth, then either introducing
latency (with jitter) or doing the ECN marking, then introducing additional
“dead times” for individual packets or all traffic, and there probably will
be more. We’re approaching this piece by piece as we’re learning about the
to-be-modelled environment, which in itself is *also* still under
development (with feedback from the simulator to the environment as well). I’m
just the one C guy involved ☺

> > https://legacy.netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF

> If you want to see some code doing this via BPF see:
> https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/

> The comments in the code and scripts should hopefully be enough for you to
> understand the concept. Eric's slides describe the overall concept and
> background.

I’ll definitely look into them, but…

> > > Similarly, is there an intro of sorts for qdisc writing, the things
> > > to know, concepts, locking, whatever is needed?

… is there something more general for starters?

Ah Eric, did you even see my earlier mail about the fq_codel undocumented
flag? I wrote up what I could gather from the code and some websites about
fq_codel and documented that (as part of documenting the changed version)
in https://github.com/tarent/sch_jens/blob/master/man/man8/tc-jens.8 and
was thinking of isolating the fq_codel part and submitting it as replacement
for the current tc-fq_codel(8) manpage; review of that (for correctness of
the documentation) would be welcome if you’re available…

//mirabilos


* Re: Intro into qdisc writing?
  2021-08-10  8:34 ` Eric Dumazet
  2021-08-10 10:22   ` Jesper Dangaard Brouer
@ 2021-10-18 18:06   ` Thorsten Glaser
  1 sibling, 0 replies; 5+ messages in thread
From: Thorsten Glaser @ 2021-10-18 18:06 UTC
  To: Eric Dumazet; +Cc: netdev

On Tue, 10 Aug 2021, Eric Dumazet wrote:

> https://legacy.netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF

This apparently landed in 4.20, I need to support buster (4.19) as well
though, so (independent of the other concerns) it’s out.

bye,
//mirabilos
