All of lore.kernel.org
 help / color / mirror / Atom feed
* tc question about ingress bandwidth splitting
@ 2020-03-22 21:56 Philip Prindeville
  2020-03-22 22:59 ` Grant Taylor
                   ` (12 more replies)
  0 siblings, 13 replies; 19+ messages in thread
From: Philip Prindeville @ 2020-03-22 21:56 UTC (permalink / raw)
  To: lartc

Hi all,

I asked around on IRC but no one seems to know the answer, so I thought I’d go to the source…

I have a SoHo router with two physical subnets, which we’ll call “production” (eth0) and “guest” (eth1), and the egress interface “wan” (eth5).

The uplink is G.PON 50/10 mbps.  I’d like to cap the usage on “guest” to 10/2 mbps.  Any unused bandwidth from “guest” goes to “production”.

I thought about marking the traffic coming in off “wan" (the public interface).  Then using HTB to have a 50 mbps cap at the root, and allocating 10mb/s to the child “guest”.  The other sibling would be “production”, and he gets the remaining traffic.

Upstream would be the reverse, marking ingress traffic from “guest” with a separate tag.  Allocating upstream root on “wan” with 10 mbps, and the child “guest” getting 2 mbps.  The remainder goes to the sibling “production”.

Should be straightforward enough, right? (Well, forwarding is more straightforward than traffic terminating on the router itself, I guess… bonus points for getting that right, too.)

I’m hoping that the limiting will work adequately so that the end-to-end path has adequate congestion avoidance happening, and that upstream doesn’t overrun the receiver and cause a lot of packets to be dropped on the last hop (work case of wasted bandwidth).  Not sure if I need special accommodations for bursting or if that would just delay the “settling” of congestion avoidance into steady-state.

Also not sure if ECN is worth marking at this point.  Congestion control is supposed to work better than congestion avoidance, right?

Anyone know what the steps would look like to accomplish the above?

A bunch of people responded, “yeah, I’ve been wanting to do that too…” when I brought up my question, so if I get a good solution I’ll submit a FAQ entry.

Thanks,

-Philip

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
@ 2020-03-22 22:59 ` Grant Taylor
  2020-03-24  6:51 ` Philip Prindeville
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Grant Taylor @ 2020-03-22 22:59 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: text/plain, Size: 6022 bytes --]

On 3/22/20 3:56 PM, Philip Prindeville wrote:
> Hi all,

Hi Philip,

> The uplink is G.PON 50/10 mbps.

Aside:  /Gigabit/ PON serving 50 / 10 Mbps.  ~chuckle~

> I’d like to cap the usage on “guest” to 10/2 mbps.  Any unused 
> bandwidth from “guest” goes to “production”.

Does any of production's unused bandwidth go to guest?  Or is guest hard 
capped at 10 & 2?

> I thought about marking the traffic coming in off “wan" (the public 
> interface).

One of the most important lessons that I remember about QoS is that you 
can only /effectively/ limit what you send.

Read:  You can't limit what is sent down your line to your router.

Further read:  You will receive more down your line than the 10 & 2 that 
you limit guest to, but you can feed guest at 10 & 2.

> Then using HTB to have a 50 mbps cap at the root, and allocating 10mb/s 
> to the child “guest”.  The other sibling would be “production”, 
> and he gets the remaining traffic.
> 
> Upstream would be the reverse, marking ingress traffic from “guest” 
> with a separate tag.  Allocating upstream root on “wan” with 10 
> mbps, and the child “guest” getting 2 mbps.  The remainder goes 
> to the sibling “production”.

It's been 15+ years since I've done much with designing QoS trees.  I'm 
sure that things have changed since the last time I looked at them.

> Should be straightforward enough, right? (Well, forwarding is more 
> straightforward than traffic terminating on the router itself, 
> I guess… bonus points for getting that right, too.)

As they say, the devil is in the details.

Conceptually, it's simple enough.  The the particulars of the execution 
is going to take effort.

> I’m hoping that the limiting will work adequately so that the 
> end-to-end path has adequate congestion avoidance happening, and that 
> upstream doesn’t overrun the receiver and cause a lot of packets to 
> be dropped on the last hop (work case of wasted bandwidth).

(See further read above.)

> Not sure if I need special accommodations for bursting or if that 
> would just delay the “settling” of congestion avoidance into 
> steady-state.

Well, if the connection is a hard 50 & 10, there's nothing that can 
burst over that.

The last time I dealt with bursting, I found that it was a lot of 
effort, for minimal return on said effort.  Further, I was able to get 
quite similar effort by allowing production and guest to use the 
bandwidth that the other didn't use, which was considerably simpler to 
set up.

The bursting I used in the past was bucket based (I don't remember the 
exact QoS term) where the bucket filled at the defined rate, and could 
empty it's contents as fast as it could be taken out.  So if the bucket 
was 5 gallons, then a burst at line rate up to 5 gallons was possible. 
Then it became a matter of how big the bucket needed to be, 5 gallons, 
55 gallons, 1000 gallons, etc.

I found that guaranteeing each class a specific amount of bandwidth and 
allowing the unused bandwidth to be used by other classes simpler and 
just as effective.

Read:  Speed of burst, without the complexity and better (more 
consistent) use of the bandwidth.  Remember, if the bandwidth isn't 
used, it's gone, wasted, so why not let someone use it?

> Also not sure if ECN is worth marking at this point.  Congestion 
> control is supposed to work better than congestion avoidance, right?

If I could relatively easily mark things with ECN, I would.  But I don't 
know how valuable ECN really is.  I've not looked in 10+ years, and the 
last time I did, I didn't find much that was actually utilizing it.

> Anyone know what the steps would look like to accomplish the above?

It is going to be highly dependent on what you want to do and what your 
device is capable of.

I have an idea of what I would do if I were to implement this on a 
standard Linux machine functioning as the router.

1st:  Address the fact that you can only effectively rate limit what you 
send.  So, change the problem so that you rate limit what is sent to 
your router.  I would do this by having the incoming connection go into 
a Network Namespace and a new virtual connection to the main part of the 
router.  This Network Namespace can then easily rate limit what it sends 
to the main part of the router, on a single interface.

              +------------------------+
(Internet)---+-eth5  router  eth{0,1}-+---(LAN)
              +------------------------+

              +--------------------+-------------------------+
(Internet)---+-eth5  NetNS  veth0=|=veth5  router  eth{0,1}-+---(LAN)
              +--------------------+-------------------------+

This has the advantage that the QoS tree in the NetNS only needs to deal 
with sending on one interface, veth0.

This has the added advantage that QoS tree won't be applied to traffic 
between production and guest.  (Or you don't need to make the QoS tree 
/more/ complex to account for this.)

2nd:  Don't worry about bucketing.  Define a minimum that each traffic 
class is guaranteed to get if it uses it.  Then allow the other traffic 
class to use what ever bandwidth the first traffic class did not use.

Why limit guest to 10 Mbps if production is only using 5 Mbps.  That's 
35 Mbps of available download that's wasted.

3rd:  The nature of things, TCP in particular, is to keep bumping into 
the ceiling.  So if you artificially lower the ceiling, traffic coming 
in /will/ go over the limit.  Conversely, the circuit is limited at 50 
Mbps inbound.  That limit is enforced by the ISP.  There is no way that 
the traffic can go over it.

> A bunch of people responded, “yeah, I’ve been wanting to do that 
> too…” when I brought up my question, so if I get a good solution 
> I’ll submit a FAQ entry.

Cool.

> Thanks,

You're welcome.

Good luck.



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4013 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
  2020-03-22 22:59 ` Grant Taylor
@ 2020-03-24  6:51 ` Philip Prindeville
  2020-03-24  9:21 ` Marco Gaiarin
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Philip Prindeville @ 2020-03-24  6:51 UTC (permalink / raw)
  To: lartc

Hi Grant,

> On Mar 22, 2020, at 4:59 PM, Grant Taylor <gtaylor@tnetconsulting.net> wrote:
> 
> On 3/22/20 3:56 PM, Philip Prindeville wrote:
>> Hi all,
> 
> Hi Philip,
> 
>> The uplink is G.PON 50/10 mbps.
> 
> Aside:  /Gigabit/ PON serving 50 / 10 Mbps.  ~chuckle~


Well, it’s exactly because it *isn’t* 1Gbps each direction that I need good shaping.  I could get more, but I’d also pay more.


> 
>> I’d like to cap the usage on “guest” to 10/2 mbps.  Any unused bandwidth from “guest” goes to “production”.
> 
> Does any of production's unused bandwidth go to guest?  Or is guest hard capped at 10 & 2?


No.  The idea being that “guest” relies on the kindness of strangers… whereas “production” has a guaranteed SLA of at least 40/8 mbps.


> 
>> I thought about marking the traffic coming in off “wan" (the public interface).
> 
> One of the most important lessons that I remember about QoS is that you can only /effectively/ limit what you send.


Right.  In this case I’m limiting (or pacing) the ACKs so that the sender paces his data.


> 
> Read:  You can't limit what is sent down your line to your router.


For UDP not at all.  For TCP you can apply back pressure, as above.  If the sender has filled his window, and I hold back any ACKs, he can’t send anything more until I do send an ACK.


> 
> Further read:  You will receive more down your line than the 10 & 2 that you limit guest to, but you can feed guest at 10 & 2.


Correct.  Eventually the sender will back off in an attempt to reach a congestion-free steady state.

My scenario, as I said, is a SoHo router.  I don’t have a lot of servers behind it that receive bursts of incoming traffic asynchronously from outside (other than email, which I host locally).

If my daughter decides to watch an HD movie on an iPad during the day while I’m working, I don’t want that traffic overrunning my network and causing me to not be able to work.  In that scenario, the connection is originating internally and going outbound, and it’s long-lived (where "long-lived" is any duration of 20 or more RTT’s).

> 
>> Then using HTB to have a 50 mbps cap at the root, and allocating 10mb/s to the child “guest”.  The other sibling would be “production”, and he gets the remaining traffic.
>> Upstream would be the reverse, marking ingress traffic from “guest” with a separate tag.  Allocating upstream root on “wan” with 10 mbps, and the child “guest” getting 2 mbps.  The remainder goes to the sibling “production”.
> 
> It's been 15+ years since I've done much with designing QoS trees.  I'm sure that things have changed since the last time I looked at them.


Only slightly less for me:  I did a traffic-shaper plugin for Arno’s Internet Firewall (AIF) about 12 years ago.  I’ve since forgotten everything.


> 
>> Should be straightforward enough, right? (Well, forwarding is more straightforward than traffic terminating on the router itself, I guess… bonus points for getting that right, too.)
> 
> As they say, the devil is in the details.
> 
> Conceptually, it's simple enough.  The the particulars of the execution is going to take effort.


Yup.  And I’m hoping to be able to not need ifb to do it.


> 
>> I’m hoping that the limiting will work adequately so that the end-to-end path has adequate congestion avoidance happening, and that upstream doesn’t overrun the receiver and cause a lot of packets to be dropped on the last hop (work case of wasted bandwidth).
> 
> (See further read above.)
> 
>> Not sure if I need special accommodations for bursting or if that would just delay the “settling” of congestion avoidance into steady-state.
> 
> Well, if the connection is a hard 50 & 10, there's nothing that can burst over that.


Sure, for the total.  I meant “guest” bursting over his allotted 10/2 mbps for a short duration, say 600ms (I came up with that as being 5 RTT’s of 120ms).  I figure that’s enough for slow-start to ramp up into steady state…


> 
> The last time I dealt with bursting, I found that it was a lot of effort, for minimal return on said effort.  Further, I was able to get quite similar effort by allowing production and guest to use the bandwidth that the other didn't use, which was considerably simpler to set up.


Well, know you’ve got me confused.  Because if each can borrow from the other, where’s the SLA?  Where’s the cap?  Who gets prioritized?

I could be completely unshaped, and have both borrowing from each other… which is the degenerate case.


> 
> The bursting I used in the past was bucket based (I don't remember the exact QoS term) where the bucket filled at the defined rate, and could empty it's contents as fast as it could be taken out.  So if the bucket was 5 gallons, then a burst at line rate up to 5 gallons was possible. Then it became a matter of how big the bucket needed to be, 5 gallons, 55 gallons, 1000 gallons, etc.
> 
> I found that guaranteeing each class a specific amount of bandwidth and allowing the unused bandwidth to be used by other classes simpler and just as effective.


Yeah, and indeed that’s what HTB excels at.


> 
> Read:  Speed of burst, without the complexity and better (more consistent) use of the bandwidth.  Remember, if the bandwidth isn't used, it's gone, wasted, so why not let someone use it?


Agreed.

Although… in the case of the “guest” network, I don’t ever want it performing better than the hard SLA of 10/2 mbps, or people will complain when they don’t get extra bandwidth.  If they’re conditioned to think that “I’m on the guest network, and 10/2 mbps is all I’m going to get” then they’ll be happy with it and won’t complain.

I don’t want to hear, “well, this was so much better two days ago!”

My answer is, “It’s free.  You’re getting it by someone else’s good graces… be grateful you’re getting anything at all.”


> 
>> Also not sure if ECN is worth marking at this point.  Congestion control is supposed to work better than congestion avoidance, right?
> 
> If I could relatively easily mark things with ECN, I would.  But I don't know how valuable ECN really is.  I've not looked in 10+ years, and the last time I did, I didn't find much that was actually utilizing it.


Some ISPs were actually squashing the bits, and got spanked severely by the FCC.

Also, some older router’s IP stacks were not ECN aware, and had the older bit definitions (remember that RFC 3168 and ECN borrowed the ECT1 bit from TOS/LOWCOST from RFC 791 and 1349).


> 
>> Anyone know what the steps would look like to accomplish the above?
> 
> It is going to be highly dependent on what you want to do and what your device is capable of.


I’m assuming a 3.18 kernel or later and iproute2 + iptables.  Nothing else.  And sch_htb is present.


> 
> I have an idea of what I would do if I were to implement this on a standard Linux machine functioning as the router.
> 
> 1st:  Address the fact that you can only effectively rate limit what you send.  So, change the problem so that you rate limit what is sent to your router.  I would do this by having the incoming connection go into a Network Namespace and a new virtual connection to the main part of the router.  This Network Namespace can then easily rate limit what it sends to the main part of the router, on a single interface.


This is the same problem that ifb solves, right?

I’m not sure I want to assume that Namespaces are available in all scenarios.


> 
>             +------------------------+
> (Internet)---+-eth5  router  eth{0,1}-+---(LAN)
>             +------------------------+
> 
>             +--------------------+-------------------------+
> (Internet)---+-eth5  NetNS  veth0=|=veth5  router  eth{0,1}-+---(LAN)
>             +--------------------+-------------------------+
> 
> This has the advantage that the QoS tree in the NetNS only needs to deal with sending on one interface, veth0.
> 
> This has the added advantage that QoS tree won't be applied to traffic between production and guest.  (Or you don't need to make the QoS tree /more/ complex to account for this.)


Yeah, for now I’m not concerned about internal traffic.  Yet.


> 
> 2nd:  Don't worry about bucketing.  Define a minimum that each traffic class is guaranteed to get if it uses it.  Then allow the other traffic class to use what ever bandwidth the first traffic class did not use.


Agreed.


> 
> Why limit guest to 10 Mbps if production is only using 5 Mbps.  That's 35 Mbps of available download that's wasted.


As I said, I don’t want to have to explain to anyone later that “35mbps might have been available Sunday, but today I’m running Carbonite and it’s hogging all the bandwidth while I download these 10 new VM’s I created this morning, so suck it.”


> 
> 3rd:  The nature of things, TCP in particular, is to keep bumping into the ceiling.  So if you artificially lower the ceiling, traffic coming in /will/ go over the limit.  Conversely, the circuit is limited at 50 Mbps inbound.  That limit is enforced by the ISP.  There is no way that the traffic can go over it.


No, but it can cause other traffic destined to the production network to get dropped, which is the scenario I’m trying to avoid.

As I remember, some of the newer (model-based) congestion avoidance algorithms (like BBR) were really much better at fairness and avoiding dropped packets…


> 
>> A bunch of people responded, “yeah, I’ve been wanting to do that too…” when I brought up my question, so if I get a good solution I’ll submit a FAQ entry.
> 
> Cool.
> 
>> Thanks,
> 
> You're welcome.
> 
> Good luck.


Thanks.

-Philip

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
  2020-03-22 22:59 ` Grant Taylor
  2020-03-24  6:51 ` Philip Prindeville
@ 2020-03-24  9:21 ` Marco Gaiarin
  2020-03-24 17:57 ` Grant Taylor
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Marco Gaiarin @ 2020-03-24  9:21 UTC (permalink / raw)
  To: lartc

Mandi! Philip Prindeville
  In chel di` si favelave...

> > 1st:  Address the fact that you can only effectively rate limit what you send.  So, change the problem so that you rate limit what is sent to your router.  I would do this by having the incoming connection go into a Network Namespace and a new virtual connection to the main part of the router.  This Network Namespace can then easily rate limit what it sends to the main part of the router, on a single interface.
> This is the same problem that ifb solves, right?
> I’m not sure I want to assume that Namespaces are available in all scenarios.

Interesting... i've found:

	https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/

and i've not understood how can i 'link' phisical interfaces with
vethX.
Using bond? But after that, i need to use ebtales?


ifbX interfaces are very limited by not having connection tracking,
having some 'real' interfaces would be a must!

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
      http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
	(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (2 preceding siblings ...)
  2020-03-24  9:21 ` Marco Gaiarin
@ 2020-03-24 17:57 ` Grant Taylor
  2020-03-24 18:17 ` Grant Taylor
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Grant Taylor @ 2020-03-24 17:57 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: text/plain, Size: 9182 bytes --]

On 3/24/20 12:51 AM, Philip Prindeville wrote:
> Hi Grant,

Hi,

> Well, it’s exactly because it *isn’t* 1Gbps each direction that I 
> need good shaping.  I could get more, but I’d also pay more.

Fair enough.

> No.  The idea being that “guest” relies on the kindness of strangers… 
> whereas “production” has a guaranteed SLA of at least 40/8 mbps.

QoS has the ability to guarantee an SLA of 40 & 8 to production.

Think about it this way:

1)  Production gets up to it's SLA.
2)  Guest gets up to it's SLA.
3)  Production and / or guest get any unused bandwidth.

Each class is guaranteed their SLA, and can optionally use any remaining 
bandwidth (unused bandwidth of other classes).

> Right.  In this case I’m limiting (or pacing) the ACKs so that the 
> sender paces his data.

That's not what I was referring to.

QoS can rate limit what is sent out the internal interfaces at the 40 / 
10 Mbps values.

The thing that it can not do is rate limit want comes in the outside 
interface.  There may be ~12 Mbps of incoming traffic for guests.  But 
the router will only send 10 Mbps of that out it's inside interface. 
Thus the router is rate limiting what guest receives to 10 Mbps.  It's 
just that there is an additional 2 Mbps that the router is dropping on 
the floor.

Does that make sense?

> For UDP not at all.  For TCP you can apply back pressure, as above. 
>  If the sender has filled his window, and I hold back any ACKs, he 
> can’t send anything more until I do send an ACK.

See above.

It's possible for a router to use QoS to rate limit any type of traffic. 
  The router quite literally receives the traffic on one interface and 
sends it out another interface.  The rate that the traffic is sent is 
what is rate limited.

TCP, UDP, ICMP, it doesn't matter what type of traffic.

> Correct.  Eventually the sender will back off in an attempt to reach 
> a congestion-free steady state.

I would bet that a "congestion-free steady state" is /never/ achieved. 
The very design of most protocols is to send as fast as possible.  When 
they detect errors, they /may/ slow down /for/ /a/ /little/ /while/. 
But they will speed back up.

Even if a given flow could achieve something resembling a 
congestion-free steady state, the nature of Internet traffic is so 
inconsistent that you have flows starting & stopping all the time.  Thus 
you have wildly shifting demands on traffic.

> My scenario, as I said, is a SoHo router.  I don’t have a lot of 
> servers behind it that receive bursts of incoming traffic 
> asynchronously from outside (other than email, which I host 
> locally).

IMHO servers are actually less of a problem than the average person 
surfing the web.

Every single web page you go to is at least one new and short lived 
flow.  Many web pages are 100s of new and short lived flows.  Most of 
them start at about the same time.

The more web surfers you have, the more of these types of traffic 
patterns that you have.  It's also very random when they will happen. 
You could have anywhere between 0 and the number of people on your 
network at the same time.

Also, multiple windows / tabs mean that more and more of these can 
happen at the same time.

> If my daughter decides to watch an HD movie on an iPad during the day 
> while I’m working, I don’t want that traffic overrunning my network 
> and causing me to not be able to work.  In that scenario, the 
> connection is originating internally and going outbound, and it’s 
> long-lived (where "long-lived" is any duration of 20 or more RTT’s).

That's one of the things that QoS is quite good at dealing with.

Though I question how long lived your daughter's streams actually are. 
I know for a fact that YouTube is a series of small downloads.  So each 
download is relatively short lived.  It's not one long lived connection 
that lasts for the duration of the video.

There also the fact that YouTube prefers QUIC, which is UDP based, to 
TCP if it can use it.

> Only slightly less for me:  I did a traffic-shaper plugin for Arno’s 
> Internet Firewall (AIF) about 12 years ago.  I’ve since forgotten 
> everything.

Tempus fugit.

> Yup.  And I’m hoping to be able to not need ifb to do it.

I forgot about ifb.  I think it would do similar to what I was 
suggesting with network namespaces.  Though I do wonder how complicated 
having multiple things in the same namespace will make tc rules.

> Sure, for the total.  I meant “guest” bursting over his allotted 10/2 
> mbps for a short duration, say 600ms (I came up with that as being 5 
> RTT’s of 120ms).  I figure that’s enough for slow-start to ramp up 
> into steady state…

See above comments about steady state.

> Well, know you’ve got me confused.  Because if each can borrow from 
> the other, where’s the SLA?  Where’s the cap?  Who gets prioritized?

I think I explained it above.

Each is guaranteed the availability of it's SLA.  The unused traffic 
over the SLA is (can be) fair game.

Meaning that if production is using 15 & 3, there is 25 & 5 that guest 
could use if allowed to.

Similarly, if guests are sleeping, there is an additional 10 & 2 that 
production could take advantage of.

> I could be completely unshaped, and have both borrowing from each 
> other… which is the degenerate case.

That's why each is guaranteed their SLA *FIRST* and then can use 
whatever is unused *SECOND*.  This allows optimal use of the bandwidth 
while still guaranteeing SLAs.

> Yeah, and indeed that’s what HTB excels at.

Yep.

If memory serves, HTB is one of many that can do it.  But HTB was one of 
the earlier options.

> Agreed.
> 
> Although… in the case of the “guest” network, I don’t ever want it 
> performing better than the hard SLA of 10/2 mbps, or people will 
> complain when they don’t get extra bandwidth.  If they’re conditioned 
> to think that “I’m on the guest network, and 10/2 mbps is all I’m 
> going to get” then they’ll be happy with it and won’t complain.

Okay.

That is a hard policy decision that you are making.  I have no objection 
to that.  It also means that guest doesn't get to borrow unused 
bandwidth from production.

> I don’t want to hear, “well, this was so much better two days ago!”
> 
> My answer is, “It’s free.  You’re getting it by someone else’s good 
> graces… be grateful you’re getting anything at all.”

ACK

> Some ISPs were actually squashing the bits, and got spanked severely 
> by the FCC.

Okay.  I don't recall that.  I wonder why they wanted to stomp on ECN. 
Especially seeing as how ECN is to alert about congestion.  Lack of 
congestion notification encourages additional ramp up.

I'm assuming that ISPs were clearing ECN.  Maybe I have this backwards. 
  Maybe they were artificially setting it to induce slowdowns.

> Also, some older router’s IP stacks were not ECN aware, and had the 
> older bit definitions (remember that RFC 3168 and ECN borrowed the 
> ECT1 bit from TOS/LOWCOST from RFC 791 and 1349).

My experience has been that most routers ignore QoS / ECN.

> I’m assuming a 3.18 kernel or later and iproute2 + iptables.  Nothing 
> else.  And sch_htb is present.

Unfortunately there are a LOT of possible combination in that mix.

I also know that the Ubiquity EdgeRouter Lite uses a 2.6 kernel.  I 
don't know about other EdgeOS (Ubiquity Linux distro).  But I wouldn't 
be surprised to learn that EdgeOS is 2.6 period.

> This is the same problem that ifb solves, right?

Probably.  (See above.)

> I’m not sure I want to assume that Namespaces are available in all 
> scenarios.

Fair enough.

I run old PCs with standard Linux distros as routers.  So I can easily 
add what I want to them.

> Yeah, for now I’m not concerned about internal traffic.  Yet.

That's something to keep in mind when creating the QoS configuration. 
As in you might need to take it into account and make sure that you 
don't artificially slow it down.

> Agreed.
> 
> As I said, I don’t want to have to explain to anyone later that 
> “35mbps might have been available Sunday, but today I’m running 
> Carbonite and it’s hogging all the bandwidth while I download these 
> 10 new VM’s I created this morning, so suck it.”

ACK

> No, but it can cause other traffic destined to the production network 
> to get dropped, which is the scenario I’m trying to avoid.

I understand.

> As I remember, some of the newer (model-based) congestion avoidance 
> algorithms (like BBR) were really much better at fairness and 
> avoiding dropped packets…

My understanding is that BBR is rather aggressive in that it tries to 
identify what the bandwidth is and then use all of that bandwidth that 
it can.  It's particularly aggressive at finding the bandwidth too.

See above comments about the transient nature of flows.

> Thanks.

You're welcome.



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4013 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (3 preceding siblings ...)
  2020-03-24 17:57 ` Grant Taylor
@ 2020-03-24 18:17 ` Grant Taylor
  2020-03-26  3:44 ` Philip Prindeville
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Grant Taylor @ 2020-03-24 18:17 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: text/plain, Size: 2041 bytes --]

On 3/24/20 3:21 AM, Marco Gaiarin wrote:
> Interesting... i've found:
> 
> https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespac
> es/
> 
> and i've not understood how can i 'link' phisical interfaces with 
> vethX.

It depends what you mean by "link".

> Using bond?

I would avoid using a bond with a vEth interface.

> But after that, i need to use ebtales?

Did you mean "bridge"?

ebtables, as in Ethernet Bridging Tables, is associated with bridges.

Bonding is LACP / EtherChannel / etc.

Yes, bridging would be a good choice to have L2 connectivity between the 
Network Namespace and the physical NIC.

You can also use traditional routing between the physical and the vEth NICs.

You can even move the physical NIC into a Network Namespace.

It *REALLY* depends on what you want to do.

Network Namespaces are as powerful as the Linux kernel is.  Meaning that 
you can do just about everything with the network in a network namespace 
that you can do outside of it.  The benefit is that you can have 
multiple network namespaces on the same machine with minimal resources used.

Think about all the things that you can do with virtual machines acting 
as routers (or other servers), but with comparatively no resource 
utilization.

I think about network namespaces as if they are different sets of 
configuration data that the same kernel TCP/IP stack uses.  So the 
resource over head is only what's necessary to hold the different 
network configuration.  (I'm guessing single digit MBs at the most.)

I have had double digits of network namespaces on Raspberry Pis multiple 
times.  No problem.  Getting fat VMs on a Raspberry Pi is problematic 
b/c of resource constraint.

> ifbX interfaces are very limited by not having connection tracking,
> having some 'real' interfaces would be a must!

vEth interfaces are very much so 'real' interfaces.

As are MACVLAN & IPVLAN, other options that are frequently used.



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4013 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (4 preceding siblings ...)
  2020-03-24 18:17 ` Grant Taylor
@ 2020-03-26  3:44 ` Philip Prindeville
  2020-03-26  3:53 ` Fwd: " Philip Prindeville
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Philip Prindeville @ 2020-03-26  3:44 UTC (permalink / raw)
  To: lartc

Trimming…


> On Mar 24, 2020, at 11:57 AM, Grant Taylor <gtaylor@tnetconsulting.net> wrote:
> 
> QoS has the ability to guarantee an SLA of 40 & 8 to production.
> 
> Think about it this way:
> 
> 1)  Production gets up to it's SLA.
> 2)  Guest gets up to it's SLA.
> 3)  Production and / or guest get any unused bandwidth.
> 
> Each class is guaranteed their SLA, and can optionally use any remaining bandwidth (unused bandwidth of other classes).


If they’re both oversubscribed, then how does it get divvied up?


> 
>> Right.  In this case I’m limiting (or pacing) the ACKs so that the sender paces his data.
> 
> That's not what I was referring to.
> 
> QoS can rate limit what is sent out the internal interfaces at the 40 / 10 Mbps values.


Sure.


> 
> The thing that it can not do is rate limit want comes in the outside interface.  There may be ~12 Mbps of incoming traffic for guests.  But the router will only send 10 Mbps of that out it's inside interface. Thus the router is rate limiting what guest receives to 10 Mbps.  It's just that there is an additional 2 Mbps that the router is dropping on the floor.
> 
> Does that make sense?


Well, yeah.  That can happen at any time and there’s nothing you can do about it.

I’ve been the target of a DDoS reflection attack and 99% of my traffic was TCP RST’s and ICMP Unreachable… and that’s just what was getting through… not what was being dropped upstream.

Serenity prayer here…


> 
>> For UDP not at all.  For TCP you can apply back pressure, as above.  If the sender has filled his window, and I hold back any ACKs, he can’t send anything more until I do send an ACK.
> 
> See above.
> 
> It's possible for a router to use QoS to rate limit any type of traffic.  The router quite literally receives the traffic on one interface and sends it out another interface.  The rate that the traffic is sent is what is rate limited.
> 
> TCP, UDP, ICMP, it doesn't matter what type of traffic.


Sure.  And by applying controls inside the firewall, you affect the perceived end-to-end properties as seen by the sender.  Which is about the best you can hope for.


> 
>> Correct.  Eventually the sender will back off in an attempt to reach a congestion-free steady state.
> 
> I would bet that a "congestion-free steady state" is /never/ achieved. The very design of most protocols is to send as fast as possible.  When they detect errors, they /may/ slow down /for/ /a/ /little/ /while/. But they will speed back up.


Sure.  Because we use congestion control, and not congestion avoidance.  It’s assumed that we will periodically hit a congested state…


> 
> Even if a given flow could achieve something resembling a congestion-free steady state, the nature of Internet traffic is so inconsistent that you have flows starting & stopping all the time.  Thus you have wildly shifting demands on traffic.


Of course.  For a relatively small SoHo network, it’s better understood, but also more bursty since they’re less statistical smoothing taking place.


> 
>> My scenario, as I said, is a SoHo router.  I don’t have a lot of servers behind it that receive bursts of incoming traffic asynchronously from outside (other than email, which I host locally).
> 
> IMHO servers are actually less of a problem than the average person surfing the web.
> 
> Every single web page you go to is at least one new and short lived flow.  Many web pages are 100s of new and short lived flows.  Most of them start at about the same time.


Although with HTTP 1.1 and pipelining that’s better than the request-per-connection of HTTP 1.0.


> 
> The more web surfers you have, the more of these types of traffic patterns that you have.  It's also very random when they will happen. You could have anywhere between 0 and the number of people on your network at the same time.
> 
> Also, multiple windows / tabs mean that more and more of these can happen at the same time.


Sure.


> 
>> If my daughter decides to watch an HD movie on an iPad during the day while I’m working, I don’t want that traffic overrunning my network and causing me to not be able to work.  In that scenario, the connection is originating internally and going outbound, and it’s long-lived (where "long-lived" is any duration of 20 or more RTT’s).
> 
> That's one of the things that QoS is quite good at dealing with.
> 
> Though I question how long lived your daughter's streams actually are. I know for a fact that YouTube is a series of small downloads.  So each download is relatively short lived.  It's not one long lived connection that lasts for the duration of the video.
> 
> There also the fact that YouTube prefers QUIC, which is UDP based, to TCP if it can use it.


More worried about Netflix, Hulu, and Disney+ which are all TCP-based.  All three (and possibly Amazon Prime, I don’t remember) use HTTP byte-ranges, but reuse the same connection.  So one connection, but bursty fetches…


> 
>> Only slightly less for me:  I did a traffic-shaper plugin for Arno’s Internet Firewall (AIF) about 12 years ago.  I’ve since forgotten everything.
> 
> Tempus fugit.


Indeed.  Although there are moments I wouldn’t recapture even if I could.


> 
>> Well, know you’ve got me confused.  Because if each can borrow from the other, where’s the SLA?  Where’s the cap?  Who gets prioritized?
> 
> I think I explained it above.
> 
> Each is guaranteed the availability of it's SLA.  The unused traffic over the SLA is (can be) fair game.
> 
> Meaning that if production is using 15 & 3, there is 25 & 5 that guest could use if allowed to.
> 
> Similarly, if guests are sleeping, there is an additional 10 & 2 that production could take advantage of.


And if they both want to go over quota… I guess they can compete.


> 
>> Yeah, and indeed that’s what HTB excels at.
> 
> Yep.
> 
> If memory serves, HTB is one of many that can do it.  But HTB was one of the earlier options.


I like HTB because it’s very straightforward to model in simulations, etc.


> 
>> Some ISPs were actually squashing the bits, and got spanked severely by the FCC.
> 
> Okay.  I don't recall that.  I wonder why they wanted to stomp on ECN. Especially seeing as how ECN is to alert about congestion.  Lack of congestion notification encourages additional ramp up.
> 
> I'm assuming that ISPs were clearing ECN.  Maybe I have this backwards.  Maybe they were artificially setting it to induce slowdowns.


No, they were clearing it because they thought they were protecting subscribers with not-up-to-date equipment from being confused by seeing markings they didn’t know how to correctly interpret.

Odd, considering that customer equipment often moves faster than ISP or RBOC’s.  The whole 5 years I was at Cisco, several RBOC’s were still running 12.0S (and insisting on continued support) even as I was writing features for 12.4(T)… And they had only recently migrated off 11.3 Mainline.


> 
>> Also, some older router’s IP stacks were not ECN aware, and had the older bit definitions (remember that RFC 3168 and ECN borrowed the ECT1 bit from TOS/LOWCOST from RFC 791 and 1349).
> 
> My experience has been that most routers ignore QoS / ECN.


That’s typically a configuration issue, and not a question of not having current software.

https://www.reddit.com/r/linux/comments/933vys/is_tcp_ecn_still_a_problem_today/
https://www.bufferbloat.net/projects/cerowrt/wiki/Enable_ECN/

Especially:

https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf

slide 14

Unfortunately most of the surveys on how widely deployed ECN marking in transit networks is, is 12-19 years old.


> 
>> I’m assuming a 3.18 kernel or later and iproute2 + iptables.  Nothing else.  And sch_htb is present.
> 
> Unfortunately there are a LOT of possible combination in that mix.
> 
> I also know that the Ubiquity EdgeRouter Lite uses a 2.6 kernel.  I don't know about other EdgeOS (Ubiquity Linux distro).  But I wouldn't be surprised to learn that EdgeOS is 2.6 period.


Yeah, I was going to work at Ubiquiti on the OS update until they made a salary offer…


> 
>> This is the same problem that ifb solves, right?
> 
> Probably.  (See above.)
> 
>> I’m not sure I want to assume that Namespaces are available in all scenarios.
> 
> Fair enough.
> 
> I run old PCs with standard Linux distros as routers.  So I can easily add what I want to them.


I’m using Supermicro pizza boxes (mostly SYS-5018D’s) that require EFI support…


> 
>> Yeah, for now I’m not concerned about internal traffic.  Yet.
> 
> That's something to keep in mind when creating the QoS configuration. As in you might need to take it into account and make sure that you don't artificially slow it down.


Sure.  Though on Gigabit interfaces, 50mbps is not statistically significant even if I blocked it out…


> 
>> As I remember, some of the newer (model-based) congestion avoidance algorithms (like BBR) were really much better at fairness and avoiding dropped packets…
> 
> My understanding is that BBR is rather aggressive in that it tries to identify what the bandwidth is and then use all of that bandwidth that it can.  It's particularly aggressive at finding the bandwidth too.


I remember people having the same complaint about Reno back in the day…

Ever wake up and realize “I’m old”?  Well, my wife wakes up every day and says to me, “You’re old.”  But not the same thing…

-Philip

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Fwd: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (5 preceding siblings ...)
  2020-03-26  3:44 ` Philip Prindeville
@ 2020-03-26  3:53 ` Philip Prindeville
  2020-03-26 12:50   ` Toke Høiland-Jørgensen
  2020-03-26  4:03 ` Grant Taylor
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 19+ messages in thread
From: Philip Prindeville @ 2020-03-26  3:53 UTC (permalink / raw)
  To: netdev

Had originally posted this to LARTC but realized that “netdev” is probably the better forum.

Was hoping someone familiar with the nuts and bolts of tc and scheduler minutiae could help me come up with a configuration to use as a starting point, then I could tweak it, gather some numbers, make graphs etc, and write a LARTC or LWN article around the findings.

I’d be trying to do shaping in both directions.  Sure, egress shaping is trivial and obviously works.

But I was also thinking about ingress shaping on the last hop, i.e. as traffic flows into the last-hop CPE router, and limiting/delaying it so that the entire end-to-end path is appropriately perceived by the sender, since the effective bandwidth of a [non-multipath] route is the min bandwidth of all individual hops, right?

So that min could be experienced at the final hop before the receiver as delay injected between packets to shape the bitrate.

How far off-base am I?

And what would some tc scripting look like to measure my thesis?



> Begin forwarded message:
> 
> From: Philip Prindeville <philipp_subx@redfish-solutions.com>
> Subject: tc question about ingress bandwidth splitting
> Date: March 22, 2020 at 3:56:46 PM MDT
> To: lartc@vger.kernel.org
> 
> Hi all,
> 
> I asked around on IRC but no one seems to know the answer, so I thought I’d go to the source…
> 
> I have a SoHo router with two physical subnets, which we’ll call “production” (eth0) and “guest” (eth1), and the egress interface “wan” (eth5).
> 
> The uplink is G.PON 50/10 mbps.  I’d like to cap the usage on “guest” to 10/2 mbps.  Any unused bandwidth from “guest” goes to “production”.
> 
> I thought about marking the traffic coming in off “wan" (the public interface).  Then using HTB to have a 50 mbps cap at the root, and allocating 10mb/s to the child “guest”.  The other sibling would be “production”, and he gets the remaining traffic.
> 
> Upstream would be the reverse, marking ingress traffic from “guest” with a separate tag.  Allocating upstream root on “wan” with 10 mbps, and the child “guest” getting 2 mbps.  The remainder goes to the sibling “production”.
> 
> Should be straightforward enough, right? (Well, forwarding is more straightforward than traffic terminating on the router itself, I guess… bonus points for getting that right, too.)
> 
> I’m hoping that the limiting will work adequately so that the end-to-end path has adequate congestion avoidance happening, and that upstream doesn’t overrun the receiver and cause a lot of packets to be dropped on the last hop (work case of wasted bandwidth).  Not sure if I need special accommodations for bursting or if that would just delay the “settling” of congestion avoidance into steady-state.
> 
> Also not sure if ECN is worth marking at this point.  Congestion control is supposed to work better than congestion avoidance, right?
> 
> Anyone know what the steps would look like to accomplish the above?
> 
> A bunch of people responded, “yeah, I’ve been wanting to do that too…” when I brought up my question, so if I get a good solution I’ll submit a FAQ entry.
> 
> Thanks,
> 
> -Philip
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (6 preceding siblings ...)
  2020-03-26  3:53 ` Fwd: " Philip Prindeville
@ 2020-03-26  4:03 ` Grant Taylor
  2020-04-01  9:48 ` Marco Gaiarin
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Grant Taylor @ 2020-03-26  4:03 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: text/plain, Size: 3111 bytes --]

On 3/25/20 9:44 PM, Philip Prindeville wrote:
> If they’re both oversubscribed, then how does it get divvied up?

Production gets their 40 & 8
Guest gets their 10 & 2
There's nothing left over to divvy up further.

> I’ve been the target of a DDoS reflection attack and 99% of my 
> traffic was TCP RST’s and ICMP Unreachable… and that’s just 
> what was getting through… not what was being dropped upstream.

Oy vey!

> Sure.  And by applying controls inside the firewall, you affect the 
> perceived end-to-end properties as seen by the sender.  Which is 
> about the best you can hope for.

Yep.

Though I have wondered about a VPN to a VPS where I could control the 
bulk of what comes in on my wire.  Or at least apply some QoS on the end 
sending to my link.  ;-)

> More worried about Netflix, Hulu, and Disney+ which are all TCP-based. 
> All three (and possibly Amazon Prime, I don’t remember) use HTTP 
> byte-ranges, but reuse the same connection.  So one connection, 
> but bursty fetches…

Fair enough.  I do more consuming with them and less technical analysis.

> And if they both want to go over quota… I guess they can compete.

Nope.  They both get their SLA.  Since there's nothing left over, 
there's nothing to compete for.

> No, they were clearing it because they thought they were protecting 
> subscribers with not-up-to-date equipment from being confused by 
> seeing markings they didn’t know how to correctly interpret.

 >:-|

> Odd, considering that customer equipment often moves faster than 
> ISP or RBOC’s.  The whole 5 years I was at Cisco, several RBOC’s 
> were still running 12.0S (and insisting on continued support) even 
> as I was writing features for 12.4(T)… And they had only recently 
> migrated off 11.3 Mainline.

I can't say as I'm surprised.

> That’s typically a configuration issue, and not a question of not 
> having current software.

ACK

> Unfortunately most of the surveys on how widely deployed ECN marking 
> in transit networks is, is 12-19 years old.

I should inquire of colleagues at $WORK.  Though we tend to focus on QUIC.

> Yeah, I was going to work at Ubiquiti on the OS update until they 
> made a salary offer…

~chuckle~

> I’m using Supermicro pizza boxes (mostly SYS-5018D’s) that require 
> EFI support…

I deployed more than a few Supermicro boxen in my time.

> Sure.  Though on Gigabit interfaces, 50mbps is not statistically 
> significant even if I blocked it out…

I was thinking more about the other 950 Mbps that might not be available 
for Production <-> Guest transfers.  (Assuming you're reaching from Prod 
into Guest through a stateful firewall to copy files or the likes.)

> I remember people having the same complaint about Reno back in 
> the day…

That was a little bit before my time.

> Ever wake up and realize “I’m old”?  Well, my wife wakes up every 
> day and says to me, “You’re old.”  But not the same thing…

Yes on all accounts.



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4013 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Fwd: tc question about ingress bandwidth splitting
  2020-03-26  3:53 ` Fwd: " Philip Prindeville
@ 2020-03-26 12:50   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-26 12:50 UTC (permalink / raw)
  To: Philip Prindeville, netdev

Philip Prindeville <philipp_subx@redfish-solutions.com> writes:

> Had originally posted this to LARTC but realized that “netdev” is
> probably the better forum.
>
> Was hoping someone familiar with the nuts and bolts of tc and
> scheduler minutiae could help me come up with a configuration to use
> as a starting point, then I could tweak it, gather some numbers, make
> graphs etc, and write a LARTC or LWN article around the findings.
>
> I’d be trying to do shaping in both directions. Sure, egress shaping
> is trivial and obviously works.
>
> But I was also thinking about ingress shaping on the last hop, i.e. as
> traffic flows into the last-hop CPE router, and limiting/delaying it
> so that the entire end-to-end path is appropriately perceived by the
> sender, since the effective bandwidth of a [non-multipath] route is
> the min bandwidth of all individual hops, right?

Indeed, we have been using ingress shaping to combat bufferbloat for
years, and it works quite well (although you may have to set it a few %
lower than your actual line speed). There's even a separate mode in
sch_cake specifically for this purpose.

> So that min could be experienced at the final hop before the receiver
> as delay injected between packets to shape the bitrate.
>
> How far off-base am I?
>
> And what would some tc scripting look like to measure my thesis?

Take a look at sqm-scripts: https://github.com/tohojo/sqm-scripts

It's basically a collection of scripts to setup the kind of bandwidth
shaper you're talking about, with various configuration options. It
is packaged for OpenWrt, but you can also install it on a regular Linux
box.

Now, it doesn't specifically do the kind of guest/production split
you're talking about. However, it does have a script (simple.qos) that
does a three-tier shaping based on different DiffServ markings. If you
start from that, you should be able to change the classification and
bandwidth tiers to suit your purposes.

Having said that, however...

...Are you sure you really need to split bandwidth that way? Usually,
people do this because they don't want the 'guest' traffic to negatively
impact 'their own' usage of the network. But really, with a correctly
de-bloated link, this is much less of an issue than people think. And
with the per-host isolation feature of sch_cake[0], it becomes even less
so.

Not saying you are definitely wrong to pursue this kind of throttling of
your guest network, of course. Just encouraging you to keep an open mind
and test out the other feature first; you may find that it solves your
the problem well enough to be worth the decrease in complexity :)

-Toke


[0] See the section 'To enable Per-Host Isolation' here: https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm-details#making_cake_sing_and_dance_on_a_tight_rope_without_a_safety_net_aka_advanced_features


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (7 preceding siblings ...)
  2020-03-26  4:03 ` Grant Taylor
@ 2020-04-01  9:48 ` Marco Gaiarin
  2020-04-03 22:44 ` Grant Taylor
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Marco Gaiarin @ 2020-04-01  9:48 UTC (permalink / raw)
  To: lartc

Mandi! Grant Taylor
  In chel di` si favelave...

> > Interesting... i've found:
> > https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/
> > and i've not understood how can i 'link' phisical interfaces with vethX.
> It depends what you mean by "link".

Beh, something similar to what i do now for IFB:

 tc filter add dev eth1 parent ffff: protocol ip prio 50 \
        u32 match ip src 0.0.0.0/0 \
        flowid :1 \
        action mirred egress redirect dev ifb0



> > But after that, i need to use ebtales?
> Did you mean "bridge"?

AH! Sure, i meant 'bridge', not bond, sorry...


> You can also use traditional routing between the physical and the vEth NICs.
> You can even move the physical NIC into a Network Namespace.
> It *REALLY* depends on what you want to do.

I suppose, throw away 'ifb' and use veth in place. ;-)


With 'tc' command above, i 'pipe' ingress to ifb; surely i can create a
'route' between phisical and veth interfaces, but clearly i have to
manage a bit of routing and so on...


Can you provide me some examples? Thanks.

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
      http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
	(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (8 preceding siblings ...)
  2020-04-01  9:48 ` Marco Gaiarin
@ 2020-04-03 22:44 ` Grant Taylor
  2020-04-06  9:13 ` Marco Gaiarin
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Grant Taylor @ 2020-04-03 22:44 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: text/plain, Size: 1295 bytes --]

On 4/1/20 3:48 AM, Marco Gaiarin wrote:
> Mandi! Grant Taylor
>    In chel di` si favelave...
> 
> 
> Beh, something similar to what i do now for IFB:
> 
>   tc filter add dev eth1 parent ffff: protocol ip prio 50 \
>          u32 match ip src 0.0.0.0/0 \
>          flowid :1 \
>          action mirred egress redirect dev ifb0

ACK

Thank you.

> AH! Sure, i meant 'bridge', not bond, sorry...

;-)

> I suppose, throw away 'ifb' and use veth in place. ;-)

If it makes sense for the need at hand.

> With 'tc' command above, i 'pipe' ingress to ifb; surely i can create 
> a 'route' between phisical and veth interfaces, but clearly i have 
> to manage a bit of routing and so on...

I'd expect to.

> Can you provide me some examples? Thanks.

Sure?

Add a veth interface (pair), bring the local one up, add an IP & subnet 
to it, enable forwarding.  Then on your remote system, add a route to 
the new veth subnet via the eth0 IP.

The uncertainty above is that I doubt that this is what you're asking.

Please provide a hypothetical topology and I'll describe how it could be 
implemented with network namespaces and veth pairs.  (I don't know if 
you are asking for an ifb alternative or something else.)



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4013 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (9 preceding siblings ...)
  2020-04-03 22:44 ` Grant Taylor
@ 2020-04-06  9:13 ` Marco Gaiarin
  2020-04-13  1:11 ` Grant Taylor
  2020-04-17  9:58 ` Marco Gaiarin
  12 siblings, 0 replies; 19+ messages in thread
From: Marco Gaiarin @ 2020-04-06  9:13 UTC (permalink / raw)
  To: lartc

Mandi! Grant Taylor
  In chel di` si favelave...

> > Can you provide me some examples? Thanks.
> Sure?
> Add a veth interface (pair), bring the local one up, add an IP & subnet to
> it, enable forwarding.  Then on your remote system, add a route to the new
> veth subnet via the eth0 IP.
> The uncertainty above is that I doubt that this is what you're asking.

Local? Remote? really i don't understand...


> Please provide a hypothetical topology and I'll describe how it could be
> implemented with network namespaces and veth pairs.  (I don't know if you
> are asking for an ifb alternative or something else.)

Surely i ask for an IFB alternative...

As stated, all my interfaces in firewall have egress shaped on
the interface itself, while ingress are shaped into the companion ifb
intrfaces. This work, but not effectively as i wakt, because ifb
interfaces have not iptables/connmark and i can use only u32.

Because veth interfaces comes 'in pair', if i can link my WAN
interfaces to a veth pair, i can shape on egress on both interface,
considering that egress of one interface is ingress for the other (eg,
a pipe).


It is possible?

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
      http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
	(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (10 preceding siblings ...)
  2020-04-06  9:13 ` Marco Gaiarin
@ 2020-04-13  1:11 ` Grant Taylor
  2020-04-17  9:58 ` Marco Gaiarin
  12 siblings, 0 replies; 19+ messages in thread
From: Grant Taylor @ 2020-04-13  1:11 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: text/plain, Size: 2798 bytes --]

On 4/6/20 3:13 AM, Marco Gaiarin wrote:
> Local? Remote? really i don't understand...

Say that there are two physical hosts on the network; A and B.

A is the machine that will use a vEth pair to connect a network 
namespace (container).

Create the network namespace on A.

    A# ip netns add Ans1

Create the vEth pair on A.

    A# ip link add ns1 type veth peer name hostA

Move one end of the vEth pair into the Ans1 network namespace.

    A# ip link set hostA netns Ans1

Bring the vEth pair interfaces up.

    A# ip link set ns1 up
    A# ip netns exec Ans1 ip link set hostA up

Assign IP addresses to the vEth pair.

    A# ip addr add 192.0.2.1/24 dev ns1
    A# ip netns exec Ans1 ip addr add 192.0.2.2/24 dev hostA

Add a default gateway in Ans1

    A# ip netns exec Ans1 ip route add default via 192.0.2.1

Add a route to B telling it how to get to the subnet on A's vEth pair.

    B# ip route add 192.0.2.0/24 via $AsIPaddress

> Because veth interfaces comes 'in pair', if i can link my WAN 
> interfaces to a veth pair, i can shape on egress on both interface, 
> considering that egress of one interface is ingress for the other
> (eg, a pipe).

I have not needed to do the following yet, but this is how I would do it.

I would move the Internet interface into it's a network namespace, 
create a vEth pair between said network namespace and the main / default 
/ unnamed network namespace.

The new network namespace would have it's default out the Internet 
connection, and have a route to the main / default / unnamed network 
namespace and associated networks behind it (home LAN).

The new network namespace can apply all tc rules to it's end of the vEth 
pair for traffic that is sent to the main / default / unnamed network 
namespace.

The main / default / unnamed network namespace can apply all tc rules to 
it's end of the vEth pair for traffic that is going to the Internet.

This:

    |
+--+--+
| WAN |  DHCP
|     |           All one (main / default / unnamed) network namespace.
| LAN |  Static
+--+--+
    |

Would be turned into this:

    |
+--+--+
| WAN |  DHCP
|     |           New network namespace.
| vE0 |  Static
+--+--+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| vE1 |  Static
|     |           Original (main / default / unnamed) network namespace.
| LAN |  Static
+--+--+
    |

Apply tc rules to vE0 for traffic going down.  Apply tc rules to vE1 for 
traffic going up.

> It is possible?

Just about anything is possible.  It's a question of how difficult is it 
and is it reasonable to do so.

Aside:  I think that you could have the vEth pair be unnumbered and use 
interface routes.



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4013 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
                   ` (11 preceding siblings ...)
  2020-04-13  1:11 ` Grant Taylor
@ 2020-04-17  9:58 ` Marco Gaiarin
  12 siblings, 0 replies; 19+ messages in thread
From: Marco Gaiarin @ 2020-04-17  9:58 UTC (permalink / raw)
  To: lartc

Mandi! Grant Taylor
  In chel di` si favelave...

> Say that there are two physical hosts on the network; A and B.

I think i'm missing most of the background here; there's some
doc/paper/wiki/... about NS/VETH i can read about?

Still i miss how can i 'link' various interfaces... for example:

>    |
> +--+--+
> | WAN |  DHCP
> |     |           New network namespace.
> | vE0 |  Static
> +--+--+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> | vE1 |  Static
> |     |           Original (main / default / unnamed) network namespace.
> | LAN |  Static
> +--+--+
>    |
> 
> Apply tc rules to vE0 for traffic going down.  Apply tc rules to vE1 for
> traffic going up.

This is exactly what i think to do. But... how 'routing' (in
loose/sparse sense, not literally) works on namespaces?

EG, normally i have public IP address assigned to WAN interface, and
private one assigned to LAN; LAN and WAN are not 'linked' between them,
apart routing and firewall rules.

I don't understand how NS/VETH came into this play... and is surely my
fault!

In my head i suppose that WAN is 'linked' (again, in loose/sparse
sense) to a couple of VETH interfaces as above, and i use vE1 to shape
egress traffic and vE0 to shape egress traffic (that is, ingress for
vE1).


Thanks!

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
      http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
	(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-23  9:36   ` Marc SCHAEFER
@ 2020-03-23 18:15     ` Philip Prindeville
  0 siblings, 0 replies; 19+ messages in thread
From: Philip Prindeville @ 2020-03-23 18:15 UTC (permalink / raw)
  To: Marc SCHAEFER; +Cc: Gáspár Lajos, netfilter


> On Mar 23, 2020, at 3:36 AM, Marc SCHAEFER <schaefer@alphanet.ch> wrote:
> 
> On Mon, Mar 23, 2020 at 07:47:11AM +0100, Gáspár Lajos wrote:
>> Just a tip: AFAIK, you can only limit your sending bandwith... Everything
>> what you already received is already on your device... :)
> 
> Right, however in case you do not have control on the sending
> device, you can slow down or loose TCP ACKs to slow down
> the opposite direction TCP traffic.


Exactly.  That’s my desire.

The sender calibrates based on 4 observed end-to-end path properties:

1. min path MTU, i.e. the smallest of all per-hop path MTU’s (likely doesn’t come into play here);
2. end-to-end delay, i.e. the summation of all per-hop delays, which we can artificially increase (shape) on the final internal hop to affect sender pacing;
3. end-to-end bandwidth, again the minimum bandwidth for the slowest hop along the path, which we can also artificially increase on the final internal hop to affect sender pacing;
4. end-to-end loss, the product of the reliability of all per-hop lossiness (hopefully we won’t be increasing lossiness, with the last hop having a reliability of 1.0, assuming that delay provides enough back pressure that we don’t need to actually drop any packets);

So the intention is to use (2) and (3) to apply back-pressure to the sender to influence his congestion avoidance.

Hopefully that makes things a little more clear!

So, back to my main question… what do the tc/htb commands look like to configure a tree that’s spread out across multiple (well, two in this case) interfaces?  Is that even possible?

I guess I could do all of the shaping on the “wan” interface, including on ingress… and just use the destination subnet (post NATting, obviously) on ingress, and based on the source subnet or “indev” on egress.

Thanks,

-Philip


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-23  6:47 ` Gáspár Lajos
@ 2020-03-23  9:36   ` Marc SCHAEFER
  2020-03-23 18:15     ` Philip Prindeville
  0 siblings, 1 reply; 19+ messages in thread
From: Marc SCHAEFER @ 2020-03-23  9:36 UTC (permalink / raw)
  To: Gáspár Lajos; +Cc: Philip Prindeville, netfilter

On Mon, Mar 23, 2020 at 07:47:11AM +0100, Gáspár Lajos wrote:
> Just a tip: AFAIK, you can only limit your sending bandwith... Everything
> what you already received is already on your device... :)

Right, however in case you do not have control on the sending
device, you can slow down or loose TCP ACKs to slow down
the opposite direction TCP traffic.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: tc question about ingress bandwidth splitting
  2020-03-22 18:20 Philip Prindeville
@ 2020-03-23  6:47 ` Gáspár Lajos
  2020-03-23  9:36   ` Marc SCHAEFER
  0 siblings, 1 reply; 19+ messages in thread
From: Gáspár Lajos @ 2020-03-23  6:47 UTC (permalink / raw)
  To: Philip Prindeville, netfilter

Hi Philip,


Just a tip: AFAIK, you can only limit your sending bandwith... 
Everything what you already received is already on your device... :)


Cheers,

Lajos


2020. 03. 22. 19:20 keltezéssel, Philip Prindeville írta:
> Hi all,
>
> I asked around on IRC but no one seems to know the answer, so I thought I’d go to the source… Seemed like something Stephen or Eric might be able to answer.
>
> I have a SoHo router with two physical subnets, which we’ll call “production” (eth0) and “guest” (eth1), and the egress interface “wan” (eth5).
>
> The uplink is G.PON 50/10 mbps.  I’d like to cap the usage on “guest” to 10/2 mbps.  Any unused bandwidth from “guest” goes to “production”.
>
> I thought about marking the traffic coming in off “wan" (the public interface).  Then using HTB to have a 50 mbps cap at the root, and allocating 10mb/s to the child “guest”.  The other sibling would be “production”, and he gets the remaining traffic.
>
> Upstream would be the reverse, marking ingress traffic from “guest” with a separate tag.  Allocating upstream root on “wan” with 10 mbps, and the child “guest” getting 2 mbps.  The remainder goes to the sibling “production”.
>
> Should be straightforward enough, right? (Well, forwarding is more straightforward than traffic terminating on the router itself, I guess… bonus points for getting that right, too.)
>
> I’m hoping that the limiting will work adequately so that the end-to-end path has adequate congestion avoidance happening, and that upstream doesn’t overrun the receiver and cause a lot of packets to be dropped on the last hop (work case of wasted bandwidth).  Not sure if I need special accommodations for bursting or if that would just delay the “settling” of congestion avoidance into steady-state.
>
> Also not sure if ECN is worth marking at this point.  Congestion control is supposed to work better than congestion avoidance, right?
>
> Anyone know what the steps would look like to accomplish the above?
>
> A bunch of people responded, “yeah, I’ve been wanting to do that too…” when I brought up my question, so if I get a good solution I’ll submit a FAQ entry to LARC.
>
> Thanks,
>
> -Philip
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* tc question about ingress bandwidth splitting
@ 2020-03-22 18:20 Philip Prindeville
  2020-03-23  6:47 ` Gáspár Lajos
  0 siblings, 1 reply; 19+ messages in thread
From: Philip Prindeville @ 2020-03-22 18:20 UTC (permalink / raw)
  To: netfilter

Hi all,

I asked around on IRC but no one seems to know the answer, so I thought I’d go to the source… Seemed like something Stephen or Eric might be able to answer.

I have a SoHo router with two physical subnets, which we’ll call “production” (eth0) and “guest” (eth1), and the egress interface “wan” (eth5).

The uplink is G.PON 50/10 mbps.  I’d like to cap the usage on “guest” to 10/2 mbps.  Any unused bandwidth from “guest” goes to “production”.

I thought about marking the traffic coming in off “wan" (the public interface).  Then using HTB to have a 50 mbps cap at the root, and allocating 10mb/s to the child “guest”.  The other sibling would be “production”, and he gets the remaining traffic.

Upstream would be the reverse, marking ingress traffic from “guest” with a separate tag.  Allocating upstream root on “wan” with 10 mbps, and the child “guest” getting 2 mbps.  The remainder goes to the sibling “production”.

Should be straightforward enough, right? (Well, forwarding is more straightforward than traffic terminating on the router itself, I guess… bonus points for getting that right, too.)

I’m hoping that the limiting will work adequately so that the end-to-end path has adequate congestion avoidance happening, and that upstream doesn’t overrun the receiver and cause a lot of packets to be dropped on the last hop (work case of wasted bandwidth).  Not sure if I need special accommodations for bursting or if that would just delay the “settling” of congestion avoidance into steady-state.

Also not sure if ECN is worth marking at this point.  Congestion control is supposed to work better than congestion avoidance, right?

Anyone know what the steps would look like to accomplish the above?

A bunch of people responded, “yeah, I’ve been wanting to do that too…” when I brought up my question, so if I get a good solution I’ll submit a FAQ entry to LARC.

Thanks,

-Philip


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-04-17  9:58 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-22 21:56 tc question about ingress bandwidth splitting Philip Prindeville
2020-03-22 22:59 ` Grant Taylor
2020-03-24  6:51 ` Philip Prindeville
2020-03-24  9:21 ` Marco Gaiarin
2020-03-24 17:57 ` Grant Taylor
2020-03-24 18:17 ` Grant Taylor
2020-03-26  3:44 ` Philip Prindeville
2020-03-26  3:53 ` Fwd: " Philip Prindeville
2020-03-26 12:50   ` Toke Høiland-Jørgensen
2020-03-26  4:03 ` Grant Taylor
2020-04-01  9:48 ` Marco Gaiarin
2020-04-03 22:44 ` Grant Taylor
2020-04-06  9:13 ` Marco Gaiarin
2020-04-13  1:11 ` Grant Taylor
2020-04-17  9:58 ` Marco Gaiarin
  -- strict thread matches above, loose matches on Subject: below --
2020-03-22 18:20 Philip Prindeville
2020-03-23  6:47 ` Gáspár Lajos
2020-03-23  9:36   ` Marc SCHAEFER
2020-03-23 18:15     ` Philip Prindeville

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.