[RFC] implicit per-namespace devlink instance to set kernel resource limitations

* [RFC] implicit per-namespace devlink instance to set kernel resource limitations
@ 2019-08-06 16:40 Jiri Pirko
  2019-08-06 17:38 ` David Ahern
  2019-08-06 18:27 ` Jakub Kicinski
  0 siblings, 2 replies; 14+ messages in thread
From: Jiri Pirko @ 2019-08-06 16:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, dsahern, mlxsw, jakub.kicinski, andrew, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

Hi all.

I just discussed this with DavidA and I would like to bring this to
a broader audience. David wants to limit kernel resources in network
namespaces, for example fibs, fib rules, etc.

He claims that the devlink api is rich enough to program these
limitations, as it already does for mlxsw hw resources, for example.
If we have this api for hardware, why not reuse it for the kernel and
its resources too?

So the proposal is to have some new device, say "kernelnet", that would
implicitly create a per-namespace devlink instance. This devlink
instance would be used to set up resource limits. Like:

devlink resource set kernelnet path /IPv4/fib size 96
devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
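
A minimal sketch of the model these commands imply, assuming one
implicitly created instance per namespace that stores limits keyed by
resource path. The class and method names are hypothetical and exist
neither in devlink nor in the kernel; "init" is an assumed label for
the default namespace.

```python
# Illustrative model only: none of these names exist in devlink or the kernel.
from dataclasses import dataclass, field


@dataclass
class DevlinkInstance:
    """One hypothetical 'kernelnet' instance holding limits keyed by path."""
    limits: dict = field(default_factory=dict)

    def resource_set(self, path: str, size: int) -> None:
        if not path.startswith("/") or size < 0:
            raise ValueError(f"bad resource request: {path!r} size {size}")
        self.limits[path] = size


class Namespaces:
    """Creates an instance implicitly the first time a namespace is used."""

    def __init__(self) -> None:
        self._instances = {}

    def instance(self, netns: str = "init") -> DevlinkInstance:
        # setdefault stores a new instance only if the namespace has none yet.
        return self._instances.setdefault(netns, DevlinkInstance())


namespaces = Namespaces()
# Mirrors the three example commands above.
namespaces.instance().resource_set("/IPv4/fib", 96)
namespaces.instance("ns1name").resource_set("/IPv6/fib", 100)
namespaces.instance("ns2name").resource_set("/IPv4/fib-rules", 8)
```

Each namespace sees only its own limits, which is the property the
-N option is meant to select in the commands above.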

To me it sounds a bit odd for a kernel namespace to act as a device, but
thinking about it more, it makes sense. It is probably better than
defining a new api. Users would use the same tool to work with kernel
and hw.

Also we can implement other devlink functionality, like dpipe.
Users would then have visibility into the network pipeline, tables,
utilization, etc. It is related to the resources too.

What do you think?

Jiri


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-06 16:40 [RFC] implicit per-namespace devlink instance to set kernel resource limitations Jiri Pirko
@ 2019-08-06 17:38 ` David Ahern
  2019-08-06 18:03   ` Andrew Lunn
  2019-08-06 18:27 ` Jakub Kicinski
  1 sibling, 1 reply; 14+ messages in thread
From: David Ahern @ 2019-08-06 17:38 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, mlxsw, jakub.kicinski, andrew, f.fainelli, vivien.didelot,
	mkubecek, stephen, daniel, brouer, eric.dumazet

On 8/6/19 10:40 AM, Jiri Pirko wrote:
> Hi all.
> 
> I just discussed this with DavidA and I would like to bring this to
> broader audience. David wants to limit kernel resources in network
> namespaces, for example fibs, fib rules, etc.
> 
> He claims that devlink api is rich enough to program this limitations
> as it already does for mlxsw hw resources for example. If we have this
> api for hardware, why don't to reuse it for the kernel and it's
> resources too?

The analogy is that a kernel is 'programmed' just like hardware, it has
resources just like hardware (e.g., memory) and those resources are
limited as well. So the resources consumed by fib entries, rules,
nexthops, etc should be controllable.

> 
> So the proposal is to have some new device, say "kernelnet", that would
> implicitly create per-namespace devlink instance. This devlink
> instance would be used to setup resource limits. Like:
> 
> devlink resource set kernelnet path /IPv4/fib size 96
> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
> 
> To me it sounds a bit odd for kernel namespace to act as a device, but
> thinking about it more, it makes sense. Probably better than to define
> a new api. User would use the same tool to work with kernel and hw.
> 
> Also we can implement other devlink functionality, like dpipe.
> User would then have visibility of network pipeline, tables,
> utilization, etc. It is related to the resources too.
> 
> What do you think?
> 
> Jiri
> 



* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-06 17:38 ` David Ahern
@ 2019-08-06 18:03   ` Andrew Lunn
  2019-08-07  2:33     ` David Ahern
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Lunn @ 2019-08-06 18:03 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, jakub.kicinski, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

On Tue, Aug 06, 2019 at 11:38:32AM -0600, David Ahern wrote:
> On 8/6/19 10:40 AM, Jiri Pirko wrote:
> > Hi all.
> > 
> > I just discussed this with DavidA and I would like to bring this to
> > broader audience. David wants to limit kernel resources in network
> > namespaces, for example fibs, fib rules, etc.
> > 
> > He claims that devlink api is rich enough to program this limitations
> > as it already does for mlxsw hw resources for example. If we have this
> > api for hardware, why don't to reuse it for the kernel and it's
> > resources too?
> 
> The analogy is that a kernel is 'programmed' just like hardware, it has
> resources just like hardware (e.g., memory) and those resources are
> limited as well. So the resources consumed by fib entries, rules,
> nexthops, etc should be controllable.

I expect one question that will come up is why not control
groups. That is often used by the rest of the kernel for resource
control.

But cgroups are mostly about limiting resources for a collection of
processes. I don't think that is true for networking resources. The
resources we are talking about are orthogonal to processes. Or are
there any resources which should be linked to processes? eBPF
resources?

> > So the proposal is to have some new device, say "kernelnet", that would
> > implicitly create per-namespace devlink instance.

Maybe kernelns, to make it clear we are talking about namespace
resources.

Going back to cgroups concept. They are generally hierarchical. Do we
need any sort of hierarchy here? Are there some resources we want to
set a global limit on, and then a per namespace limit on top of that?
We would then need two names, and kernelnet sounds more like the
global level?
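
The hierarchy question can be made concrete with a small sketch,
assuming a global limit with per-namespace limits layered on top, so
that an allocation must fit at every level. LimitNode and the numbers
used are hypothetical and are not taken from cgroups or devlink.

```python
class LimitNode:
    """A limit with an optional parent; a charge must fit at every level."""

    def __init__(self, limit, parent=None):
        self.limit = limit
        self.used = 0
        self.parent = parent

    def _chain(self):
        node = self
        while node is not None:
            yield node
            node = node.parent

    def try_charge(self, count=1):
        chain = list(self._chain())
        # Check every level first so a failure leaves all counters untouched.
        if any(n.used + count > n.limit for n in chain):
            return False
        for n in chain:
            n.used += count
        return True

    def uncharge(self, count=1):
        for n in self._chain():
            n.used -= count


# Hypothetical numbers: a global cap of 100 routes, per-namespace caps on top.
global_fib = LimitNode(limit=100)
ns1_fib = LimitNode(limit=96, parent=global_fib)
ns2_fib = LimitNode(limit=8, parent=global_fib)
```

With these numbers, ns2 can be refused a route while still under its
own limit of 8, because ns1 may already have used most of the global
100. That is the behaviour a two-level scheme would have to name and
expose separately.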

       Andrew


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-06 16:40 [RFC] implicit per-namespace devlink instance to set kernel resource limitations Jiri Pirko
  2019-08-06 17:38 ` David Ahern
@ 2019-08-06 18:27 ` Jakub Kicinski
  2019-08-06 18:38   ` Jiri Pirko
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2019-08-06 18:27 UTC (permalink / raw)
  To: Jiri Pirko, dsahern
  Cc: netdev, davem, mlxsw, andrew, f.fainelli, vivien.didelot,
	mkubecek, stephen, daniel, brouer, eric.dumazet

On Tue, 6 Aug 2019 18:40:36 +0200, Jiri Pirko wrote:
> Hi all.
> 
> I just discussed this with DavidA and I would like to bring this to
> broader audience. David wants to limit kernel resources in network
> namespaces, for example fibs, fib rules, etc.
> 
> He claims that devlink api is rich enough to program this limitations
> as it already does for mlxsw hw resources for example. 

TBH I don't see how you changed anything to do with FIB notifications,
so the fact that the accounting is off now is a bit confusing. I don't
understand how devlink, FIB and namespaces mix :(

> If we have this api for hardware, why don't to reuse it for the
> kernel and it's resources too?

IMHO the netdevsim use of this API is a slight abuse, to prove the
device can fail the FIB changes, nothing more..

> So the proposal is to have some new device, say "kernelnet", that
> would implicitly create per-namespace devlink instance. This devlink
> instance would be used to setup resource limits. Like:
> 
> devlink resource set kernelnet path /IPv4/fib size 96
> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
> 
> To me it sounds a bit odd for kernel namespace to act as a device, but
> thinking about it more, it makes sense. Probably better than to define
> a new api. User would use the same tool to work with kernel and hw.
> 
> Also we can implement other devlink functionality, like dpipe.
> User would then have visibility of network pipeline, tables,
> utilization, etc. It is related to the resources too.
> 
> What do you think?

I'm no expert here, but it seems counterintuitive that device tables
would be aware of namespaces in the first place. Are we not reinventing
cgroup controllers based on a device API? IMHO from a perspective of
someone unfamiliar with routing offload this seems backwards :)


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-06 18:27 ` Jakub Kicinski
@ 2019-08-06 18:38   ` Jiri Pirko
  2019-08-06 18:54     ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Pirko @ 2019-08-06 18:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: dsahern, netdev, davem, mlxsw, andrew, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

Tue, Aug 06, 2019 at 08:27:17PM CEST, jakub.kicinski@netronome.com wrote:
>On Tue, 6 Aug 2019 18:40:36 +0200, Jiri Pirko wrote:
>> Hi all.
>> 
>> I just discussed this with DavidA and I would like to bring this to
>> broader audience. David wants to limit kernel resources in network
>> namespaces, for example fibs, fib rules, etc.
>> 
>> He claims that devlink api is rich enough to program this limitations
>> as it already does for mlxsw hw resources for example. 
>
>TBH I don't see how you changed anything to do with FIB notifications,
>so the fact that the accounting is off now is a bit confusing. I don't
>understand how devlink, FIB and namespaces mix :(
>
>> If we have this api for hardware, why don't to reuse it for the
>> kernel and it's resources too?
>
>IMHO the netdevsim use of this API is a slight abuse, to prove the
>device can fail the FIB changes, nothing more..

It's a slightly bigger abuse :) But in this thread, we are not
discussing netdevsim, but a separate "dev".


>
>> So the proposal is to have some new device, say "kernelnet", that
>> would implicitly create per-namespace devlink instance. This devlink
>> instance would be used to setup resource limits. Like:
>> 
>> devlink resource set kernelnet path /IPv4/fib size 96
>> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
>> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
>> 
>> To me it sounds a bit odd for kernel namespace to act as a device, but
>> thinking about it more, it makes sense. Probably better than to define
>> a new api. User would use the same tool to work with kernel and hw.
>> 
>> Also we can implement other devlink functionality, like dpipe.
>> User would then have visibility of network pipeline, tables,
>> utilization, etc. It is related to the resources too.
>> 
>> What do you think?
>
>I'm no expert here but seems counter intuitive that device tables would
>be aware of namespaces in the first place. Are we not reinventing
>cgroup controllers based on a device API? IMHO from a perspective of
>someone unfamiliar with routing offload this seems backwards :)

Can we use cgroup for fib and other limitations instead?



* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-06 18:38   ` Jiri Pirko
@ 2019-08-06 18:54     ` Jakub Kicinski
  2019-08-06 19:06       ` Andrew Lunn
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2019-08-06 18:54 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: dsahern, netdev, davem, mlxsw, andrew, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

On Tue, 6 Aug 2019 20:38:41 +0200, Jiri Pirko wrote:
> >> So the proposal is to have some new device, say "kernelnet", that
> >> would implicitly create per-namespace devlink instance. This devlink
> >> instance would be used to setup resource limits. Like:
> >> 
> >> devlink resource set kernelnet path /IPv4/fib size 96
> >> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
> >> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
> >> 
> >> To me it sounds a bit odd for kernel namespace to act as a device, but
> >> thinking about it more, it makes sense. Probably better than to define
> >> a new api. User would use the same tool to work with kernel and hw.
> >> 
> >> Also we can implement other devlink functionality, like dpipe.
> >> User would then have visibility of network pipeline, tables,
> >> utilization, etc. It is related to the resources too.
> >> 
> >> What do you think?  
> >
> >I'm no expert here but seems counter intuitive that device tables would
> >be aware of namespaces in the first place. Are we not reinventing
> >cgroup controllers based on a device API? IMHO from a perspective of
> >someone unfamiliar with routing offload this seems backwards :)  
> 
> Can we use cgroup for fib and other limitations instead?

Not sure the question is to me, I don't feel particularly qualified,
I've never worked with VDCs or written a switch driver.. But I'd see
cgroups as a natural fit, and if I read Andrew's reply right so does
he.. There's certainly a feeling of reinventing the wheel here.

We usually model things in software and then compile that abstraction
into device terms. Devlink allows for low level access to the device,
it allows us to, in a sense, see the result of that compilation. But
that's more of a debugging/low level knob than first class citizen :(


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-06 18:54     ` Jakub Kicinski
@ 2019-08-06 19:06       ` Andrew Lunn
  2019-08-08 18:03         ` Jonathan Lemon
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Lunn @ 2019-08-06 19:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, dsahern, netdev, davem, mlxsw, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

On Tue, Aug 06, 2019 at 11:54:49AM -0700, Jakub Kicinski wrote:
> On Tue, 6 Aug 2019 20:38:41 +0200, Jiri Pirko wrote:
> > >> So the proposal is to have some new device, say "kernelnet", that
> > >> would implicitly create per-namespace devlink instance. This devlink
> > >> instance would be used to setup resource limits. Like:
> > >> 
> > >> devlink resource set kernelnet path /IPv4/fib size 96
> > >> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
> > >> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
> > >> 
> > >> To me it sounds a bit odd for kernel namespace to act as a device, but
> > >> thinking about it more, it makes sense. Probably better than to define
> > >> a new api. User would use the same tool to work with kernel and hw.
> > >> 
> > >> Also we can implement other devlink functionality, like dpipe.
> > >> User would then have visibility of network pipeline, tables,
> > >> utilization, etc. It is related to the resources too.
> > >> 
> > >> What do you think?  
> > >
> > >I'm no expert here but seems counter intuitive that device tables would
> > >be aware of namespaces in the first place. Are we not reinventing
> > >cgroup controllers based on a device API? IMHO from a perspective of
> > >someone unfamiliar with routing offload this seems backwards :)  
> > 
> > Can we use cgroup for fib and other limitations instead?
> 
> Not sure the question is to me, I don't feel particularly qualified,
> I've never worked with VDCs or wrote a switch driver.. But I'd see
> cgroups as a natural fit, and if I read Andrew's reply right so does
> he.. 

Hi Jakub

I think there needs to be a clearly reasoned argument why cgroups is
the wrong answer to this problem. I myself don't know enough to give
that answer, but I can pose the question.

     Andrew



* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-06 18:03   ` Andrew Lunn
@ 2019-08-07  2:33     ` David Ahern
  2019-08-07  2:59       ` Andrew Lunn
  2019-08-07 18:49       ` Jakub Kicinski
  0 siblings, 2 replies; 14+ messages in thread
From: David Ahern @ 2019-08-07  2:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jiri Pirko, netdev, davem, mlxsw, jakub.kicinski, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet,
	Jakub Kicinski

Some time back support was added for devlink 'resources'. The idea is
that hardware (mlxsw) has limited resources (e.g., memory) that can be
allocated in certain ways (e.g., kvd for mlxsw) thus implementing
restrictions on the number of programmable entries (e.g., routes,
neighbors) by userspace.

I contend:

1. The kernel is analogous to the hardware: it is programmed by
userspace, it has limited resources (e.g., memory), and users want to
control (e.g., limit) the number of networking entities that can be
programmed - routes, rules, nexthop objects, etc. - and by address
family (ipv4, ipv6).

2. A consistent operational model across use cases - s/w forwarding, XDP
forwarding and hardware forwarding - is good for users deploying systems
based on the Linux networking stack. This aligns with my basic point at
LPC last November about better integration of XDP and kernel tables.

The existing devlink API is the right one for all use cases. Most
notably, the kernel can mimic the hardware from a resource management
perspective. Trying to say 'use cgroups for s/w forwarding and devlink
for h/w forwarding' is complicating the lives of users. It is just a
model and models can apply to more than some rigid definition.

As for the namespace piece of this, the kernel's tables for networking
are *per namespace*, and so the resource controller must be per
namespace. This aligns with another consistent theme I have promoted
over the years - the ability to divide up a single ASIC into multiple,
virtual switches which are managed per namespace. This is a very popular
feature from a certain legacy vendor and one that would be good for open
networking to achieve. This is the basis of my response last week about
the devlink instance per namespace, and I thought Jiri was moving in
that direction until our chat today. Jiri's intention is something
different; we can discuss that on the next version of his patches.

###

As for the current controller put into netdevsim...

When I started down this road 18-20 months ago, I was copying a lot of
netdevsim code to create a fake device from which I could have a devlink
instance to implement the devlink resources. At some point it was silly
to keep duplicating the code - just make it part of netdevsim. After
all, it really mirrors mlxsw and its resource limits for fib notifier
handling, and it allows testing of the userspace APIs and the in-kernel
notifier APIs that allow an entity to veto a change. This is all
consistent with the intent of netdevsim - s/w based implementation for
testing of APIs that otherwise require hardware.
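
The veto mechanism described here can be sketched as follows, assuming
a notifier chain whose listeners may reject a pending change. All
names are hypothetical; this is a simplified model with one listener,
not the kernel's fib notifier API or the netdevsim code.

```python
class VetoError(Exception):
    """Raised by a listener to reject a pending change."""


class NotifierChain:
    def __init__(self):
        self._listeners = []

    def register(self, listener):
        self._listeners.append(listener)

    def call(self, event, entry):
        # Any listener may raise VetoError; the caller then aborts the change.
        for listener in self._listeners:
            listener(event, entry)


class Fib:
    def __init__(self, chain):
        self.chain = chain
        self.routes = set()

    def add(self, prefix):
        if prefix in self.routes:
            return True  # already present, nothing to notify
        try:
            self.chain.call("add", prefix)
        except VetoError:
            return False
        self.routes.add(prefix)
        return True

    def delete(self, prefix):
        if prefix in self.routes:
            self.routes.remove(prefix)
            self.chain.call("del", prefix)


class CountingController:
    """Plays the role of the test device: counts entries, vetoes past a limit."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def __call__(self, event, entry):
        if event == "add":
            if self.count >= self.limit:
                raise VetoError(f"limit {self.limit} reached")
            self.count += 1
        elif event == "del":
            self.count -= 1


chain = NotifierChain()
controller = CountingController(limit=2)
chain.register(controller)
fib = Fib(chain)
```

The table itself knows nothing about limits; the controller enforces
them purely by rejecting notifications, which is the property the test
code is meant to exercise.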


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-07  2:33     ` David Ahern
@ 2019-08-07  2:59       ` Andrew Lunn
  2019-08-07  3:10         ` David Ahern
  2019-08-07 18:49       ` Jakub Kicinski
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Lunn @ 2019-08-07  2:59 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, jakub.kicinski, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

On Tue, Aug 06, 2019 at 08:33:47PM -0600, David Ahern wrote:
> Some time back supported was added for devlink 'resources'. The idea is
> that hardware (mlxsw) has limited resources (e.g., memory) that can be
> allocated in certain ways (e.g., kvd for mlxsw) thus implementing
> restrictions on the number of programmable entries (e.g., routes,
> neighbors) by userspace.
> 
> I contend:
> 
> 1. The kernel is an analogy to the hardware: it is programmed by
> userspace, has limited resources (e.g., memory), and that users want to
> control (e.g., limit) the number of networking entities that can be
> programmed - routes, rules, nexthop objects etc and by address family
> (ipv4, ipv6).
> 
> 2. A consistent operational model across use cases - s/w forwarding, XDP
> forwarding and hardware forwarding - is good for users deploying systems
> based on the Linux networking stack. This aligns with my basic point at
> LPC last November about better integration of XDP and kernel tables.

Hi David

Nice arguments.

However, zoom out a bit, from networking to the whole kernel. In
general, across the kernel as a whole, resource management is done
with cgroups. cgroups is the consistent operational model across the
kernel as a whole.

So I think you need a second leg to your argument. You have said why
devlink is the right way to do this. But you should also be able to
say to Tejun Heo why cgroups is the wrong way to do this, going
against the kernel as a whole model. Why is networking special?

      Andrew


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-07  2:59       ` Andrew Lunn
@ 2019-08-07  3:10         ` David Ahern
  2019-08-07 18:57           ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: David Ahern @ 2019-08-07  3:10 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jiri Pirko, netdev, davem, mlxsw, jakub.kicinski, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

On 8/6/19 8:59 PM, Andrew Lunn wrote:
> However, zoom out a bit, from networking to the whole kernel. In
> general, across the kernel as a whole, resource management is done
> with cgroups. cgroups is the consistent operational model across the
> kernel as a whole.
> 
> So i think you need a second leg to your argument. You have said why
> devlink is the right way to do this. But you should also be able to
> say to Tejun Heo why cgroups is the wrong way to do this, going
> against the kernel as a whole model. Why is networking special?
> 

So you are saying mlxsw should be using a cgroups based API for its
resources? netdevsim is for testing kernel APIs sans hardware. Is that
not what the fib controller netdevsim is doing? It is from my perspective.

I am not the one arguing to change code and functionality that has
existed for 16 months. I am arguing that the existing resource
controller satisfies all existing goals (testing in-kernel APIs) and
even satisfies additional ones - like a consistent user experience
managing networking resources. I.e., I see no reason to change what
exists.


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-07  2:33     ` David Ahern
  2019-08-07  2:59       ` Andrew Lunn
@ 2019-08-07 18:49       ` Jakub Kicinski
  2019-08-07 20:55         ` David Ahern
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2019-08-07 18:49 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrew Lunn, Jiri Pirko, netdev, davem, mlxsw, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

On Tue, 6 Aug 2019 20:33:47 -0600, David Ahern wrote:
> Some time back supported was added for devlink 'resources'. The idea is
> that hardware (mlxsw) has limited resources (e.g., memory) that can be
> allocated in certain ways (e.g., kvd for mlxsw) thus implementing
> restrictions on the number of programmable entries (e.g., routes,
> neighbors) by userspace.
> 
> I contend:
> 
> 1. The kernel is an analogy to the hardware: it is programmed by
> userspace, has limited resources (e.g., memory), and that users want to
> control (e.g., limit) the number of networking entities that can be
> programmed - routes, rules, nexthop objects etc and by address family
> (ipv4, ipv6).

The memory hierarchy of an ASIC is more complex and changes more often
than we want to change the model and kernel ABIs. The API in devlink is
intended for TCAM partitioning.

> 2. A consistent operational model across use cases - s/w forwarding, XDP
> forwarding and hardware forwarding - is good for users deploying systems
> based on the Linux networking stack. This aligns with my basic point at
> LPC last November about better integration of XDP and kernel tables.
> 
> The existing devlink API is the right one for all use cases. Most
> notably that the kernel can mimic the hardware from a resource
> management. Trying to say 'use cgroups for s/w forwarding and devlink
> for h/w forwarding' is complicating the lives of users. It is just a
> model and models can apply to more than some rigid definition.

This argument holds no water. Only a tiny fraction of Linux networking
users will have a high-performance forwarding ASIC attached to their
CPUs. So we'll make the 99.9% of users who have never seen devlink learn
the tool for device control in order to control kernel resources?

Perhaps I'm misinterpreting your point there.

> As for the namespace piece of this, the kernel's tables for networking
> are *per namespace*, and so the resource controller must be per
> namespace. This aligns with another consistent theme I have promoted
> over the years - the ability to divide up a single ASIC into multiple,
> virtual switches which are managed per namespace. This is a very popular
> feature from a certain legacy vendor and one that would be good for open
> networking to achieve. This is the basis of my response last week about
> the devlink instance per namespace, and I thought Jiri was moving in
> that direction until our chat today. Jiri's intention is something
> different; we can discuss that on the next version of his patches.

Resource limits per namespace make perfect sense. Just not configured
via devlink..


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-07  3:10         ` David Ahern
@ 2019-08-07 18:57           ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2019-08-07 18:57 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrew Lunn, Jiri Pirko, netdev, davem, mlxsw, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

On Tue, 6 Aug 2019 21:10:40 -0600, David Ahern wrote:
> On 8/6/19 8:59 PM, Andrew Lunn wrote:
> > However, zoom out a bit, from networking to the whole kernel. In
> > general, across the kernel as a whole, resource management is done
> > with cgroups. cgroups is the consistent operational model across the
> > kernel as a whole.
> > 
> > So i think you need a second leg to your argument. You have said why
> > devlink is the right way to do this. But you should also be able to
> > say to Tejun Heo why cgroups is the wrong way to do this, going
> > against the kernel as a whole model. Why is networking special?
> >   
> 
> So you are saying mlxsw should be using a cgroups based API for its
> resources? netdevsim is for testing kernel APIs sans hardware. Is that
> not what the fib controller netdevsim is doing? It is from my perspective.

Why would all the drivers have to pay attention to resource limits?
Shouldn't we try to implement that at a higher layer?

> I am not the one arguing to change code and functionality that has
> existed for 16 months. I am arguing that the existing resource
> controller satisfies all existing goals (testing in kernel APIs) and
> even satisfies additional ones - like a consistent user experience
> managing networking resources. ie.., I see no reason to change what exists.

Please don't use the netdevsim code as an argument that something
already exists. The only legitimate use of that code is to validate
the devlink resource API and that the notifier can fail the insertion.

We try to encourage adding tests and are generally more willing to
merge test code. Possible abuse of that for establishing precedents 
is worrying.


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-07 18:49       ` Jakub Kicinski
@ 2019-08-07 20:55         ` David Ahern
  0 siblings, 0 replies; 14+ messages in thread
From: David Ahern @ 2019-08-07 20:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Jiri Pirko, netdev, davem, mlxsw, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet

On 8/7/19 12:49 PM, Jakub Kicinski wrote:
> Perhaps I'm misinterpreting your point there.

yes, this thread is getting out of hand.

I am not pushing for an in-kernel fib resource controller. Jiri wants
to move the existing devlink resource code out of netdevsim into a
standalone driver. That code was added for testing and, as the commit
log shows, as a demonstration of how one could create a controller
using the devlink API. I added some color commentary as to why a
devlink controller makes sense for the use case and how it should work,
but I am not asking for such a controller to be added to the kernel.

The netdevsim resource controller is counter based; the absolute
simplest form of limits. If I wanted basic counting for a fib resource
controller, I would add an option to limit the number of fib rules and
routes using sysctl similar to what exists for neighbors. Consistency. I
don't need the overhead and unrelated messiness of cgroups. I don't need
the overhead of handling fib notifiers. fib (rule) add -- check counter,
increment counter; fib (rule) delete -- decrement counter. Simple, per
namespace, done.
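
The inline counter scheme described above can be sketched as follows,
assuming a per-namespace table with a single tunable maximum. The
names are hypothetical; no such sysctl or structure is claimed to
exist.

```python
class FibRuleTable:
    """Per-namespace table with a counter checked inline on add (no notifier)."""

    def __init__(self, max_rules):
        self.max_rules = max_rules  # hypothetical tunable, in the spirit of a sysctl
        self.count = 0

    def add(self):
        # add: check counter, increment counter
        if self.count >= self.max_rules:
            return False  # a real implementation would presumably fail with an error
        self.count += 1
        return True

    def delete(self):
        # delete: decrement counter
        if self.count > 0:
            self.count -= 1


# One table per namespace, each with its own limit.
tables = {
    "ns1name": FibRuleTable(max_rules=8),
    "ns2name": FibRuleTable(max_rules=2),
}
```

The check sits directly in the add path, so no notifier handling is
involved and each namespace is independent of the others.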


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
  2019-08-06 19:06       ` Andrew Lunn
@ 2019-08-08 18:03         ` Jonathan Lemon
  0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Lemon @ 2019-08-08 18:03 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Jiri Pirko, dsahern, netdev, davem, mlxsw,
	f.fainelli, vivien.didelot, mkubecek, stephen, daniel, brouer,
	eric.dumazet



On 6 Aug 2019, at 12:06, Andrew Lunn wrote:

> On Tue, Aug 06, 2019 at 11:54:49AM -0700, Jakub Kicinski wrote:
>> On Tue, 6 Aug 2019 20:38:41 +0200, Jiri Pirko wrote:
>>>>> So the proposal is to have some new device, say "kernelnet", that
>>>>> would implicitly create per-namespace devlink instance. This devlink
>>>>> instance would be used to setup resource limits. Like:
>>>>>
>>>>> devlink resource set kernelnet path /IPv4/fib size 96
>>>>> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
>>>>> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
>>>>>
>>>>> To me it sounds a bit odd for kernel namespace to act as a device, but
>>>>> thinking about it more, it makes sense. Probably better than to define
>>>>> a new api. User would use the same tool to work with kernel and hw.
>>>>>
>>>>> Also we can implement other devlink functionality, like dpipe.
>>>>> User would then have visibility of network pipeline, tables,
>>>>> utilization, etc. It is related to the resources too.
>>>>>
>>>>> What do you think?
>>>>
>>>> I'm no expert here but seems counter intuitive that device tables would
>>>> be aware of namespaces in the first place. Are we not reinventing
>>>> cgroup controllers based on a device API? IMHO from a perspective of
>>>> someone unfamiliar with routing offload this seems backwards :)
>>>
>>> Can we use cgroup for fib and other limitations instead?
>>
>> Not sure the question is to me, I don't feel particularly qualified,
>> I've never worked with VDCs or wrote a switch driver.. But I'd see
>> cgroups as a natural fit, and if I read Andrew's reply right so does
>> he..
>
> Hi Jakub
>
> I think there needs to be a clearly reasoned argument why cgroups is
> the wrong answer to this problem. I myself don't know enough to give
> that answer, but i can pose the question.
>
>      Andrew

For the example above, the first question would be why is the restriction
based on the number of entries instead of their memory footprint?  The
resource being consumed is memory, so I'd think that should be what is
monitored.

Quickly scanning the cgroups documentation, it seems there is a device
controller, so this isn't just process based.  ISTR that Larry Brakmo was
working on a network bandwidth limiter, which is controlled by cgroups.
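
To illustrate the entries-versus-memory question, here is a minimal
sketch of a limit expressed as a byte budget rather than an entry
count. The per-entry costs are invented for illustration only; real
sizes would depend on the kernel build and the entry type.

```python
# Hypothetical per-entry costs in bytes, chosen only for illustration.
ENTRY_COST = {"route": 128, "rule": 64}


class MemoryBudget:
    """Limits the summed footprint of entries instead of their count."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0

    def try_add(self, kind):
        cost = ENTRY_COST[kind]
        if self.used_bytes + cost > self.max_bytes:
            return False
        self.used_bytes += cost
        return True


budget = MemoryBudget(max_bytes=256)
```

Under a byte budget, how many entries fit depends on their kinds,
whereas a count limit treats a large and a small entry the same.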
-- 
Jonathan



