counting file descriptors with a cgroup controller

* counting file descriptors with a cgroup controller
       [not found] <CGME20170217093725eucas1p12478baf297d25303f3020f4973fbf3b0@eucas1p1.samsung.com>
@ 2017-02-17  9:37 ` Łukasz Stelmach
  2017-02-17 11:37   ` Krzysztof Opasiak
  0 siblings, 1 reply; 17+ messages in thread
From: Łukasz Stelmach @ 2017-02-17  9:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Krzysztof Opasiak, Karol Lewandowski

Hi,

We need to limit and monitor the number of file descriptors that
processes keep open. If a process exceeds a certain limit we'd like to
terminate it and restart it, or reboot the whole system. Currently the
RLIMIT API allows limiting the number of file descriptors, but to achieve
our goals we'd need to make sure all the programmes we run handle the
EMFILE errno properly. That is why we are considering developing a cgroup
controller that limits the number of open file descriptors of its members
(similar to the memory controller).
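
To make the RLIMIT behaviour concrete, here is a minimal sketch in
Python (chosen only for brevity; nothing in this thread uses it). It
lowers the soft RLIMIT_NOFILE of the current process and keeps opening
/dev/null until the kernel refuses with EMFILE, which is the error every
program would have to handle. The helper name open_until_emfile and the
default limit of 64 are illustrative assumptions.

```python
import errno
import os
import resource


def open_until_emfile(soft_limit=64):
    """Lower the soft RLIMIT_NOFILE and open /dev/null until open() fails.

    Returns (errno of the failure, number of descriptors opened).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))

    fds = []
    failure = None
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        # This is the error path the application itself has to get right.
        failure = exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        # Restore the original soft limit for the rest of the process.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return failure, len(fds)
```

Calling open_until_emfile() and comparing the first returned value with
errno.EMFILE shows that the limit is enforced inside the limited process
itself: no other process is told that it was hit.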

Any comments? Is there any alternative that:

+ does not require modifications of user-land code,
+ enables another process (e.g. init) to be notified and apply policy.

Kind regards,
-- 
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
  2017-02-17  9:37 ` counting file descriptors with a cgroup controller Łukasz Stelmach
@ 2017-02-17 11:37   ` Krzysztof Opasiak
  2017-03-06 18:58       ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Krzysztof Opasiak @ 2017-02-17 11:37 UTC (permalink / raw)
  To: tj, lizefan, hannes
  Cc: Łukasz Stelmach, linux-kernel, Karol Lewandowski, cgroups

+ cgroups mailing list
+ cgroup maintainers

On 02/17/2017 10:37 AM, Łukasz Stelmach wrote:
> Hi,
> 
> We need to limit and monitor the number of file descriptors processes
> keep open. If a process exceeds certain limit we'd like to terminate it
> and restart it or reboot the whole system. Currently the RLIMIT API
> allows limiting the number of file descriptors but to achieve our goals
> we'd need to make sure all programmes we run handle EMFILE errno
> properly. That is why we consider developing a cgroup controller that
> limits the number of open file descriptors of its members (similar to
>  memory controller).
> 
> Any comments? Is there any alternative that:
> 
> + does not require modifications of user-land code,
> + enables other process (e.g. init) to be notified and apply policy.
> 
> Kind regards,
> 

-- 
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
@ 2017-03-06 18:58       ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2017-03-06 18:58 UTC (permalink / raw)
  To: Krzysztof Opasiak
  Cc: lizefan, hannes, Łukasz Stelmach, linux-kernel,
	Karol Lewandowski, cgroups

Hello,

On Fri, Feb 17, 2017 at 12:37:11PM +0100, Krzysztof Opasiak wrote:
> > We need to limit and monitor the number of file descriptors processes
> > keep open. If a process exceeds certain limit we'd like to terminate it
> > and restart it or reboot the whole system. Currently the RLIMIT API
> > allows limiting the number of file descriptors but to achieve our goals
> > we'd need to make sure all programmes we run handle EMFILE errno
> > properly. That is why we consider developing a cgroup controller that
> > limits the number of open file descriptors of its members (similar to
> >  memory controller).
> > 
> > Any comments? Is there any alternative that:
> > 
> > + does not require modifications of user-land code,
> > + enables other process (e.g. init) to be notified and apply policy.

Hmm... I'm not quite sure fds qualify as an independent system-wide
resource.  We did that for pids because pids are globally limited and
can run out way earlier than the memory backing them.  I don't think we
have similar restrictions for fds, do we?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
@ 2017-03-07 11:19         ` Krzysztof Opasiak
  0 siblings, 0 replies; 17+ messages in thread
From: Krzysztof Opasiak @ 2017-03-07 11:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, hannes, Łukasz Stelmach, linux-kernel,
	Karol Lewandowski, cgroups

Hi

On 03/06/2017 07:58 PM, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 17, 2017 at 12:37:11PM +0100, Krzysztof Opasiak wrote:
>>> We need to limit and monitor the number of file descriptors processes
>>> keep open. If a process exceeds certain limit we'd like to terminate it
>>> and restart it or reboot the whole system. Currently the RLIMIT API
>>> allows limiting the number of file descriptors but to achieve our goals
>>> we'd need to make sure all programmes we run handle EMFILE errno
>>> properly. That is why we consider developing a cgroup controller that
>>> limits the number of open file descriptors of its members (similar to
>>>  memory controller).
>>>
>>> Any comments? Is there any alternative that:
>>>
>>> + does not require modifications of user-land code,
>>> + enables other process (e.g. init) to be notified and apply policy.
>
> Hmm... I'm not quite sure fds qualify as an independent system-wide
> resource.  We did that for pids because pids are globally limited and
> can run out way earlier than memory backing it.  I don't think we have
> similar restrictions for fds, do we?

Well, I'm not aware of such restrictions...

So maybe let me clarify our use case so we can discuss this further.
We are dealing with the task of monitoring system services on an IoT
system. This system needs to run as long as possible without a reboot,
just like a server. In the server world almost the whole system state is
monitored by services like nagios. They sample each parameter (CPU,
memory, etc.) at some interval. Unfortunately we cannot use this approach
in an embedded system due to power consumption.

So generally we are now considering two approaches:

1) Use rlimits when possible to limit resources for each process.

The problem here is that this creates an implicit requirement that all
system services are well written: that they can detect, for example,
that they have run out of fds, and that they then exit with a suitable
error code instead of hanging forever and telling clients that they are
unable to handle their requests due to a lack of fds. This is hard,
especially when a service uses a lot of libraries under the hood, because
they also need to propagate this error code from each function which
opens files. It is even harder with proprietary services or libraries
for which we don't have access to the source code.

2) Use cgroups to limit and monitor resource usage.

Generally systemd creates a cgroup for each service. Controllers like the
memory cgroup are able to notify userspace when memory usage reaches some
level. So, for example, systemd could get a notification that one of the
cgroups is using more memory than it should, but as long as the hard
limit of the cgroup has not been reached the service is not even going to
notice this. So instead of an error being returned from, for example,
malloc() in the service, systemd could just send a signal to that
service, ask it to exit gracefully, and then restart it. The disadvantage
of this solution is the need for a cgroup controller for each resource we
would like to monitor. For now we have suitable cgroups for everything we
need apart from file descriptors.
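
For readers unfamiliar with the memory notification mentioned above, the
following is a rough Python sketch of how a supervisor could register a
usage threshold with the cgroup v1 memory controller: it creates an
eventfd, opens memory.usage_in_bytes, and writes the line
"<event_fd> <usage_fd> <threshold>" to cgroup.event_control. The function
names and the cgroup path in the docstring are hypothetical, os.eventfd()
needs Python 3.10 or newer, and the sketch assumes a cgroup v1 hierarchy
(cgroup v2 has no cgroup.event_control file).

```python
import os
import struct


def threshold_registration(event_fd, usage_fd, threshold_bytes):
    """Build the line cgroup v1 expects in cgroup.event_control."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"


def wait_for_memory_threshold(cgroup_dir, threshold_bytes):
    """Block until memory usage in cgroup_dir crosses threshold_bytes.

    cgroup_dir is a hypothetical path such as
    /sys/fs/cgroup/memory/system.slice/foo.service (cgroup v1 layout).
    """
    event_fd = os.eventfd(0)  # Linux only, Python 3.10+
    usage_fd = os.open(
        os.path.join(cgroup_dir, "memory.usage_in_bytes"), os.O_RDONLY
    )
    try:
        control_fd = os.open(
            os.path.join(cgroup_dir, "cgroup.event_control"), os.O_WRONLY
        )
        try:
            line = threshold_registration(event_fd, usage_fd, threshold_bytes)
            os.write(control_fd, line.encode())
        finally:
            os.close(control_fd)
        # Reading an eventfd blocks until the kernel has signalled it and
        # then returns the 8-byte counter of pending notifications.
        (count,) = struct.unpack("=Q", os.read(event_fd, 8))
        return count
    finally:
        os.close(usage_fd)
        os.close(event_fd)
```

The monitored service is never involved: only the process holding the
eventfd is woken, which is the property being asked for here for file
descriptors.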

What do you think about this? Maybe you have some other ideas on how we
could achieve this?

Best regards,
-- 
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
  2017-03-07 11:19         ` Krzysztof Opasiak
  (?)
@ 2017-03-07 19:41         ` Tejun Heo
  2017-03-07 20:06             ` Krzysztof Opasiak
  -1 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2017-03-07 19:41 UTC (permalink / raw)
  To: Krzysztof Opasiak
  Cc: lizefan, hannes, Łukasz Stelmach, linux-kernel,
	Karol Lewandowski, cgroups

Hello, Krzysztof.

On Tue, Mar 07, 2017 at 12:19:52PM +0100, Krzysztof Opasiak wrote:
> So maybe let me clarify our use case so we can have some more discussion
> about this. We are dealing with task of monitoring system services on an IoT
> system. So this system needs to run as long as possible without reboot just
> like server. In server world almost whole system state is being monitored by
> services like nagios. They measure each parameter (like cpu, memory etc)
> with some interval. Unfortunately we cannot use this in an embedded
> system due to power consumption.

So, we don't add controllers for specific use case scenarios.  The
target actually has to be a fundamental resource which can't be
isolated in a different way.

The use case you're describing is more about working around
shortcomings in userspace by implementing a major kernel feature, when
the said shortcomings can easily be controlled and mitigated from
userspace - e.g. if running out of fds can't be handled reliably from
the target application for some reason and the application may lock up
from the condition, protect the base resources so that a monitoring
process can always reliably run and let that take a corrective action
when such condition is detected.

This doesn't really seem to qualify as a dedicated kernel
functionality.
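
As an illustration of the userspace route described above, a monitoring
process could compare per-service fd counts with declared limits and ask
offenders to exit. The sketch below is in Python and is only one possible
shape; the function names, the pid-to-limit mapping, and the choice of
SIGTERM are assumptions, and it relies on /proc/<pid>/fd being readable
by the monitor.

```python
import os
import signal


def count_open_fds(pid):
    """Number of entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def find_offenders(limits):
    """Return [(pid, count, limit)] for processes above their declared limit.

    limits maps pid -> maximum number of fds declared for that service.
    Processes that have exited or that we may not inspect are skipped.
    """
    offenders = []
    for pid, limit in limits.items():
        try:
            count = count_open_fds(pid)
        except (FileNotFoundError, PermissionError):
            continue
        if count > limit:
            offenders.append((pid, count, limit))
    return offenders


def ask_to_exit(offenders, sig=signal.SIGTERM):
    """Politely ask each offending process to exit; a supervisor restarts it."""
    for pid, _count, _limit in offenders:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process exited between the check and the signal
```

A supervisor would call find_offenders() periodically and pass the result
to ask_to_exit(). The check is a poll, so its interval trades detection
latency against wakeups.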

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
@ 2017-03-07 20:06             ` Krzysztof Opasiak
  0 siblings, 0 replies; 17+ messages in thread
From: Krzysztof Opasiak @ 2017-03-07 20:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, hannes, Łukasz Stelmach, linux-kernel,
	Karol Lewandowski, cgroups

On 03/07/2017 08:41 PM, Tejun Heo wrote:
> Hello, Krzysztof.
>
> On Tue, Mar 07, 2017 at 12:19:52PM +0100, Krzysztof Opasiak wrote:
>> So maybe let me clarify our use case so we can have some more discussion
>> about this. We are dealing with task of monitoring system services on an IoT
>> system. So this system needs to run as long as possible without reboot just
>> like server. In server world almost whole system state is being monitored by
>> services like nagios. They measure each parameter (like cpu, memory etc)
>> with some interval. Unfortunately we cannot use this in an embedded
>> system due to power consumption.
>
> So, we don't add controllers for specific use case scenarios.  The
> target actually has to be a fundamental resource which can't be
> isolated in a different way.
>
> The use case you're describing is more about working around
> shortcomings in userspace by implementing a major kernel feature, when
> the said shortcomings can easily be controlled and mitigated from
> userspace - e.g. if running out of fds can't be handled reliably from
> the target application for some reason and the application may lock up
> from the condition, protect the base resources so that a monitoring
> process can always reliably run and let that take a corrective action
> when such condition is detected.
>

In theory that's what we plan to do, but we are looking for an efficient
method of detecting that a particular application is using more fds
than it should (as declared by its developer).

Personally, I don't want to use rlimit for this, as it ends up returning
an error code from, for example, open() when we hit the limit. This may
lead to unpredictable crashes in services (esp. those poor proprietary
binary blobs). Instead of injecting errors into the service, we would
like to just get a notification that the service has more open fds than
it should and ask it to restart in a polite way.

For memory this seems quite easy to achieve, as we can just get an
eventfd notification from the memory cgroup controller when the
application exceeds a given memory usage. Maybe you know some efficient
method to do the same for fds?

Best regards,
-- 
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
@ 2017-03-07 20:48               ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2017-03-07 20:48 UTC (permalink / raw)
  To: Krzysztof Opasiak
  Cc: lizefan, hannes, Łukasz Stelmach, linux-kernel,
	Karol Lewandowski, cgroups

Hello,

On Tue, Mar 07, 2017 at 09:06:49PM +0100, Krzysztof Opasiak wrote:
> Personally, I don't want to use rlimit for this as it ends up returning
> error code from for example open() when we hit the limit. This may lead to
> some unpredictable crashes in  services (esp. those poor proprietary binary
> blobs). Instead of injecting errors to service we would like to just get
> notification that this service has more opened fds than it should and ask it
> to restart in a polite way.
> 
> For memory seems to be quite easy to achieve as we can just get eventfd
> notification when application passes given memory usage using memory cgroup
> controller. Maybe you know some efficient method to do the same for fds?

So, if all you wanna do is reliably detect open(2) failures, can't
you do that with bpf tracing?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
@ 2017-03-08  2:59                 ` Parav Pandit
  0 siblings, 0 replies; 17+ messages in thread
From: Parav Pandit @ 2017-03-08  2:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Krzysztof Opasiak, Li Zefan, Johannes Weiner,
	Łukasz Stelmach, Linux Kernel Mailing List,
	Karol Lewandowski, cgroups

Hi,

On Tue, Mar 7, 2017 at 2:48 PM, Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, Mar 07, 2017 at 09:06:49PM +0100, Krzysztof Opasiak wrote:
> > Personally, I don't want to use rlimit for this as it ends up returning
> > error code from for example open() when we hit the limit. This may lead to
> > some unpredictable crashes in  services (esp. those poor proprietary binary
> > blobs). Instead of injecting errors to service we would like to just get
> > notification that this service has more opened fds than it should and ask it
> > to restart in a polite way.
> >

How do those poor proprietary binary blobs remain polite after a restart?
Do you mean you want to keep restarting them whenever they reach the limit?

> > For memory seems to be quite easy to achieve as we can just get eventfd
> > notification when application passes given memory usage using memory cgroup
> > controller. Maybe you know some efficient method to do the same for fds?
>
> So, if all you wanna do is reliably detecting open(2) failures, can't
> you do that with bpf tracing?
>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
@ 2017-03-08 10:19                   ` Krzysztof Opasiak
  0 siblings, 0 replies; 17+ messages in thread
From: Krzysztof Opasiak @ 2017-03-08 10:19 UTC (permalink / raw)
  To: Parav Pandit, Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Łukasz Stelmach,
	Linux Kernel Mailing List, Karol Lewandowski, cgroups



On 03/08/2017 03:59 AM, Parav Pandit wrote:
> Hi,
>
> On Tue, Mar 7, 2017 at 2:48 PM, Tejun Heo <tj@kernel.org> wrote:
>>
>> Hello,
>>
>> On Tue, Mar 07, 2017 at 09:06:49PM +0100, Krzysztof Opasiak wrote:
>>> Personally, I don't want to use rlimit for this as it ends up returning
>>> error code from for example open() when we hit the limit. This may lead to
>>> some unpredictable crashes in  services (esp. those poor proprietary binary
>>> blobs). Instead of injecting errors to service we would like to just get
>>> notification that this service has more opened fds than it should and ask it
>>> to restart in a polite way.
>>>
>
> How does those poor proprietary binary blobs remain polite after restart?

They won't.

> Do you mean you want to keep restarting them when it reaches the limit?

We'd like to restart them each time they reach the limit declared by
the developer.

Best regards,
-- 
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
  2017-03-07 20:48               ` Tejun Heo
  (?)
  (?)
@ 2017-03-08  9:52               ` Krzysztof Opasiak
  2017-03-08 18:59                 ` Tejun Heo
  -1 siblings, 1 reply; 17+ messages in thread
From: Krzysztof Opasiak @ 2017-03-08  9:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, hannes, Łukasz Stelmach, linux-kernel,
	Karol Lewandowski, cgroups

On 03/07/2017 09:48 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 07, 2017 at 09:06:49PM +0100, Krzysztof Opasiak wrote:
>> Personally, I don't want to use rlimit for this as it ends up returning
>> error code from for example open() when we hit the limit. This may lead to
>> some unpredictable crashes in  services (esp. those poor proprietary binary
>> blobs). Instead of injecting errors to service we would like to just get
>> notification that this service has more opened fds than it should and ask it
>> to restart in a polite way.
>>
>> For memory seems to be quite easy to achieve as we can just get eventfd
>> notification when application passes given memory usage using memory cgroup
>> controller. Maybe you know some efficient method to do the same for fds?
>
> So, if all you wanna do is reliably detecting open(2) failures, can't
> you do that with bpf tracing?
>

Well, detecting failures of open() is not enough, and it has a couple of
problems:

1) open(2) is not the only syscall which creates fds. Besides other
syscalls like socket(2) and dup(2), some ioctl()s on drivers (for
example video) also create fds. I'm not sure we have any mechanism
other than grepping through the kernel source to find out which ioctl()s
create fds and which do not.

2) As far as I know (I'm not a bpf specialist, so please correct me if
I'm wrong), with bpf we are only able to detect such events; we are
unable to prevent them from reaching the caller. That means the service
will learn that it has run out of fds and will need to handle this
properly. If there is a bug in this error path, the service may crash.
What we would like to get is just a notification to an external process
that some limit has been reached, without returning an error to the
service itself.

3) Theoretically we could do this using bpf or syscall auditing and
count fds for each userspace process, or check /proc/<PID> after each
notification, but that gets very heavy for a production environment.
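
Point 1 can be checked with a few lines of Python. The sketch below
creates descriptors through several different syscalls and measures the
change in /proc/self/fd; the function names are made up for the example,
and an AF_UNIX socket is used only so that no network setup is needed.
It does not cover the ioctl() case, which depends on the driver.

```python
import os
import socket


def fd_count():
    """Count this process's descriptors via /proc/self/fd (Linux only)."""
    # listdir() itself holds one descriptor while it runs; that constant
    # offset cancels out when two readings are subtracted.
    return len(os.listdir("/proc/self/fd"))


def count_new_descriptors():
    """Create fds through several syscalls and report the deltas.

    Returns (fds added while everything is open, fds left after closing).
    """
    before = fd_count()
    opened = os.open(os.devnull, os.O_RDONLY)  # open(2) / openat(2)
    sock = socket.socket(socket.AF_UNIX)       # socket(2)
    duplicate = os.dup(opened)                 # dup(2) or an fcntl(2) dup
    read_end, write_end = os.pipe()            # pipe2(2), two descriptors
    during = fd_count()

    for fd in (opened, duplicate, read_end, write_end):
        os.close(fd)
    sock.close()
    after = fd_count()
    return during - before, after - before
```

In a single-threaded process the first number is expected to be 5, of
which only one descriptor came from an open() call, so a tracer watching
open() alone would see a fraction of the growth.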

Best regards,
-- 
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: counting file descriptors with a cgroup controller
  2017-03-08  9:52               ` Krzysztof Opasiak
@ 2017-03-08 18:59                 ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2017-03-08 18:59 UTC (permalink / raw)
  To: Krzysztof Opasiak
  Cc: lizefan, hannes, Łukasz Stelmach, linux-kernel,
	Karol Lewandowski, cgroups

On Wed, Mar 08, 2017 at 10:52:18AM +0100, Krzysztof Opasiak wrote:
> Well detecting failures of open is not enough and it has couple of problems:
> 
> 1) open(2) is not the only syscall which creates fd. In addition to other
> syscalls like socket(2), dup(2), some ioctl() on drivers (for example video)
> also creates fds. I'm not sure if we have any other mechanism than grep
> through kernel source to find out which ioctl() creates fd or and which not.
> 
> 2) As far as I know (I'm not a bpf specialist so please correct me if I'm
> wrong), with bpf we are able only to detect such events but we are unable to
> prevent them from getting to caller. It means that service will know that it
> run out of fds and will need to handle this properly. If there is a bug in
> this error path service may crash.
> What we would like to get is just a notification to external process that
> some limit has been reached without returning error to service itself.
> 
> 3) Theoretically we could do this using bpf or syscall auditing and count
> fds for each userspace process or check /proc/<PID> after each notification
> but it's getting very heavy for production environment.

We simply can't design the kernel to accommodate bandaid workarounds
for grossly misbehaving applications.  If you can find something which
can solve the problem using wider scope tools like bpf, seccomp, and
what not, great.  If not, too bad, but we can't burden everyone else
with workarounds for the extremely specific and contrived issues that
you're seeing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread
